Mitch predict_xr #1270
Hey @mitchest, this looks great to me - do you have a screenshot of the outputs of
Yer - no worries
Here's a SS from just the outputs (plot 1/3 is
Here's a SS actually using the coastal binary layer to mask another product
* Fix deprecation warnings and speed up code
* Change last modified date
* Intertidal exposure (#1261)
* New files for intertidal exposure notebook
* minor editing to Load packages cells
* Updated discord and removed slack links
* Improved markdown linking to image and gif in Introduction
* Incorporated SS and MA reviews
* Updated to include RBT reviews
* Minor pip install and notebook naming edits. Add notebook to README
* adding renamed exposure notebook back into PR
* adding global SAR access through microsoft planetary compute (#1263)
* adding global SAR access through microsoft planetary compute
* Make minor spelling and formatting amendments.
* small changes for PR
---------
Co-authored-by: geoscience-aman <96451725+geoscience-aman@users.noreply.github.com>
* Update USAGE.rst (#1268)
Add Swinburne course for 2024
* Minor compatibility change for tide modelling package (#1269)
* Mitch predict_xr (#1270)
* add probability array output to predict_xr
* predict_xr at proba_max args
* predict_xr match arg names
* xr_predict deal with multiband prob outout
* xr_predict merge output probs
* clean up comments and spacing
* Update USAGE.rst (#1272)
Add new reference, Burton et al 2024 Enhancing long-term vegetation monitoring in Australia: a new approach for harmonising the Advanced Very High Resolution Radiometer normalised-difference vegetation (NVDI) with MODIS NDVI
* Fix broken code on `unstable` Sandbox image (#1274)
* Updates for pyTMD
* Fix contours bug due to groupby squeeze
* Try loosening pyTMD requirements
* Update tests to pass on both stable and unstable sandbox
* Fix pansharpening bug
---------
Co-authored-by: Aman Chopra <aman.chopra@ga.gov.au>
Co-authored-by: geoscience-aman <96451725+geoscience-aman@users.noreply.github.com>
Co-authored-by: ClaireP <claire.phillips@ga.gov.au>
Co-authored-by: Alex Bradley <55119000+abradley60@users.noreply.github.com>
Co-authored-by: Bex Dunn <BexDunn@users.noreply.github.com>
Co-authored-by: Mitchell Lyons <mitchell.lyons@gmail.com>
I noticed reduced performance after this was merged. After reviewing the code it looks like these two lines no longer have an effect, and that dea-notebooks/Tools/dea_tools/classification.py Lines 365 to 366 in a6e937f
@jessjaco By performance, do you mean speed is slower, or that the results are lower quality? Any extra info (e.g. screenshots or timings) would be awesome - we'll have a look! Are you running your code with
I was noticing performance issues, and I'm not positive why. But the results also appear to be incorrect. Please see below.

from dea_tools.classification import predict_xr
import numpy as np
import odc.geo.xr
from sklearn.tree import DecisionTreeClassifier
import xarray as xr

# Dummy data & model
X = np.random.rand(1000, 1)
B = 3
e = np.random.rand(1000, 1)
y = ((X * B + e) // 1).astype(int) + 1
model = DecisionTreeClassifier()
model.fit(X, y)

size_1d = 1_000
X_new = np.random.rand(size_1d, size_1d)
X_new_ds = (
    xr.DataArray(
        X_new,
        dims=("x", "y"),
        coords={"x": np.arange(size_1d), "y": np.arange(size_1d)},
    )
    .odc.assign_crs(26912)
    .to_dataset(name="data")
)

prediction = predict_xr(model, X_new_ds, chunk_size=10, proba=True)
print((prediction.Probabilities == 0).any())
# False

X_new_ds_masked = X_new_ds.where(X_new_ds < 0.9)
masked_prediction = predict_xr(
    model, X_new_ds_masked, chunk_size=10, proba=True, clean=True
)
print(X_new_ds_masked.data.isnull().any())
# True and should be
print((masked_prediction.Probabilities == 0).any())
# False, but should be true, since the nans should be converted to zero
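
The behaviour expected in the report above can be sketched without `dea_tools`. This is only an illustration using plain NumPy and scikit-learn, not the `predict_xr` implementation: the fill-then-zero cleaning step and all variable names here are assumptions made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Dummy training data: one feature, integer class labels
X = rng.random((1000, 1))
y = (X[:, 0] * 3 + rng.random(1000)).astype(int) + 1
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# New 2D "image" with some pixels masked to NaN
image = rng.random((50, 50))
image[image >= 0.9] = np.nan

# Flatten to (n_pixels, n_features) and record which pixels are invalid
flat = image.reshape(-1, 1)
invalid = np.isnan(flat).any(axis=1)

# Hypothetical cleaning step (an assumption, not the dea_tools code):
# fill NaNs so the model can run, then zero the outputs for those pixels
filled = np.where(np.isnan(flat), 0.0, flat)
max_proba = model.predict_proba(filled).max(axis=1)
max_proba[invalid] = 0.0

# Reshape back to the image grid
max_proba_2d = max_proba.reshape(image.shape)
print(bool(invalid.any()), bool((max_proba_2d == 0).any()))
```

In this sketch a zero in the output can only come from a masked input pixel, because the maximum class probability for a valid pixel is always greater than zero. That is the property the last check in the report relies on.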
Proposed changes
Updating the `predict_xr()` function to allow returning the full array of prediction probabilities out of `predict_proba()`, which is actually the default behaviour normally.
Closes issues (optional)
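
For context on the proposed change, the difference between the full probability array and a single per-sample maximum can be seen directly in scikit-learn. This is a minimal sketch on made-up data; it shows the scikit-learn behaviour only and makes no claim about how `predict_xr()` wires it up.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Made-up training data: two features, three classes (0, 1, 2)
X = rng.random((200, 2))
y = (X[:, 0] > 0.5).astype(int) + (X[:, 1] > 0.5).astype(int)
model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

X_new = rng.random((5, 2))

# Full array: one column per class, each row sums to 1
proba = model.predict_proba(X_new)  # shape (n_samples, n_classes)

# Reduced form: only the probability of the most likely class
max_proba = proba.max(axis=1)  # shape (n_samples,)

# The predicted label is the class with the highest probability
predicted = model.classes_[proba.argmax(axis=1)]
print(proba.shape, max_proba.shape, predicted)
```

Keeping the full array preserves the probability of every class for each sample, whereas the reduced form discards everything except the winning class.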
Checklist
(Replace `[ ]` with `[x]` to check off)
Load packages
General advice
`jupyterlab_code_formatter` tool can be used to format code cells to a consistent style: select each code cell, then click `Edit` and then one of the `Apply X Formatter` options (`YAPF` or `Black` are recommended).
`NCI` and `DEA Sandbox` (flag if not working as part of PR and ask for help to solve if needed)
`Notebook currently compatible with the NCI|DEA Sandbox environment only` line below the notebook title to reflect the environments the notebook is compatible with