Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

interpolate over nans of pld sensors #140

Merged
merged 9 commits into from
Feb 23, 2023

Conversation

callumrollo
Copy link
Collaborator

This will fix Issue #128

This is a first draft. If interpolate is set to True in raw_to_timeseries, nans from variables will be interpolated over as the sensors are aligned with the timestamps of the variable designated as the timebase.

This should be tested with some delayed mode data, as the interpolation has no effect on nrt data, as these typically have no nans. @hvdosser @jklymak, do you have a delayed mode dataset you would like to use to test this method?

@callumrollo
Copy link
Collaborator Author

We can make the interpolate kwarg True by default, or control this behavior from the yaml

@codecov
Copy link

codecov bot commented Dec 15, 2022

Codecov Report

Base: 78.48% // Head: 79.47% // Increases project coverage by +0.98% 🎉

Coverage data is based on head (6205c43) compared to base (827f7f7).
Patch coverage: 94.59% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #140      +/-   ##
==========================================
+ Coverage   78.48%   79.47%   +0.98%     
==========================================
  Files           7        7              
  Lines        1292     1364      +72     
==========================================
+ Hits         1014     1084      +70     
- Misses        278      280       +2     
Impacted Files Coverage Δ
pyglider/seaexplorer.py 81.91% <75.00%> (+0.19%) ⬆️
tests/test_pyglider.py 98.17% <100.00%> (+1.00%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@hvdosser
Copy link
Collaborator

hvdosser commented Dec 16, 2022

Hi @callumrollo, I'd recommend testing on this delayed mode dataset: https://cproof.uvic.ca/gliderdata/deployments/dfo-bb046/dfo-bb046-20200908/delayed_raw/
I would suggest controlling the behaviour from the yaml, so users really know what they are getting out of the processing, but I defer to @jklymak on this one.

@callumrollo
Copy link
Collaborator Author

Thanks @hvdosser. Can you send me a yaml file for the dataset you linked? I will then incorporate the nan interpolation with control from the yaml file.

Sorry for the delay on this, just back from vacation

@hvdosser
Copy link
Collaborator

Hi @callumrollo, no worries, thanks for figuring this out! Here's the link to the yaml: https://cproof.uvic.ca/gliderdata/deployments/dfo-bb046/dfo-bb046-20200908/deployment.yml

@callumrollo
Copy link
Collaborator Author

@hvdosser do these tests look sufficient? I've added some of the data you provided and tested it in default mode and with nan interpolation enabled. We're using GPCTD as the timebase.The nan interpolation adds non-nan values for the other sensors (oxygen and cdom) where they did not sample at the same time as the CTD.

Getting a nice increase isn test coverage from this too :)

@jklymak
Copy link
Member

jklymak commented Jan 16, 2023

Ping @hvdosser for a review on this

@jklymak
Copy link
Member

jklymak commented Jan 26, 2023

@hvdosser if you had a chance to make sure this works it would be appreciated...

@jklymak jklymak added this to the 0.0.5 milestone Jan 26, 2023
@callumrollo
Copy link
Collaborator Author

@hvdosser do you have a timeframe for reviewing this? It's quite a substantial change, so I'd like to get it merged before addressing other Issues

@hvdosser
Copy link
Collaborator

Hi @callumrollo and @jklymak , sorry for the delay on this. I'll have time on Monday Feb 13th to review and merge.

Copy link
Collaborator

@hvdosser hvdosser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, and the new tests are great! I added a few very minor comments. As a broader comment for @jklymak, we'll likely want to update our example yaml files and documentation with some information about including interpolate for the SeaExplorers, since this is a pretty major correction to the delayed-mode processing. Thanks for your efforts on this @callumrollo, and apologies again for the delay in review.

tests/test_pyglider.py Outdated Show resolved Hide resolved
tests/test_pyglider.py Show resolved Hide resolved
pyglider/seaexplorer.py Show resolved Hide resolved
@callumrollo
Copy link
Collaborator Author

I've merged in the latest changes from main. Is this good to go @hvdosser?

@hvdosser
Copy link
Collaborator

@callumrollo and @jklymak, I'll merge this pull request, but I do think we need to consider what happens when we have large gaps in the data and we interpolate over them. Particularly for sensors like the ECOpuck, which only samples a small portion of the water column. Am I correct in thinking that setting interpolate=True will result in new interpolated points everywhere the CTD has data? If that's the case, we'll need to figure something out to deal with these larger gaps before we can move forward.

@hvdosser hvdosser merged commit 676712c into c-proof:main Feb 23, 2023
@jklymak
Copy link
Member

jklymak commented Feb 23, 2023

OK, I think xarray provides this functionality:

https://docs.xarray.dev/en/stable/generated/xarray.DataArray.interpolate_na.html#xarray-dataarray-interpolate-na

note the max_gap parameter, which should be perfect for this. Just set to some reasonable value (that probably changes depending on the sampling frequency) and big gaps would remain NaN...

@callumrollo
Copy link
Collaborator Author

Hi @hvdosser. The potential problem with slow sampling sensors was raised in the original discussion of Issue #128. Linear interpolation will result in an apparent measurement at every timestamp of the specified timebase. using keep_vars will ensure these measurements are retained. An alternative would be to implement a max_gap as Jody suggests, or to have the option of nearest neighbout interpolation of some kind. Either way, I feel this is best addressed in a separate PR, preferably with the use of a dataset which demonstrates the issue you want to solve as test data

@callumrollo callumrollo deleted the callum-patch-40 branch November 29, 2024 08:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants