deduplicate timestamps with check #96

Merged
merged 3 commits into from
Jun 22, 2022

Conversation

callumrollo
Collaborator

This resolves #95

Varying response depending on how many samples are removed by this deduplication:
- more than half lost: error
- more than 0.1 % lost: warning
- otherwise: log

Ordering by time and drop_duplicates are moved before oxygen_concentration_correction so that the reindex in that function does not break.
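
The three-tier response described above could be sketched roughly as follows. The function name `report_dedup_loss` and its arguments are hypothetical, chosen for illustration; `proportion_kept`, `loss_str` and `_log` follow the names visible in the review excerpts, and the sketch assumes `len_before` is greater than zero.

```python
import logging

_log = logging.getLogger(__name__)


def report_dedup_loss(len_before, len_after):
    """Log, warn, or raise depending on the share of samples kept.

    Hypothetical helper: assumes len_before > 0.
    """
    proportion_kept = len_after / len_before
    loss_str = (
        f"{100 * (1 - proportion_kept):.3f} % samples removed "
        "by timestamp deduplication."
    )
    if proportion_kept < 0.5:
        # More than half the samples were lost: treat as bad input.
        raise ValueError(f"{loss_str} Check input data for duplicate timestamps")
    elif proportion_kept < 0.999:
        # More than 0.1 % lost: worth flagging, but not fatal.
        _log.warning(loss_str)
    else:
        _log.info(loss_str)
    return proportion_kept
```

Returning `proportion_kept` is a convenience added here so the caller (or a test) can inspect the result; it is not something the PR description mentions.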

@callumrollo
Collaborator Author

This is failing the tests because the xarray version compatible with Python 3.7 doesn't have drop_duplicates.

@callumrollo
Collaborator Author

It appears to work on Python 3.8 and more recent versions. We could skip the call to drop_duplicates if the function does not exist and raise a warning that it's been skipped?

@jklymak
Member

jklymak commented Jun 20, 2022

I think we could pin to 3.8. That's almost 3 years old.

@callumrollo
Collaborator Author

I added a check for drop_duplicates that works around the problem. Or would you prefer to drop 3.7 now?

    raise ValueError(f"{loss_str} Check input data for duplicate timestamps")
elif proportion_kept < 0.999:
    _log.warning(loss_str)
if "drop_duplicates" in ds.__dir__():
Member

I think:

Suggested change
- if "drop_duplicates" in ds.__dir__():
+ if hasattr(ds, "drop_duplicates"):
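
A small sketch of the `hasattr` check combined with the skip-and-warn idea raised earlier in the thread. The classes `WithDedup` and `WithoutDedup` and the function `dedupe_if_supported` are hypothetical stand-ins, not part of xarray or this project, and the stand-in `drop_duplicates` takes no arguments, unlike the real xarray method.

```python
import warnings


class WithDedup:
    """Stand-in for a dataset type that provides drop_duplicates."""

    def __init__(self, times):
        self.times = list(times)

    def drop_duplicates(self):
        # dict preserves insertion order, so the first occurrence is kept.
        return WithDedup(dict.fromkeys(self.times))


class WithoutDedup:
    """Stand-in for an older dataset type lacking drop_duplicates."""

    def __init__(self, times):
        self.times = list(times)


def dedupe_if_supported(ds):
    """Deduplicate when the method exists, otherwise warn and pass through."""
    if hasattr(ds, "drop_duplicates"):
        return ds.drop_duplicates()
    warnings.warn("drop_duplicates not available; deduplication skipped")
    return ds
```

For ordinary attributes both checks give the same answer; `hasattr` is the more conventional spelling and also covers attributes resolved dynamically rather than listed by `__dir__()`.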

else:
    _log.info(loss_str)
Member

Can we just write our own drop_duplicates? It's basically good = (ds.time.diff() > 1e-6) followed by ds = ds.isel(time=good), plus or minus one on the index, and maybe keeping the first index.
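
The index bookkeeping behind this suggestion can be illustrated in plain Python, without xarray. A difference over n samples has only n - 1 elements, so the mask needs one extra entry; prepending True keeps the first sample of each run of duplicates. The function name `keep_first_mask` and the `tol` parameter are hypothetical, and the sketch assumes the times are already sorted in ascending order.

```python
def keep_first_mask(times, tol=1e-6):
    """Return a boolean list, True where a sample should be kept.

    Assumes `times` is sorted in ascending order.
    """
    if not times:
        return []
    # The first sample has no predecessor, so it is always kept.
    mask = [True]
    for prev, cur in zip(times, times[1:]):
        # Keep a sample only if it is later than the previous one.
        mask.append(cur - prev > tol)
    return mask


times = [0.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0]
good = keep_first_mask(times)
deduped = [t for t, keep in zip(times, good) if keep]
print(deduped)  # [0.0, 1.0, 2.0, 3.0]
```

With xarray, a mask built this way would presumably be passed as `ds.isel(time=good)`, as the comment proposes; appending True instead of prepending it would keep the last sample of each run rather than the first.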

@jklymak
Member

jklymak commented Jun 22, 2022

Feel free to self-merge if you are done....

@callumrollo callumrollo merged commit f08156b into c-proof:main Jun 22, 2022
Successfully merging this pull request may close these issues.

Duplicate time values in pld1 break oxygen correction
2 participants