
Warnings about bad input data #152

Open

awst-baum opened this issue Sep 17, 2018 · 5 comments
awst-baum commented Sep 17, 2018

As far as I can see: if there's "bad" input data, pytesmo usually either drops it or issues a (sometimes quite generic) warning.
Examples are:

  • pytesmo.validation_framework.data_manager.DataManager.read_ds: warnings are issued, but the underlying exception and sometimes the dataset name and call arguments are omitted.
  • pytesmo.temporal_matching.df_match, lines 90-117: if there are no matches between the data and the reference, no warning is issued and an empty (or NaN-filled) DataFrame is returned.
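For the read_ds case, a minimal sketch of what a more informative warning could look like (the helper name and signature here are hypothetical, not pytesmo's actual API): the warning carries the dataset name, the call arguments, and the original exception instead of a generic message.

```python
import warnings

# Hypothetical helper: on a read failure, warn with the dataset name,
# the call arguments, and the original exception text, then drop the data.
def read_safely(name, reader, *args):
    try:
        return reader(*args)
    except IOError as e:
        warnings.warn(
            f"Could not read dataset {name!r} with args {args!r}: {e}",
            UserWarning,
        )
        return None
```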

Is there a general philosophy behind this, something like "don't bother the user at all, just give them the results we can produce and let them look into missing or faulty data themselves"?

Since we're currently trying to build a user-friendly webservice that uses pytesmo for validations, we'd like to tell the user not only "x% of your input data didn't yield results" but ideally also why that was the case. However, that may clash with pytesmo's more Python-developer-oriented approach.
Would you be open to us adding more warnings? How much would be too much?


cpaulik commented Sep 17, 2018 via email

awst-baum (issue author) commented:

I could also imagine a strict mode or something like that which raises an exception for these failures.

Might be done with https://docs.python.org/3/library/warnings.html#the-warnings-filter ?
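For illustration, a strict mode could indeed be layered on top of the standard warnings filter. This is only a sketch, assuming pytesmo reports problems via warnings.warn; the validate function below is a stand-in, not a pytesmo API.

```python
import warnings

def validate():
    # Stand-in for a pytesmo call that warns about bad input data
    warnings.warn("no temporal matches for dataset 'xyz'", UserWarning)

# Default behaviour: the warning is printed and execution continues
validate()

# "Strict mode": the warnings filter escalates warnings to exceptions
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        validate()
    except UserWarning as err:
        print(f"strict mode caught: {err}")
```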

Re results object: I hadn't thought that far. It sounds promising/interesting but may be a major change, right? A tricky part may be storing the results into a netcdf file when they contain error reports as well as results arrays.
For the webservice, we're looking at both short-term and long-term solutions.

PS: I'm currently playing around in a branch here but haven't done too much yet: https://github.com/awst-austria/pytesmo/tree/verbose_warnings
I need to define some unit tests...


cpaulik commented Sep 18, 2018

> Might be done with https://docs.python.org/3/library/warnings.html#the-warnings-filter ?

Yes that should work fine.

> Re results object: I hadn't thought that far. It sounds promising/interesting but may be a major change, right?

Using a results object instead of the dictionary we currently use should not be too big of a change. But I could be wrong.

> A tricky part may be storing the results into a netcdf file when they contain error reports as well as results arrays.

We would have to come up with a flagging system where each error has a value. This should then be fairly easy to store according to CF conventions. See http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#flags
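As a sketch of what such a flagging system could look like (flag names and bit values here are made up for illustration): each error category gets one bit, and CF-style flag_masks/flag_meanings attributes on a hypothetical status variable describe the encoding.

```python
# Hypothetical error bits, one per category (CF "flag_masks" style)
FLAG_MEANINGS = {
    1: "reader_exception",
    2: "no_temporal_matches",
    4: "insufficient_data",
}

# Attributes for a hypothetical netCDF status variable, per CF conventions
status_attrs = {
    "flag_masks": sorted(FLAG_MEANINGS),
    "flag_meanings": " ".join(FLAG_MEANINGS[k] for k in sorted(FLAG_MEANINGS)),
}

# A grid point where reading one dataset failed AND too little data remained
status = 1 | 4
errors = [name for bit, name in FLAG_MEANINGS.items() if status & bit]
print(errors)  # -> ['reader_exception', 'insufficient_data']
```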

awst-baum (issue author) commented:

And the results object would be put together in pytesmo.validation_framework.validation.Validation.perform_validation?

Of course the trick for creating a netcdf output format would be to foresee the problems that can occur and to categorise them in a useful fashion (NOT so that practically all occurring issues end up in "other errors"). And then to write a reader/writer for it, I guess?


cpaulik commented Sep 21, 2018

> And the results object would be put together in pytesmo.validation_framework.validation.Validation.perform_validation?

Yes.

> Of course the trick for creating a netcdf output format would be to foresee the problems that can occur and to categorise them in a useful fashion (NOT so that practically all occurring issues end up in "other errors"). And then to write a reader/writer for it, I guess?

For every exception that we have we can add an error code/value/bit that we then set in the result. The ResultsManager will have to be updated.
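A minimal sketch of such a mapping (the exception names and bit values are invented for illustration; the real set would mirror whatever pytesmo raises internally):

```python
# Hypothetical exception-to-bit mapping that an updated ResultsManager
# could use when storing a per-point error status next to metric arrays.
ERROR_CODES = {
    "OSError": 1,      # reading a dataset failed
    "KeyError": 2,     # a required column/variable was missing
    "ValueError": 4,   # bad values in the input series
}

def error_code_for(exc):
    """Return the bit for a caught exception, or 0 if unclassified."""
    return ERROR_CODES.get(type(exc).__name__, 0)

try:
    raise ValueError("input series contains only NaN")
except Exception as e:
    print(error_code_for(e))  # -> 4
```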

s-scherrer added a commit to s-scherrer/pytesmo that referenced this issue Feb 12, 2021