Add a `require_data()` method #715
In keeping with my earlier comment about returning all identifiers of the missing values, I don't think we should drop the relevant columns.
This would make the later validation `len(df) != n` impossible, because there may be multiple entries along a non-required dimension.
This is purely a clarifying question on my side: why is the `if len(df) != n` check needed? As I understand it, `n` is the total number of combinations of required data fields, right? So, for example, two variables for two regions for three years would be 2 x 2 x 3 = 12?
If the number of required data points (`n`, computed as the product of the number of values per dimension) does not equal the number of existing data points, `len(df)`, then some of the required data is missing.
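The counting argument above can be illustrated with a minimal sketch. This is not the code from the PR; the `required` dictionary, its values, and the toy `df` are hypothetical and chosen to match the 2 x 2 x 3 = 12 example from the question.

```python
from itertools import product
from math import prod

import pandas as pd

# Hypothetical required values per dimension (illustrative only).
required = {
    "variable": ["Primary Energy", "Emissions|CO2"],
    "region": ["World", "Europe"],
    "year": [2020, 2030, 2050],
}

# n is the product of the number of required values per dimension.
n = prod(len(values) for values in required.values())

# Toy data: every required combination except the last one.
all_rows = list(product(*required.values()))
df = pd.DataFrame(all_rows[:-1], columns=list(required))

# If the row count differs from n, at least one combination is missing.
has_missing = len(df) != n
```

Here `n` is 12 while `df` holds 11 rows, so `has_missing` is true. The comparison is only meaningful if `df` contains at most one row per required combination.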
Does that assumption not fall flat as soon as you provide multiple values for `unit`? Maybe we should restrict the unit attribute to a single one then.
That is why I remove the columns that are not required and then drop duplicates...
(And this issue is not restricted to units; you have the same problem when you only require variables and you have multiple regions.)
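A small sketch of why dropping the non-required columns and then dropping duplicates keeps the count comparison valid. It assumes plain pandas and made-up data; `required_cols` and the column values are hypothetical and not taken from the PR.

```python
import pandas as pd

# Toy data: only `variable` is required, but each variable
# is reported for two regions (a non-required dimension).
df = pd.DataFrame(
    {
        "variable": ["Primary Energy", "Primary Energy", "Emissions|CO2", "Emissions|CO2"],
        "region": ["World", "Europe", "World", "Europe"],
        "unit": ["EJ/yr", "EJ/yr", "Mt CO2/yr", "Mt CO2/yr"],
    }
)

required_cols = ["variable"]  # hypothetical list of required dimensions
n = 2  # number of required variables

# Counting raw rows overstates the data because of the extra regions.
naive_count = len(df)

# Keep only the required columns, then collapse repeated combinations.
reduced = df[required_cols].drop_duplicates()
reduced_count = len(reduced)
```

With the raw frame, `naive_count` is 4 and would not match `n`, even though nothing is missing. After the reduction, `reduced_count` equals `n`, so the `len(df) != n` check behaves as intended.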