Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise an error if spec has unknown columns #112

Merged
merged 2 commits into from
Jul 7, 2023
Merged

Conversation

qubixes
Copy link
Member

@qubixes qubixes commented Jul 4, 2023

Fixes #104

@@ -117,6 +117,9 @@ def from_dataframe(cls,
else:
spec = deepcopy(spec)

if set(list(spec)) - set(df.columns):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line looks great, set difference by just instantiating set objects

Comment on lines 121 to 122
raise ValueError("Specifications found for column that were not found in the "
f"dataset itself: {set(list(spec)) - set(df.columns)}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Samuwhale do you have a suggestion for a better error message here? I am wondering if this is immediately understandable, but I'm finding it hard to decide on a better error message.

I checked what tidyverse does when loading a csv with a spec with a column that could not be found; it displays a warning (not an error) and it's the following message:

Warning message:                                                                     
The following named parsers don't match the column names: Pid 

I think that is kind of worse so maybe our message is ok 😃

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to merge and create a new issue out of this comment

@@ -92,6 +92,10 @@ def check_dataset(dataset):
print(name, dataset.descriptions[name])
assert dataset.descriptions[name] == name

# Check whether non-columns raise an error
with pytest.raises(ValueError):
dataset = MetaDataset.from_dataframe(df, spec={"unicorn": {"prop_missing": 0.5}})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🦄

@qubixes qubixes merged commit 04c45b0 into main Jul 7, 2023
@qubixes qubixes deleted the fix-unknown-spec branch July 17, 2023 12:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unknown keys in MetaDataset specification are silently ignored
2 participants