How to define missing values in a single field? #551

Closed
vitorbaptista opened this issue Dec 12, 2017 · 12 comments

@vitorbaptista
Contributor

There are cases where a value that marks missing data in one column is a valid value in another column. For example, https://data.gov.au/dataset/colac-otway-shire-trees/resource/bcf1d62b-9e72-4eca-b183-418f83dedcea uses 0 as the missing value for Year_Planted, but 0 might be a valid value in other columns.

From https://frictionlessdata.io/specs/table-schema/, it appears missing values can only be defined at the table level, so this case can't be expressed in Table Schema.
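To make the limitation concrete, here is a minimal sketch in plain Python (no Frictionless library involved). It assumes a schema whose table-level `missingValues` list includes `"0"`; the `Stem_Count` column and the `cast_row` helper are hypothetical and exist only for illustration.

```python
# Table-level missingValues applies to every field in the table.
schema = {
    "fields": [
        {"name": "Year_Planted", "type": "integer"},
        {"name": "Stem_Count", "type": "integer"},  # hypothetical column
    ],
    "missingValues": ["", "0"],
}


def cast_row(row, schema):
    """Replace any cell that matches a table-level missing value with None.

    Cells are left as strings otherwise; type casting is out of scope here.
    """
    missing = set(schema.get("missingValues", [""]))
    return {
        field["name"]: None if row[field["name"]] in missing else row[field["name"]]
        for field in schema["fields"]
    }


row = {"Year_Planted": "0", "Stem_Count": "0"}
result = cast_row(row, schema)
print(result)  # {'Year_Planted': None, 'Stem_Count': None}
```

The `0` in `Stem_Count` is a legitimate count, yet it is nulled along with the `Year_Planted` placeholder because the list cannot be scoped to a single field.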

@rufuspollock
Contributor

@vitorbaptista originally they were per field, but we switched to per resource. I think it could make sense to support both, and it would not be complex to do so, but I'd appreciate thoughts from @pwalsh.

@roll
Member

roll commented Dec 13, 2017

Yes, it's easy at the implementation level.

@vitorbaptista
Contributor Author

I'm usually not very fond of allowing multiple ways to do the same thing (i.e. supporting both per field and per resource), because then, to understand the schema, you need to keep in your head not only the current definition but also every other place where that definition was overridden.

Because of this, I prefer to have these definitions in a single place, even though this will create repetition in some cases. In other words, I think being explicit is in general more important than reducing repetition.

However, the pattern of having a global default and overriding it in specific cases is well understood, so I'm OK either way.

@rufuspollock
Contributor

@vitorbaptista I think we already have a global default, so the discussion then becomes per resource or per field.

It would be useful to locate the issue / PR where we switched to see the original reasoning (if any).

@Stephen-Gates
Contributor

Stephen-Gates commented Dec 26, 2017

@rufuspollock Perhaps these are related:

missingValue is better defined per field rather than per resource and the logic of both per field and per resource is quite complex.

The missingValue property requires much more work than this old implementation, as it is per field, and for common use cases this means a lot of duplicate information

@rufuspollock
Contributor

@Stephen-Gates 👏 and it's #359 where the discussion of moving to per resource from per field happened.

@vitorbaptista I think the question here is whether this is worth it. There seem to be some real pros to having defaults at the per-resource level. At the same time, having it in both places (resource and field) is a pain, I think (?), both for users and implementors.

Hmmm ...

@pwalsh
Member

pwalsh commented Jan 2, 2018

I think this is a specific case of a pattern where some properties should cascade.

@vitorbaptista
Contributor Author

I agree on having it per resource and per field with cascading 👍
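The cascade agreed on here could be resolved along these lines. This is only a sketch under stated assumptions: the field-level `missingValues` key is the proposal under discussion, the `resolve_missing_values` helper and the `Stem_Count` column are hypothetical, and a field-level list is assumed to replace the resource-level list rather than merge with it (the thread does not settle that point).

```python
DEFAULT_MISSING_VALUES = [""]  # Table Schema default when nothing is declared


def resolve_missing_values(field, schema):
    """Return the missing values for one field.

    Precedence: field-level setting, then schema-level setting, then default.
    """
    if "missingValues" in field:
        return field["missingValues"]
    return schema.get("missingValues", DEFAULT_MISSING_VALUES)


schema = {
    "missingValues": [""],
    "fields": [
        # Assumed field-level override: 0 means "unknown year" here only.
        {"name": "Year_Planted", "type": "integer", "missingValues": ["", "0"]},
        {"name": "Stem_Count", "type": "integer"},  # hypothetical column
    ],
}

resolved = {f["name"]: resolve_missing_values(f, schema) for f in schema["fields"]}
print(resolved)  # {'Year_Planted': ['', '0'], 'Stem_Count': ['']}
```

Replacing rather than merging keeps each field readable on its own, at the cost of repeating shared values such as the empty string in every override.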

@rufuspollock
Contributor

@vitorbaptista ok, assigning to milestone v1.1.

@Stephen-Gates
Contributor

Stephen-Gates commented Apr 7, 2018

I'm thinking of implementing this in Data Curator. Suggested wording for a Pattern is on the forum https://discuss.okfn.org/t/missing-values-per-field-pattern/6571

edit: and added #608

@ashepherd

Just wanted to add our use case for the missingValues property on a field. @BCODMO provides data management support for individual researchers on short-term funding (1-3 years) from NSF. Our researchers' data can sometimes contain shorthand values that are joined with lookup tables after they come back from sea. These fields are usually things like species names or quality flags that they abbreviate to save time during collection. There could be a scenario where a missing data value of 'nd' in one column is a valid data value in another column. Declaring nd in missingValues at the table level can lead to NULL values in a column where nd is a valid value for an abbreviated species name or quality flag.
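As a rough illustration of this scenario, the sketch below reads a small in-memory CSV and applies missing values per column. The column names, the sample rows, and the `FIELD_MISSING` mapping are all made up for the example; they are not taken from any BCO-DMO dataset or Frictionless API.

```python
import csv
import io

# Hypothetical per-field missing values, keyed by column name.
# 'nd' means "no data" for temperature but is a valid species abbreviation.
FIELD_MISSING = {"temperature": {"nd"}, "species": set()}

data = io.StringIO("temperature,species\n12.5,nd\nnd,cf\n")

rows = [
    {
        # Columns without an entry fall back to the empty-string default.
        name: None if value in FIELD_MISSING.get(name, {""}) else value
        for name, value in record.items()
    }
    for record in csv.DictReader(data)
]
print(rows)
# [{'temperature': '12.5', 'species': 'nd'}, {'temperature': None, 'species': 'cf'}]
```

With the setting scoped per field, `nd` survives in the `species` column while still being treated as missing in `temperature`.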

@roll
Member

roll commented Jan 3, 2024

It was defined as a pattern - https://specs.frictionlessdata.io/patterns/#missing-values-per-field

And now it's a candidate for entering the spec. CLOSING in favor of #861
