[SCHEMA] Add patterns for format validation #885
src/schema/formats.yaml
Outdated
dataset_relative:
  description: |
    A path to a file, relative to the dataset folder.

    The validation for this format is minimal.
  pattern: ^[0-9a-zA-Z/_-\.]+$
participant_relative:
  description: |
    A path to a file, relative to the participant's folder in the dataset.

    The validation for this format is minimal.
  pattern: ^[0-9a-zA-Z/_-\.]+$
uri:
  description: |
    A uniform resource indicator.
  pattern: ^[a-zA-Z]+:[0-9a-zA-Z/_-\.]+$
bids_uri:
  description: |
    A BIDS uniform resource indicator.
  pattern: ^bids:[0-9a-zA-Z/_-\.]+$
The patterns for all four of the path/URI formats should be revised. I don't know enough about the rules to define solid patterns just yet.
The same goes for stimuli_relative (added after the initial comment).
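One concrete reason the path/URI patterns need revising can be checked directly. The sketch below assumes the patterns will be consumed by Python's `re` module (the PR does not say which engine the validator uses). In that engine, the sequence `_-\.` inside the character class is parsed as a range running from `_` down to `.`, which is rejected as a reversed range. `REVISED` is a hypothetical fix, not something proposed in the thread, and `compiles` is a helper name introduced here for illustration.

```python
import re

# Pattern exactly as written in the diff for the path formats.
ORIGINAL = r"^[0-9a-zA-Z/_-\.]+$"
# Hypothetical fix (an assumption): put the hyphen last so it is a literal.
REVISED = r"^[0-9a-zA-Z/_.-]+$"


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(compiles(ORIGINAL))  # False: "_-\." is read as a reversed range
print(compiles(REVISED))   # True
print(bool(re.match(REVISED, "sub-01/anat/sub-01_T1w.nii.gz")))  # True
```

Placing the hyphen at the end of the class is one common way to make it literal; escaping it would work as well.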
This will for sure be super useful. I don't speak regex fluently, so I'm not sure I am the best person to review this, though.
Structure makes sense to me. Eyeballing the regex it seems ok, but the best way to verify it is for it to be used by the validator in tests.
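As a rough illustration of what such tests could look like, here is a small table-driven check. It is only a sketch: the patterns are adapted from the diff with the hyphen moved to the end of the character class so that they compile in Python's `re` (an assumption), and `PATTERNS`, `CASES`, and `failing_cases` are hypothetical names, not part of the schema code or the validator.

```python
import re

# Assumption: adapted from the diff, hyphen moved last so the class compiles.
PATTERNS = {
    "uri": r"^[a-zA-Z]+:[0-9a-zA-Z/_.-]+$",
    "bids_uri": r"^bids:[0-9a-zA-Z/_.-]+$",
}

# (format name, candidate value, should the pattern accept it?)
CASES = [
    ("uri", "https://example.org/file.txt", True),
    ("uri", "no-scheme/file.txt", False),
    ("uri", "https://example.org/file with space.txt", False),
    ("bids_uri", "bids:sub-01/anat/sub-01_T1w.nii.gz", True),
    ("bids_uri", "http:sub-01/file.tsv", False),
    ("bids_uri", "bids:", False),
]


def failing_cases(patterns, cases):
    """Return the cases whose match result differs from the expectation."""
    failures = []
    for name, value, expected in cases:
        matched = re.match(patterns[name], value) is not None
        if matched != expected:
            failures.append((name, value, expected))
    return failures


print(failing_cases(PATTERNS, CASES))  # [] when every expectation holds
```

Keeping accepted and rejected values side by side makes it easy to add boundary cases as the patterns are tightened.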
src/schema/formats.yaml
Outdated
description: |
  The basic string type (not a specific format).
  This should allow any free-form string *except* "n/a".
pattern: ^(?!(n/a)$).*$
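To make the effect of the negative lookahead concrete, the sketch below runs the pattern from the diff through Python's `re` (the choice of engine is an assumption; `is_free_form_string` is a hypothetical helper name). Only the exact string "n/a" is rejected. Variants such as "N/A" or "n/a " with a trailing space, and also the empty string, are accepted.

```python
import re

# Pattern as given in the diff: anything except the exact string "n/a".
STRING_PATTERN = re.compile(r"^(?!(n/a)$).*$")


def is_free_form_string(value):
    """Return True if the value matches the basic string pattern."""
    return STRING_PATTERN.match(value) is not None


for value in ["n/a", "n/a ", "N/A", "average reference", ""]:
    print(repr(value), is_free_form_string(value))
# 'n/a' False
# 'n/a ' True
# 'N/A' True
# 'average reference' True
# '' True
```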
This is related to #876
This should allow any free-form string except "n/a".
I am not sure whether disallowing n/a as a string for free-form text fields would break some datasets. For example, for the required EEGReference field, users could currently pass n/a according to bids-validator:
Whether or not that's a good idea should be discussed in #876; this is just a note that what you introduce here has not been discussed and has not been formally part of the spec 🤔
My hope is that any cases where "n/a" is acceptable will have "n/a" as an explicit option within the schema. I've tried to add it as an option in all of the cases that I've seen like that, but that doesn't mean we're close to having complete coverage. I can drop any fancy "n/a" handling from this PR if you'd rather wait to tackle it within #876 though.
I've dropped the special "n/a" handling in dfd3057, but I do think we will want to distinguish between "n/a"s and freeform strings in the long run.
I was wondering on why
Currently, we have
Whereas common "patterns" are consolidated as "formats." I guess you could say that this file defines the "patterns" associated with the "formats", but I feel like naming it
I guess the next step for this PR is adding some tests, as @tsalo was planning to do (and #833 is merged), or would that be postponed till later?
Co-authored-by: Chris Markiewicz <effigies@gmail.com>
…ation into format-rules
Codecov Report
@@ Coverage Diff @@
## master #885 +/- ##
==========================================
+ Coverage 34.05% 36.35% +2.30%
==========================================
Files 8 8
Lines 834 850 +16
==========================================
+ Hits 284 309 +25
+ Misses 550 541 -9
I just love testing boundary conditions I guess ;)
Thanks for the suggestions @yarikoptic. They're great!
Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
It looks like EDIT: URIs can't really be validated well with regular expressions, apparently, so I think the best solution would be to do only light testing for the regular expression, but mention in the description that folks should use an appropriate function in practice.
I can try later to create one following the "Syntax" of https://en.wikipedia.org/wiki/Uniform_Resource_Identifier which would demand to have a non-degenerate edit: although not sure if it would come up stringent enough to identify errors. There is also a regex given on https://www.ietf.org/rfc/rfc3986.txt as
and making
didn't analyze further. Feel free to just add a note on possible need for improvement or other way to specify extra rules.
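For reference, RFC 3986 (Appendix B) gives a regular expression for splitting a URI reference into its components. The sketch below applies it in Python and compares it with the standard library's `urllib.parse.urlsplit`, as one possible "appropriate function". `split_uri` and `has_scheme` are hypothetical helper names. Because every group in the Appendix B expression is optional, it matches any string, which illustrates the concern that such a pattern may not be stringent enough to catch errors.

```python
import re
from urllib.parse import urlsplit

# Component-splitting expression from RFC 3986, Appendix B.
RFC3986_SPLIT = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(value):
    """Return (scheme, authority, path, query, fragment) per Appendix B."""
    m = RFC3986_SPLIT.match(value)  # always matches: every group is optional
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def has_scheme(value):
    """Light check: does the standard library parser find a scheme?"""
    return bool(urlsplit(value).scheme)


print(split_uri("https://example.org/data/file.txt?x=1#top"))
# ('https', 'example.org', '/data/file.txt', 'x=1', 'top')
print(split_uri("not a uri"))
# (None, None, 'not a uri', None, None)
print(has_scheme("https://example.org/data/file.txt"), has_scheme("not a uri"))
# True False
```

A check like `has_scheme` is still only a light test; it says nothing about whether the rest of the URI is well formed.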
Now that I have two approvals, I'm going to merge. I think we can keep improving these patterns (i.e., #999) before the next release.
Closes #832.
This currently doesn't include any code that would use the patterns, except for tests.
Changes proposed:
Add formats.yaml file to schema, with regular expressions with which to validate different entity values, metadata types, and metadata string formats. Individual metadata fields should still be allowed to supply a specific "pattern" keyword.
To do:
SoftwareRRID
)
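The note in the proposed changes that individual metadata fields may still supply their own "pattern" could work roughly as sketched below. Everything here is hypothetical: the dictionary layout, the `resolve_pattern` and `is_valid` helpers, and the field-level pattern are illustrative stand-ins, since this PR does not include code that consumes the patterns (apart from tests).

```python
import re

# Hypothetical, simplified stand-in for the contents of formats.yaml.
FORMATS = {"string": {"pattern": r"^.*$"}}


def resolve_pattern(field, formats):
    """Prefer the field's own pattern; otherwise use its format's pattern."""
    if "pattern" in field:
        return field["pattern"]
    return formats[field["format"]]["pattern"]


def is_valid(value, field, formats=FORMATS):
    """Check a value against whichever pattern applies to the field."""
    return re.match(resolve_pattern(field, formats), value) is not None


generic_field = {"format": "string"}
# Illustrative field-level pattern only; not an actual rule from the schema.
specific_field = {"format": "string", "pattern": r"^SCR_[0-9]+$"}

print(is_valid("anything", generic_field))     # True
print(is_valid("anything", specific_field))    # False
print(is_valid("SCR_002823", specific_field))  # True
```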