Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] Rewrite inheritance principle #946

Merged

Conversation

Lestropie
Copy link
Collaborator

Draft PR in search of feedback.

This change relates to #102, #259, #482; am escalating due to re-invigoration of BEP016, in particular bids-standard/bids-bep016#32.

IMO, there are multiple issues with the current description of the inheritance principle:

  1. Some details are reiterated in different ways, which shouldn't be necessary if the principle is stated unambiguously;
  2. Despite the availability & use of filesystem tree examples, many aspects of the principle are described through hypothetical modifications rather than presenting such;
  3. Some instances of point 2 result in reference to either files that don't exist or would need to be renamed in order to satisfy the change suggested;
  4. There is a persistent blurring between official description of the principle that should be interpreted literally, and recommendation / guidance.
  5. Inheritance is described with respect to the lower-level JSON file, rather than with respect to a data file to which multiple JSONs may be applicable.
  6. The logic by which software tools should classify which JSONs are or are not applicable to a given data file is not adequately clear.

Given the breadth of these issues I think a re-write is warranted. I am completely open to proposals for modifications or proposal of outright alternatives, but the ball needs to start rolling somewhere.

Full disclosure: as per #259, I would like to pursue the prospect of modifying the inheritance principle itself, not just the description of such; specifically removal of the preclusion of having multiple applicable JSONs at one level of the hierarchy. However here I am focusing solely on modifying the text describing the principle as it currently stands, which would be applicable to a patch update. Changing the principle itself should IMO be deferred to a minor or potentially major update; in addition, the isolation of that specific aspect and modification thereof should be somewhat facilitated by the clarification of the current state of the principle following these changes.

@Remi-Gau
Copy link
Collaborator

Remi-Gau commented Dec 6, 2021

pinging @yarikoptic and @VisLab as they both have expressed interest in this issue in the past

Comment on lines 403 to 412
Any metadata file (such as `.json`, `.bvec` or `.tsv`) MAY be defined at any
directory level. For any given data file, any metadata file at that directory
level or higher that does not include any entities absent from the name of the
data file and possesses the same suffix are applicable to that data file. Such
files are loaded from the top of the directory hierarchy downwards, such that
values from the top level are inherited by all data files at lower levels to
which it is applicable unless overridden by a value for the same key present
in another metadata file at a lower level (though it is RECOMMENDED to minimise
the extent of such overrides). There is no notion of "unsetting" a
key/value pair.
Copy link
Collaborator

@Remi-Gau Remi-Gau Dec 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Any metadata file (such as `.json`, `.bvec` or `.tsv`) MAY be defined at any
directory level. For any given data file, any metadata file at that directory
level or higher that does not include any entities absent from the name of the
data file and possesses the same suffix are applicable to that data file. Such
files are loaded from the top of the directory hierarchy downwards, such that
values from the top level are inherited by all data files at lower levels to
which it is applicable unless overridden by a value for the same key present
in another metadata file at a lower level (though it is RECOMMENDED to minimise
the extent of such overrides). There is no notion of "unsetting" a
key/value pair.
- Any metadata file (such as `.json`, `.bvec` or `.tsv`) MAY be defined at any directory level.
- For a given data file, any metadata file at that directory level or higher
is applicable to that data file if:
- the metadata and the data filenames possess the same suffix,
- the metadata filename does not include any entity absent from the data filename.
- Such files are loaded from the top of the directory hierarchy downwards,
such that values from the top level are inherited by all data files
at lower levels to which it is applicable unless overridden
by a value for the same key present in another metadata file at a lower level
(though it is RECOMMENDED to minimise the extent of such overrides).
- There is no notion of "unsetting" a key/value pair.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is an improvement. However, there is still some confusion because the notion of a 'key' makes sense for json but I don't think it makes sense for tsv files. I have no idea how to apply this to bvec files.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'd contemplated bullet-pointing this myself; or potentially even enumerating so that datasets violating specific requirements / software code can be cross-referenced more precisely?

@VisLab: Good catch. There's going to need to be a hard split between JSON and other metadata files. For .bvec / .bval / .tsv, it I think needs to be simply selection of whichever applicable file is lowest in the directory tree; anything above it gets ignored, there's no reason to describe the contents of such as being "overwritten" by the content of files lower down. Only edge case that comes to mind would be .tsv files that happen to contain exactly the same columns (including their titles), but even then it's hard to justify choosing between a concatenation and a replacement. Are there any other metadata types that are worth being familiar with for the sake of getting the language / logic of the inheritance principle right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea about using an ordered list and referring to those points in the example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So am I understanding that for JSON the resolution is top-down, with top (closest to the root) having precedence.
For the others it is bottom up, with the bottom (farthest from the root) having precedence?

The notions of "lowest" and "highest" level in the tree isn't necessarily clear. Does lowest mean farthest from the root?

I honestly thought only JSON files got inherited in the sense that there could be multiple applicable ones for the same suffix.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only case of "official" of TSV inheritance we have around is having a single task-foo_events.tsv in the root of the directory in case all subjects / runs had the same design:

see from the BIDS examples

╰─⠠⠵ ls -l */*task*tsv
-rw-rw-r-- 1 remi remi  143 Nov 10 11:33 ds114/task-covertverbgeneration_events.tsv
-rw-rw-r-- 1 remi remi  280 Nov 10 11:33 ds114/task-fingerfootlips_events.tsv
-rw-rw-r-- 1 remi remi  143 Nov 10 11:33 ds114/task-overtverbgeneration_events.tsv
-rw-rw-r-- 1 remi remi  127 Nov 10 11:33 ds114/task-overtwordrepetition_events.tsv
-rw-rw-r-- 1 remi remi 1054 Nov 10 11:33 eeg_ds000117/task-facerecognition_channels.tsv
-rw-rw-r-- 1 remi remi 1807 Nov 10 08:42 synthetic/task-nback_events.tsv

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need to expand the discussion of events inheritance. Inherited events.tsv files are needed at several levels.

For example, because BIDS forces multi-modality data to be separated into separate directories, simultaneously recorded MEG/EEG or fMRI/EEG will have the same event files for each run.

In the following example from hed-examples, EEG and MEG were recorded simultaneously, but separated into separate modality directories.

ds_003654s:
       task-FacePerception_events.json
       sub-02:
               sub-002_task-FacePerception_events.json
               sub-002_task-FacePerception_run-1_events.tsv
               eeg:
                      sub-002_task-FacePerception_run-1_eeg.set 
               meg:
                      sub-002_task-FacePerception_run-1_meg.fif
   . . .

Right now we are assuming that multiple events.json files can be applicable to a given events.tsv files and that the keys are resolved from directory root down in the JSON files. I think this is the right behavior for JSON files, but it needs to be clarified in the specification.

We have only made a single events.tsv applicable to any given recording, but that events.tsv file could appear anywhere in the directory hierarchy.

Events are a special case, and I think it might be feasible/useful to allow multiple events files at different levels to apply to a given run. The events are the union of the rows/join of columns. This might be too complicated.

The issue also comes up with derivatives. Here the derivatives might have their own set of events. If the events from the raw data also apply do they need to always be copied into the events files? There was some discussion of this in the past, but there didn't seem to be a clear resolution or consensus on this when the discussion occurred.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also have a general question about inheritance. Is the matching of entity components and their names case sensitive?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The notions of "lowest" and "highest" level in the tree isn't necessarily clear. Does lowest mean farthest from the root?

I always have to cross-reference elsewhere in the spec to remind myself which is which. Does anyone know if there's a robust reference-able definition somewhere? Or does this need to be defined upfront in the spec?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the matching of entity components and their names case sensitive?

I think the case sensitivity issues was solved in #858

Lestropie and others added 3 commits December 13, 2021 10:11
Co-authored-by: Remi Gau <remi_gau@hotmail.com>
- Change primary rules from dot points to enumerations, and reference rules by number where applicable.
- Move from subsequent text into this list rules relating to improper placement of metadata files within the directory structure, and not permitting multiple applicable files at one level of the hierarchy.
- Add "corollaries" section incorporating text relating to interpretation and consequences of the rules, separating them from subsequent examples.
As the inheritance principle makes reference to tabular files and JSON key-value dictionaries, but there are no such references in the opposite direction, these files should be defined prior to introduction of the Inheritance Principle.
@Lestropie
Copy link
Collaborator Author

Lots of changes (not expecting this to be a quick PR). I think that it is by necessity transitioning from the original text being along the lines of "here's how you should do this thing" to what is really required here for the sake of stringent enforcement and validation being "here's the rules", with the downstream consequences of such expressed in a more user-friendly way separate to the rules themselves (what's currently called "corollaries" here could conceivably be expanded quite a lot).

Note that while @Remi-Gau's change proposal above was merged, I've explicitly unresolved the conversation as there is a lot of content there unrelated to that specific proposal that have not been resolved. Would suggest branching separate chains of thought into separate comment threads tied to the relevant lines; this one is invariably going to be messy, best to mitigate it where we can.

Having a metadata file applicable to multiple data files can occur not only at one level of the directory hierarchy, but additionally to many data files at lower levels of the hierarchy.
@VisLab
Copy link
Member

VisLab commented Dec 13, 2021

I have reviewed the commits and the bullet points make things much clearer. Thank you!

Copy link
Collaborator

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like it, well done. IMHO should be taken out of the draft! ;)

Left some minor comments

src/02-common-principles.md Outdated Show resolved Hide resolved
Comment on lines +556 to +557
1. There MUST NOT be multiple metadata files applicable to a data file at one level
of the directory hierarchy.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW: it is such a nice concise rule which helps to avoid ambiguity and the workaround in Example 3 is quite cute ;)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OH -- I remembered what bothered me in my original #102 and outlined in Edit 1 there. Citing an example from there:

I placed myself into a corner with an example of having e.g.

sub-1_task-task1_run-1_bold.json and
sub-1_task-task1_acq-X_run-1_bold.json

per subject (should be ok), and then trying to aggregate over them while retaining also _acq- if defined.

and there are many other entities/use-cases which would fall into similar situation. What I am thinking is to add a clarification as subitem here or a separate rule:

  • A metadata file at the data files level of hierarchy MUST not be considered for inheritance if there is a matching in entities data file.

This would make sub-1_task-task1_run-1_bold.json not a to be considered for providing metadata to enrich sub-1_task-task1_acq-X_run-1_bold.json, and examples below would stay valid (we might want to add an example for this rule though)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is such a nice concise rule which helps to avoid ambiguity and the workaround in Example 3 is quite cute ;)

Well, actually (from first post):

Full disclosure: as per #259, I would like to pursue the prospect of modifying the inheritance principle itself, not just the description of such; specifically removal of the preclusion of having multiple applicable JSONs at one level of the hierarchy.

😬

The proposed changes here do at least I think demonstrate what would need to change in order to facilitate such. Any text I propose for such will need to be very clear for both users and software creators exactly what is permitted vs. not permitted and in what order JSONs should be loaded. But as I said in the original post, this is not a change in descriptive language but a change in prescriptive permissible structure, and so IMO necessitates either a minor or major version change rather than just a patch.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Given the currently loose formulation of the principle, depending on the changes to it, I think it might be feasible to get it done within minor revision. So it would be great to see this PR be finalized/merged soon.

src/02-common-principles.md Show resolved Hide resolved
Resolves bids-standard#946 with bids-standard#962.

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
Lestropie added a commit to Lestropie/bids-specification that referenced this pull request Dec 19, 2021
Lestropie added a commit to Lestropie/bids-specification that referenced this pull request Dec 19, 2021
Lestropie added a commit to Lestropie/bids-specification that referenced this pull request Dec 19, 2021
Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wonderful work, thanks a lot! I left a few comments

src/02-common-principles.md Outdated Show resolved Hide resolved
src/02-common-principles.md Outdated Show resolved Hide resolved
src/02-common-principles.md Outdated Show resolved Hide resolved
src/02-common-principles.md Outdated Show resolved Hide resolved
src/02-common-principles.md Outdated Show resolved Hide resolved
When reading image `sub-01/func/sub-01_task-rest_acq-default_bold.nii.gz`, only
metadata file `task-rest_bold.json` is read; file
`sub-01/func/sub-01_task-rest_acq-longtr_bold.json` is inapplicable as it contains
entity "`acq-longtr`" that is absent from the image path (rule 2.3). When reading image
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rule 2.c, see my comment above.

I see that below this would be "5.b.ii" instead of 5.2.2, so maybe we need to discuss my suggestion - as 5.b.ii is kind of ugly.

Copy link
Collaborator Author

@Lestropie Lestropie Jan 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revised by 7bc7ad5, but not yet resolving in case the enumeration formatting needs to be discussed further.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at it again, I don't find it ugly anymore :-) Thanks for making the change.

src/02-common-principles.md Outdated Show resolved Hide resolved
Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org>
The rule regarding the fact that it is not possible to "unset" a key-value pair from a JSON file from higher in the filesystem hierarchy is here moved to the "corollaries" section. This is because this behaviour is a natural consequence of loading consecutive JSON files using a simple merge operation, and the absence of an equivalent to Python' "None" in the JSON specification.
@Lestropie
Copy link
Collaborator Author

Invested parties please also see Lestropie#3, which proposes an alteration to the contents of this PR.

@sappelhoff
Copy link
Member

@Lestropie is there anything you wanted to discuss or get into this PR? Or should we do a final review?

@Lestropie
Copy link
Collaborator Author

@sappelhoff I'm done tinkering, content as it currently stands. Will have a go at #259 in a separate PR as discussed.

"linkchecker" CI test failed, naively it looks like it might have been a one-off failure that just requires a re-run since it doesn't seem to me to relate to the proposed changes?

@sappelhoff
Copy link
Member

"linkchecker" CI test failed, naively it looks like it might have been a one-off failure that just requires a re-run since it doesn't seem to me to relate to the proposed changes?

That issue was solved in master, so I merged those changes back here - should all be fine now.

I'm done tinkering, content as it currently stands.

awesome, let's try to get a final approval by a bunch of people and merge this!

@VisLab
Copy link
Member

VisLab commented Jan 24, 2022

There is a problem with the bullet point numbering when I do the view file on GitHub. The first set of bullet points uses 2 (with subcategories i, ii, and iii), but the text refers to 2 (a, b, c).

Similarly in the corollaries, rule three refers to "per rule 5.2"

Copy link
Collaborator

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. Two minor issues.


1. It is permissible for a single metadata file to be applicable to multiple data
files at that level of the hierarchy or below. Where such metadata content is consistent
across multiple data files, it is RECOMMENDED to store metadata in this
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RECOMMENDED

This is a best practice written in the language of a validatable rule. Some prefer this approach, others prefer to have their sidecars per-data-file. I think we should not make this recommendation, as validation will be tricky and annoying to users who would prefer the alternative.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with taking out the recommended here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a best practice written in the language of a validatable rule.

  1. This seems to extend beyond the RFC2119 definition. Is this an established BIDS-specific interpretation that is documented somewhere?

  2. Would the same criticism not also apply to 5.2?

    For JSON files, key-values are loaded from files from the top of the directory hierarchy downwards, such that key-values from the top level are inherited by all data files at lower levels to which it is applicable unless overridden by a value for the same key present in another metadata file at a lower level (though it is RECOMMENDED to minimise the extent of such overrides).

    Issuing a validator warning regarding the presence of individual key-value overrides may be unexpected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an established BIDS-specific interpretation that is documented somewhere?

AFAIK this is not documented anywhere, but among maintainers we have often discussed that RECOMMENDED corresponds to a "warning" level in the validator, and MUST corresponds to an "error" level in the validator. The main point of this consideration is to not design a specification that we cannot reasonably validate.

Having that said, there are many cases in the spec where we RECOMMEND or require (MUST), where validation is currently not happening and might be difficult to implement --- as you correctly point out in your second point.

Personally I am fine with having some recommendations that we cannot "warn" about (see especially 5.b here, but maybe also the point raised by Chris.).

src/02-common-principles.md Outdated Show resolved Hide resolved
src/02-common-principles.md Outdated Show resolved Hide resolved
@Lestropie
Copy link
Collaborator Author

There is a problem with the bullet point numbering when I do the view file on GitHub.

I think the most important correspondence here is going to be not with a GitHub rendering, but with the website / PDF renderings. I couldn't find an existing example; perhaps if someone could duplicate this branch on the main repo, then set up ReadTheDocs to render that branch, we could check what the current behaviour is there?

@sappelhoff
Copy link
Member

perhaps if someone could duplicate this branch on the main repo, then set up ReadTheDocs to render that branch, we could check what the current behaviour is there?

We should have all the needed checks available from the CI:

Lestropie and others added 2 commits February 1, 2022 11:51
Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org>
Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org>
@effigies effigies merged commit ec8f98a into bids-standard:master Feb 1, 2022
@sappelhoff
Copy link
Member

Thanks for leading this effort @Lestropie! I think it's a huge improvement in terms of clarity.

Lestropie added a commit to Lestropie/bids-specification that referenced this pull request Feb 7, 2022
Relates to discussion in bids-standard#259, and was also raised in bids-standard#946.
Instead of enforcing a unique order of metadata file inheritance via filesystem hierarchy only, the logic behind the inheritance principle is generalised to permit multiple files per directory whilst still ensuring that the order of metadata file loading is still unique.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants