Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] Separate out imaging-specific "common derivatives" #310

Merged
merged 3 commits into from
Sep 12, 2019

Conversation

effigies
Copy link
Collaborator

It has been observed that the "common derivatives" has mostly meant "common across imaging modalities". This PR is an attempt to rectify this situation, in terms of not reasserting the centrality of imaging to BIDS, and in order to ensure that we're not cluttering the namespace and putting non-imaging modalities in a situation where they have to make unintuitive naming choices because the obvious names are all claimed.

With a couple of cleanups, the preprocessed data section of 05-derivatives/02-common-data-types.md seems usable, as the primary definition is "operations that do not change the dimensionality of the data". We currently lack non-imaging examples, but I'm not sure whether it's wise to include outside of the BEP 021 effort, as any decisions that make it into a release have to be respected in future efforts.

Almost everything else is really imaging specific, so I moved it to 05-derivatives/03-imaging.md, with minimal modification.

One thing we need to be careful about is whether an entity (-) definition is being constrained to have a meaning with regard to the imaging modality or constrained to have a meaning that may not apply outside imaging. I think we only introduce space, desc and hemi here, all of which feel (to me) reasonably extensible to other modalities where they might apply or inherently spatial.

One thing I haven't touched is Coordinate systems in 05-derivatives/01-introduction.md. It seems applicable outside imaging, so it feels premature to move it into an imaging-specific section. Some parts might be sensibly moved here, though. Please comment here for portions that seem best moved, and in #265 for portions that should simply be reworded.

If we can keep this PR focused on conceptual clarity, we can reserve polish and typos for when #265 is nearly ready to merge.

@bids-standard/derivatives I would appreciate all of your thoughts. In particular @bids-standard/derivatives-electrophys I would like your input to ensure that this is moving in a direction that works for you.

Copy link
Collaborator Author

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linter doesn't like double blank lines.

src/05-derivatives/02-common-data-types.md Outdated Show resolved Hide resolved
src/05-derivatives/03-imaging.md Outdated Show resolved Hide resolved
src/05-derivatives/03-imaging.md Outdated Show resolved Hide resolved
@franklin-feingold franklin-feingold added this to the Common Derivatives milestone Aug 16, 2019
Copy link
Collaborator

@oesteban oesteban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for some sections purely for imaging (e.g., Anatomical labels), I don't think to split this from common derivatives is a good idea.

We should identify potential name conflicts and misrepresentations of other data types and address them.

(and get rid of SkullStripped, even for MRI makes no sense)

@@ -0,0 +1,266 @@
# Imaging data types
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Imaging data types
# N-Dimensional (N>1) digital data

To me, Imaging is used in a sense completely unrelated to data types.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does imaging mean to you? The sense in which I'm using it is "at least one dimension (and usually two or three) have a spatial interpretation". I don't know that these things apply to N-D data generally.

Can you clarify your objections?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The term "N-Dimensional digital data" also applies to raw EEG (space and time), to surround audio (space and time), to photo (2D space), video (2D pixel space and time), eye tracker (which eye, pupil versus gaze, and time), polhemus electrode positions (space), motion capture (which body part, and time), touch screen events in behavioral data (space and time). The subsequent text does not provide guidelines how to deal with those data types. It does not even specify how the temporal dimension of MRI is dealt with, only deals with the different spatial representations of MRI data (volumetric, surface-based, parcellated).

Surface-based and parcellated descriptions of data are by themselves not images any more, but are derived from imaging data. That is why I consider "Imaging data types" an appropriate title. Or perhaps "Data types derived from images".

Interestingly, "Data types derived from images" would also apply (in case BEP021 moves further ahead) to volume conduction models of the head and source models, used in EEG and MEG pipelines, and the functional estimates that are obtained for those source models (similar to how we store HCP MEG results in CIFTI). These results are multi-modal, since the input is anatomical MRI and electrophys.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@effigies sorry for the lack in clarity. What I mean is that traversing over the whole taxonomy of the dot product of data types and modalities seems impractical.

I agree common-derivatives needs to be extra-careful not to impose some particular modality idiosyncrasy to the rest of them. To address that, the original BIDS spec only splits by modalities in a very broad sense and then BEPs followed that approach. I just can't see why this extra granularity helps to get common-derivatives merged faster or helps to better describe derivatives. IMHO it will be much harder to keep the consistency across BEPs, and also, this increases communication and review costs at no clear benefit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it is clear that Imaging data types is a contended title because it seems to be restricted to spatial data (possibly spatio-temporal) instead of clarifying that it rather focuses on what can be written under the anat|func|dwi/ folders. It also contains "data types", which I comment next.

Then, Data types derived from images is problematic because data types should be described independently of how those data were generated.

Take for instance a file object to describe a spatial transform of interest, or a statistical map. I don't think it would be a good idea to have different naming conventions and file formats depending on whether the transform file (or the statistical map) is under a func/ or an anat/ or a meg/ folder. Similar reasoning holds for a GIFTI file with data sampled on surfaces.

Finally, I would agree on a title like Imaging derivatives because it doesn't allude to "data types", so data types (e.g., data non-regularly sampled on a surface) shared across several "modalities" (say anat|func|dwi|meg/) should be then described at the top level.

In other words, I think this scoping effort is okay, as long as we are very explicit and talk about what is BIDS-valid if written under each datatype/modality folder. However, I believe that the usefulness of scoping will be limited to just defining metadata fields that can be present (e.g., SkullStripped) - and which tap into provenance... (we know BIDS is not very concerned with provenance).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The derivatives should also allow the representation and description of processed physio data, which can be (on its own) in the beh folder according to the current spec. Physio can for example be GSR, ECG or optical heart signals, in relation to events. So rather than N>1 it should be N=>1.

And how about processed behavioral data represented in the (non-continuous) events? Is it appropriate to describe irregularly sampled data as N-D?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The derivatives should also allow the representation and description of processed physio data, which can be (on its own) in the beh folder according to the current spec. Physio can for example be GSR, ECG or optical heart signals, in relation to events. So rather than N>1 it should be N=>1.

And how about processed behavioral data represented in the (non-continuous) events? Is it appropriate to describe irregularly sampled data as N-D?

Sorry, I can't understand what you are suggesting here. You would not agree with Imaging derivates as the main header for this split?

Copy link
Collaborator

@robertoostenveld robertoostenveld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I am happy with the suggested changes. I made some suggestions for improvements.

Not specific for this PR: I think that "to avoid name conflict with the raw file" is an important consideration for the common derivative: the common principle is that derived files MUST have different names than raw files, and some keys (space, desc, proc?) are given that MAY be used.

src/05-derivatives/02-common-data-types.md Show resolved Hide resolved
- Motion-corrected DWI files.
- Motion-corrected, temporally denoised, and transformed to MNI space BOLD series
- Inhomogeneity corrected and skull stripped T1w files
- Motion-corrected DWI files
- Time-domain filtered and ICA cleaned EEG data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please split "Time-domain filtering for EEG data" from "Spatial filtering, e.g. ICA cleaned EEG data" in the example, since these deal with different dimensions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the application of Maxfilter on MEG data could also be considered an example of preprocessing in which the dimensions don't change. Maxfilter is mentioned in the main spec here and specifies the proc keyword (e.g. proc-sss) to be used in the derivatives.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

proc-sss... This is unfortunate. I don't really know what to do with derivatives specified prior to the derivatives spec.

coordinate systems or registered to different reference maps.
The `desc` keyword is a general purpose field with freeform values.
To distinguish between multiple different versions of processing for the same
input data the `desc` keyword should be used.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The phrasing here is not specific enough. It states that 'spaceis recommended (I find that OK), and it states thatdesc` simply exists (i.e. not recommended, required or whatever).

To distinguish between ... keyword should be used.

Is that "should" a "SHOULD" according to RFC 2119?

Is it required/recommended/suggested (i.e. MUST/SHOULD/MAY) to use either one of desc or space? What about the existing specification of proc?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this comment can be ignored in relation to the specific PR; I guess it is better discussed in the top-level common-derivatives PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a brief part on what desc is an abbreviation for (=description?). This helps the reader with grasping and remembering the entity

Note that even though `space` and `desc` are
optional at least one of them needs to be defined to avoid name conflict with
the raw file.
Note that even though `space` and `desc` are optional at least one of them needs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the rigidity of "needs" is not clear wrt RFC2119.

@effigies effigies mentioned this pull request Aug 22, 2019
5 tasks
@@ -11,24 +11,23 @@ Template:
<source_keywords>[_space-<space>][_desc-<label>]_<suffix>.<ext>
```

Processing in this context means transformations of data that does not change
Preprocessing in this context means transformations of data that do not change
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bold reference (average across time of a 4D dataset) changes the number of dimensions - that means a bold reference is not conceived as a preprocessing derivative?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a "preprocessed BOLD file", correct. It's still a derivative, but needs to be treated separately from a preprocessed bold, dwi, T1w, eeg, etc. file.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand the rationale, but I guess I should raise the comment after this PR is merged, as this is just a re-organization.


## Preprocessed, coregistered and/or resampled volumes

Volumetric preprocessing does not modify the number of dimensions, and so
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As before, volumetric preprocessing may modify the number of dimensions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the common-derivatives branch. Please dismiss this comment.

sub-001_dparc.dlabel.nii
```

### Anatomical Labels
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this go within the structural derivatives BEP?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - I think there was some confusion between whether labels (i.e. atlases in masks) were relevant to info for the functional and dwi pipelines and therefore needed to be considered common principles. (Although I

But I think your right. It makes more sense to logically separate Anatomical and everything that specifically does in an "anat" derivatives folder into the structural derivatives BEP.

One thing that I don't know how clear we are about if how link this info to derived files that related to these masks (i.e. parcellated functional data)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to move structural out... I think during the first big merge, the structural stuff was considered so integral to working with functional and diffusion (e.g., resampling ROIs into the bold/dwi images' spaces) that they became effectively "modality agnostic". But sure. Does it make sense to make that a separate PR or just reorganize this one a bit?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate PR 👍

src/05-derivatives/03-imaging.md Show resolved Hide resolved
@@ -0,0 +1,266 @@
# Imaging data types
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it is clear that Imaging data types is a contended title because it seems to be restricted to spatial data (possibly spatio-temporal) instead of clarifying that it rather focuses on what can be written under the anat|func|dwi/ folders. It also contains "data types", which I comment next.

Then, Data types derived from images is problematic because data types should be described independently of how those data were generated.

Take for instance a file object to describe a spatial transform of interest, or a statistical map. I don't think it would be a good idea to have different naming conventions and file formats depending on whether the transform file (or the statistical map) is under a func/ or an anat/ or a meg/ folder. Similar reasoning holds for a GIFTI file with data sampled on surfaces.

Finally, I would agree on a title like Imaging derivatives because it doesn't allude to "data types", so data types (e.g., data non-regularly sampled on a surface) shared across several "modalities" (say anat|func|dwi|meg/) should be then described at the top level.

In other words, I think this scoping effort is okay, as long as we are very explicit and talk about what is BIDS-valid if written under each datatype/modality folder. However, I believe that the usefulness of scoping will be limited to just defining metadata fields that can be present (e.g., SkullStripped) - and which tap into provenance... (we know BIDS is not very concerned with provenance).

@effigies
Copy link
Collaborator Author

effigies commented Sep 4, 2019

@robertoostenveld I think I've addressed your comments, if you'd like to have another look.

@oesteban Is there a practical suggestion you're making, wrt #310 (comment)? I'm not seeing how to operationalize it. Is it some text you're looking for or a reorganization?

@edickie @oesteban Regarding the anatomical stuff, we have masks, dsegs and probsegs in here. While segmentations/parcellations are anatomically based, the idea here was that you might want them in other spaces, which made them "common". We can pull them out, which would leave us with masks. Let me know how you want to proceed.

@effigies
Copy link
Collaborator Author

effigies commented Sep 4, 2019

Also, the linkchecker failure is a result of https://loni.usc.edu/ letting their cert expire.

@effigies
Copy link
Collaborator Author

effigies commented Sep 6, 2019

@robertoostenveld @oesteban @edickie Just a bump on this. Sorry to bug you, but I want to try to keep this moving. If you're busy, can you give me a rough time before I should check back in?

@@ -0,0 +1,266 @@
# Imaging data types
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The derivatives should also allow the representation and description of processed physio data, which can be (on its own) in the beh folder according to the current spec. Physio can for example be GSR, ECG or optical heart signals, in relation to events. So rather than N>1 it should be N=>1.

And how about processed behavioral data represented in the (non-continuous) events? Is it appropriate to describe irregularly sampled data as N-D?

Sorry, I can't understand what you are suggesting here. You would not agree with Imaging derivates as the main header for this split?


## Preprocessed, coregistered and/or resampled volumes

Volumetric preprocessing does not modify the number of dimensions, and so
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the common-derivatives branch. Please dismiss this comment.

sub-001_dparc.dlabel.nii
```

### Anatomical Labels
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate PR 👍

@effigies
Copy link
Collaborator Author

Going to go ahead and merge this. Further discussion can happen on #265.

@effigies effigies merged commit 10a9a71 into bids-standard:common-derivatives Sep 12, 2019
@effigies effigies deleted the common-imaging branch September 12, 2019 14:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants