[FIX] Move `rawdata/` into `sourcedata/raw` in alternative structure example, clarify on naming of datasets themselves #1741

yarikoptic · 2024-03-21T20:35:01Z

Individual commits have more rationale.
I can split into multiple PRs but let's see - may be this could provide us a closure.
Reflecting my thoughts on

https://github.com/bids-standard/bids-specification/pull/1734/files#r1534475631

…dataset to be BIDS dataset This is my take on an extended discussion about ambiguity of `rawdata/` example: https://github.com/bids-standard/bids-specification/pull/1734/files#r1534475631

Prior one bundled naming aspect under the same MUST. I separated into separate sentences, added explicit statement that BIDS does not prescribe a particular naming scheme for source data. And added explicit RECOMMENDED on the example how to organize/name files there.

…y a BIDS dataset

…mpliant' === Do not change lines below === { "chain": [], "cmd": "sed -i -e s,rawdata,noncompliant,g tools/schemacode/bidsschematools/validator.py tools/schemacode/bidsschematools/tests/test_validator.py tools/schemacode/bidsschematools/tests/data/expected_bids_validator_xs_write.log", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^

codecov · 2024-03-21T20:47:12Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.79%. Comparing base (4c642bd) to head (1c029c6).
Report is 53 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1741      +/-   ##
==========================================
- Coverage   87.93%   87.79%   -0.15%     
==========================================
  Files          16       16              
  Lines        1351     1360       +9     
==========================================
+ Hits         1188     1194       +6     
- Misses        163      166       +3

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Remi-Gau · 2024-03-22T12:46:57Z

What the example looks like

Remi-Gau

I am good with this approach. Hopefully this should prevent confusion.

yarikoptic · 2024-03-22T22:10:24Z

What the example looks like

anal me notices now that dataset_description.json in that example is "inconsistent" it its ordering with directories...

apparently we do not have full consistency but mostly list it AFTER the folders... but then logically it might need ... before if there are sub- folders, and after to signal that more files are possible...

src/appendices/qmri.md-   {
src/appendices/qmri.md-    "ds-example": {
src/appendices/qmri.md-        "derivatives": {
src/appendices/qmri.md-            "qMRLab": {
src/appendices/qmri.md:                "dataset_description.json": "",
src/appendices/qmri.md-                "sub-01": {
src/appendices/qmri.md-                    "anat": {
src/appendices/qmri.md-                        "sub-01_T1map.nii.gz": "",
src/appendices/qmri.md-                        "sub-01_T1map.json": "",
--
src/common-principles.md-    "my_dataset-1": {
src/common-principles.md-        "sourcedata": {
src/common-principles.md-            "dicoms": {},
src/common-principles.md-            "raw": {
src/common-principles.md:                "dataset_description.json": "",
src/common-principles.md-                "sub-01": {},
src/common-principles.md-                "sub-02": {},
src/common-principles.md-                "...": "",
src/common-principles.md-            },
--
src/common-principles.md-            "pipeline_1": {},
src/common-principles.md-            "pipeline_2": {},
src/common-principles.md-            "...": "",
src/common-principles.md-        },
src/common-principles.md:        "dataset_description.json": "",
src/common-principles.md-        "...": "",
src/common-principles.md-    }
src/common-principles.md-   }
src/common-principles.md-) }}
--
src/common-principles.md-            },
src/common-principles.md-            "sub-01": {},
src/common-principles.md-            "sub-02": {},
src/common-principles.md-            "...": "",
src/common-principles.md:            "dataset_description.json": "",
src/common-principles.md-            }
src/common-principles.md-        }
src/common-principles.md-    ) }}
src/common-principles.md-
--
src/common-principles.md-        },
src/common-principles.md-    "derivatives": {},
src/common-principles.md-    "README": "",
src/common-principles.md-    "participants.tsv": "",
src/common-principles.md:    "dataset_description.json": "",
src/common-principles.md-    "CHANGES": "",
src/common-principles.md-    }
src/common-principles.md-) }}
src/common-principles.md-
--
src/derivatives/common-data-types.md-    "raw/": {
src/derivatives/common-data-types.md-        "CHANGES": "",
src/derivatives/common-data-types.md-        "README": "",
src/derivatives/common-data-types.md-        "channels.tsv": "",
src/derivatives/common-data-types.md:        "dataset_description.json": "",
src/derivatives/common-data-types.md-        "participants.tsv": "",
src/derivatives/common-data-types.md-            "sub-001": {
src/derivatives/common-data-types.md-                "eeg": {
src/derivatives/common-data-types.md-                    "sub-001_task-listening_events.tsv": "",
--
src/longitudinal-and-multi-site-studies.md-            },
src/longitudinal-and-multi-site-studies.md-        "sub-control01_sessions.tsv": "",
src/longitudinal-and-multi-site-studies.md-        },
src/longitudinal-and-multi-site-studies.md-    "participants.tsv": "",
src/longitudinal-and-multi-site-studies.md:    "dataset_description.json": "",
src/longitudinal-and-multi-site-studies.md-    "README": "",
src/longitudinal-and-multi-site-studies.md-    "CHANGES": "",
src/longitudinal-and-multi-site-studies.md-    }
src/longitudinal-and-multi-site-studies.md-) }}

I will at least unify for that now in a separate commit (easy to drop if we decide not worth it)

yarikoptic · 2024-03-28T17:26:37Z

note: failed to upload coverage, I restarted that pipeline to get it hopefully green.

yarikoptic · 2024-04-13T15:12:10Z

yes, it is all green. Waiting for more blessings. @effigies @tsalo -- WDYT?

effigies · 2024-04-16T20:52:58Z

This feels more confusing to me.

└─ my_dataset-1/
   ├─ sourcedata/
   │  ├─ dicoms/
   │  ├─ raw/
   │  │  ├─ sub-01/
   │  │  ├─ sub-02/
   │  │  ├─ ... 
   │  │  └─ dataset_description.json 
   │  └─ ... 
   ├─ derivatives/
   │  ├─ pipeline_1/
   │  ├─ pipeline_2/
   │  └─ ... 
   ├─ dataset_description.json 
   └─ ...

Why is there now a dataset_description.json? What are the contents of this outer dataset apart from sourcedata/ and derivatives/?

I still don't understand why what is present is confusing, but maybe it would be better just to remove it. I would like to move to a Diataxis approach, where we clearly separate out the helpful hints and how-tos from the reference material. Maybe it could be revived in a how-to guide in a way that causes fewer problems, if there's any value to it.

effigies

Some nit-picks, if the consensus is that this is an improvement and gets cleared to merge. I don't think we want to imply validation of these new bits.

src/common-principles.md

yarikoptic · 2024-04-17T12:52:49Z

└─ my_dataset-1/
...
Why is there now a dataset_description.json? What are the contents of this outer dataset apart from sourcedata/ and derivatives/?

IMHO in BIDS specification we SHOULD only talk about BIDS datasets. We SHOULD NOT talk about arbitrary conventions which could exist outside of BIDS specification. Hence IMHO this example SHOULD be a BIDS dataset as well -- AFAIK nothing in BIDS specification would state that it is an illegitimate BIDS dataset, or am I wrong?

I still don't understand why what is present is confusing, but maybe it would be better just to remove it. I would like to move to a Diataxis approach, where we clearly separate out the helpful hints and how-tos from the reference material. Maybe it could be revived in a how-to guide in a way that causes fewer problems, if there's any value to it.

IMHO it is useful to give users guidance. My problem with current formulation is that it is misleading. This PR IMHO clarifies it, unless it is indeed "not BIDS compliant".

effigies · 2024-04-17T14:01:18Z

Hence IMHO this example SHOULD be a BIDS dataset as well -- AFAIK nothing in BIDS specification would state that it is an illegitimate BIDS dataset, or am I wrong?

I believe the validator would currently complain that there are no validatable files. You can argue that's wrong, but from an OpenNeuro perspective, I would want to reject such a dataset as it's effectively a blanket .bidsignore. But we would also reject what you're replacing, since it wasn't a BIDS dataset, but a layout demonstrating one way to include BIDS datasets alongside their sources and derivatives.

IMHO it is useful to give users guidance. My problem with current formulation is that it is misleading. This PR IMHO clarifies it, unless it is indeed "not BIDS compliant".

I find it unhelpful at best, as it drops everything but a dataset_description.json and two opaque directories that could potentially hold nested datasets. This is why I think that a how-to guide would be better, where it discusses these options in detail, and not normatively.

yarikoptic · 2024-04-17T14:14:58Z

there are no validatable files

wrong. There is dataset_description.json. Might be a bug in bids-validator then ;-)

You can argue that's wrong, but from an OpenNeuro perspective, I would want to reject such a dataset as it's effectively a blanket .bidsignore.

as we discussed -- this sounds like an archive specific desire/behavior.

But we would also reject what you're replacing, since it wasn't a BIDS dataset, but a layout demonstrating one way to include BIDS datasets alongside their sources and derivatives.

ok, let's meet half-way: what if I remove dataset_specification.json for now , and mostly revert that added statement that this is a valid bids dataset? and then propose separate follow up PR on that particular aspect since I think we would want also some explicit DatasetType = "study" or alike (e.g. "project" to possibly avoid conflict with BEP035) for such cases.

Then we stay with "convention"

effigies · 2024-04-17T14:32:12Z

Sure, if we drop dataset_description.json and ... and do not say or imply that the outer dataset is a BIDS dataset, then this is basically the same as before. I suppose that's fine.

…aset

src/common-principles.md

…underscores too

yarikoptic · 2024-04-17T19:24:27Z

To expedite and ease re-reviewing -- I committed that final form of suggestions .

src/common-principles.md

Co-authored-by: Chris Markiewicz <effigies@gmail.com>

sappelhoff · 2024-04-18T20:44:27Z

This is what the example now looks like:

https://bids-specification--1741.org.readthedocs.build/en/1741/common-principles.html#source-vs-raw-vs-derived-data

I personally find it confusing that the BIDS raw data is now listed under sourcedata, because the way "people" usually think about the directories is:

sourcedata (what ever was recorded)
BIDS formatted raw data (converted from sourcedata)
derivatives (obtained from BIDS formatted raw data)

nesting "2" under "1" is technically possible, but I find it unhelpful/weird.

Having that said, @Remi-Gau and @effigies seem to have discussed this at length / thought about it, so I will defer to them.

yarikoptic · 2024-04-18T22:43:48Z

there is no 'ideal' setup per se, but if datalad used, then 2. (sourcedata/raw) could also have (possibly subset of only relevant ones) of sourcedata/ (like sourcedata/dicoms) be installed into its own sourcedata/raw/sourcedata . The idea here is just to make it all "flat" and separated only to sourcedata/ (bids or not) and derivatives/ (what computed from those).

yarikoptic · 2024-04-25T17:46:18Z

@effigies @sappelhoff so are we go on this one or more tune ups?

This reverts commit a3c12f8 where I have tried to introduce it in bids-standard#1741 but it required a little more of further detailing.

yarikoptic requested review from tsalo, erdalkaraca and DimitriPapadopoulos as code owners March 21, 2024 20:35

yarikoptic mentioned this pull request Mar 21, 2024

[ENH] clarify that BIDS does not specify names of directories containing datasets #1734

Closed

yarikoptic force-pushed the rf-norawdata branch from be9bdbf to 191c32c Compare March 21, 2024 20:38

yarikoptic added 5 commits March 21, 2024 16:43

RF: move rawdata/ to sourcedata/raw in an example + make overall …

16d57d3

…dataset to be BIDS dataset This is my take on an extended discussion about ambiguity of `rawdata/` example: https://github.com/bids-standard/bids-specification/pull/1734/files#r1534475631

Add one dataset_description.json into an example to make it explicitl…

758c3f6

…y a BIDS dataset

My take on dataset naming common principle

2f018e5

yarikoptic force-pushed the rf-norawdata branch from 191c32c to 7cae13d Compare March 21, 2024 20:43

Do not use e.g.

6ac34ea

Remi-Gau approved these changes Mar 22, 2024

View reviewed changes

Remi-Gau requested review from effigies and sappelhoff March 22, 2024 12:48

Move dataset_description.json in the example to be listed after folders

526374e

yarikoptic mentioned this pull request Mar 30, 2024

Use "dataset" not "project" for intro description of "Folders and Files" bids-standard/bids-starter-kit#400

Merged

effigies reviewed Apr 16, 2024

View reviewed changes

src/common-principles.md Outdated Show resolved Hide resolved

src/common-principles.md Outdated Show resolved Hide resolved

Remove the notion that example layout can in fact be a valid BIDS dat…

a3c12f8

…aset

effigies reviewed Apr 17, 2024

View reviewed changes

src/common-principles.md Show resolved Hide resolved

Use lower case "recommended" as not part of BIDS spec, and recommend …

5974b0e

…underscores too

effigies approved these changes Apr 17, 2024

View reviewed changes

src/common-principles.md Outdated Show resolved Hide resolved

Make into a single sentence

1c029c6

Co-authored-by: Chris Markiewicz <effigies@gmail.com>

effigies changed the title ~~[FIX] Move rawdata/ into sourcedata/raw, ensure example is a BIDS-compliant dataset, clarify on naming of datasets themselves~~ [FIX] Move rawdata/ into sourcedata/raw in alternative structure example, clarify on naming of datasets themselves Apr 25, 2024

effigies merged commit 90ec07f into bids-standard:master Apr 25, 2024
25 of 26 checks passed

dipterix mentioned this pull request Aug 4, 2024

rawdata & root (top-level) BIDS dataset #1882

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FIX] Move `rawdata/` into `sourcedata/raw` in alternative structure example, clarify on naming of datasets themselves #1741

[FIX] Move `rawdata/` into `sourcedata/raw` in alternative structure example, clarify on naming of datasets themselves #1741

yarikoptic commented Mar 21, 2024 •

edited by Remi-Gau

Loading

codecov bot commented Mar 21, 2024 •

edited

Loading

Remi-Gau commented Mar 22, 2024

Remi-Gau left a comment

yarikoptic commented Mar 22, 2024

yarikoptic commented Mar 28, 2024

yarikoptic commented Apr 13, 2024

effigies commented Apr 16, 2024

effigies left a comment

yarikoptic commented Apr 17, 2024

effigies commented Apr 17, 2024

yarikoptic commented Apr 17, 2024

effigies commented Apr 17, 2024

yarikoptic commented Apr 17, 2024

sappelhoff commented Apr 18, 2024

yarikoptic commented Apr 18, 2024

yarikoptic commented Apr 25, 2024

[FIX] Move rawdata/ into sourcedata/raw in alternative structure example, clarify on naming of datasets themselves #1741

[FIX] Move rawdata/ into sourcedata/raw in alternative structure example, clarify on naming of datasets themselves #1741

Conversation

yarikoptic commented Mar 21, 2024 • edited by Remi-Gau Loading

codecov bot commented Mar 21, 2024 • edited Loading

Codecov Report

Remi-Gau commented Mar 22, 2024

Remi-Gau left a comment

Choose a reason for hiding this comment

yarikoptic commented Mar 22, 2024

yarikoptic commented Mar 28, 2024

yarikoptic commented Apr 13, 2024

effigies commented Apr 16, 2024

effigies left a comment

Choose a reason for hiding this comment

yarikoptic commented Apr 17, 2024

effigies commented Apr 17, 2024

yarikoptic commented Apr 17, 2024

effigies commented Apr 17, 2024

yarikoptic commented Apr 17, 2024

sappelhoff commented Apr 18, 2024

yarikoptic commented Apr 18, 2024

yarikoptic commented Apr 25, 2024

[FIX] Move `rawdata/` into `sourcedata/raw` in alternative structure example, clarify on naming of datasets themselves #1741

[FIX] Move `rawdata/` into `sourcedata/raw` in alternative structure example, clarify on naming of datasets themselves #1741

yarikoptic commented Mar 21, 2024 •

edited by Remi-Gau

Loading

codecov bot commented Mar 21, 2024 •

edited

Loading