Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rawdata/ is described in text but not part of the schema #1736

Closed
yarikoptic opened this issue Mar 21, 2024 · 8 comments · Fixed by #1741
Closed

rawdata/ is described in text but not part of the schema #1736

yarikoptic opened this issue Mar 21, 2024 · 8 comments · Fixed by #1741
Labels
consistency Spec is (potentially) inconsistent

Comments

@yarikoptic
Copy link
Collaborator

Inspired by

with some relevant prior discussion found in

apparently only the text talks about rawdata/ while src/schema does not have references beyond mentioning in the tests (@TheChymera could you clarify on either it is specific to rawdata/ there or could be another path):

❯ git grep rawdata
src/CHANGES.md:-   \[FIX] typo in "rawdata" example [#1045](https://github.com/bids-standard/bids-specification/pull/1045) ([sappelhoff](https://github.com/sappelhoff))
src/common-principles.md:            "rawdata": {
src/common-principles.md:`rawdata`, **only the `rawdata` subdirectory** needs to be a BIDS-compliant
src/common-principles.md:`derivatives`, or `rawdata` directory names.
src/common-principles.md:  "SourceDatasets": [ {"URL": "../../rawdata/"} ]
tools/schemacode/bidsschematools/tests/data/expected_bids_validator_xs_write.log:   * `/home/chymera/.data2/datalad/000026/rawdata/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz
tools/schemacode/bidsschematools/tests/test_validator.py:        "rawdata/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz"
tools/schemacode/bidsschematools/tests/test_validator.py:        "rawdata/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz"
tools/schemacode/bidsschematools/validator.py:        bids_paths = '~/.data2/datalad/000026/rawdata'

IMHO we should fix src/schema to "specify/allow" for rawdata/ in BIDS derivative datasets (not in raw)

@effigies
Copy link
Collaborator

I see: You want sourcedata/ to be actual pre-BIDS and rawdata/ to be a BIDS dataset inside a BIDS-derivative dataset.

This feels like an unnecessary complication. sourcedata/ and derivatives/ are always relative to the current dataset and mark a boundary. That feels like enough, and specifying some directories that serve the same purpose for different dataset types does not sound fun to try to express in schema and implement in a validator.

Also, in practice, many derivatives may have multiple "sources", such as the raw dataset and some preprocessing, e.g.,

ds000001-fmriprep/  # BIDS Derivative
    sourcedata/
        ds000001/  # BIDS
        freesurfer/  # Neither BIDS nor BIDS derivative, but still an input to my derivative

If we change sourcedata/ds000001 to rawdata/ what should we do with sourcedata/freesurfer?

@yarikoptic
Copy link
Collaborator Author

I see: You want sourcedata/ to be actual pre-BIDS and rawdata/ to be a BIDS dataset inside a BIDS-derivative dataset

not really -- I am just trying to ensure that schema matches the text we have. We should fix one or another. If not in BIDS 1.0, we should likely remove rawdata/ in BIDS 2.0.

I am with you generally that sourcedata/ should be enough and preferable for the reasons you stated. FWIW

@effigies
Copy link
Collaborator

So you want to remove text saying that you can place a raw BIDS dataset next to its source and derivatives, instead of nested? I don't understand why this example is a problem.

@yarikoptic
Copy link
Collaborator Author

I want standard to not use in examples arbitrary folder names on the top level which are not described by the standard. So I think we either

Otherwise that example is a "bad example" to be in a standard. We should not give examples with arbitrary folders being populated with anything, unless we allow for that (e.g. as subfolders of code/, sourcedata/ etc) .

@TheChymera
Copy link
Collaborator

TheChymera commented Mar 21, 2024

I haven't looked much into subdirectories, since I think it's really bad to have them nested, and always try to keep them separate. Why not just have independent datasets which reference their parents in the dataset_description.json file? It seems so much simpler, and would also solve the many-to-many correspondence between raw and derivatives. That would also be coherent with not constraining names for top-level directories, which I also think is a good idea.

@yarikoptic
Copy link
Collaborator Author

@TheChymera thanks, but could you clarify on either it is specific to rawdata/ there or could be another path in the test?

@TheChymera
Copy link
Collaborator

TheChymera commented Mar 21, 2024

@yarikoptic not sure I understand the question, but are you asking whether that's a random string for which “rawdata” just happens to be the value? If so, yes:

running test + diff with random other name used
[deco]~/src/bids-specification/tools/schemacode ❱ git rev-parse HEAD
6d13c807c7c8d594a35d302d9aa6cff7c802f054
[deco]~/src/bids-specification/tools/schemacode ❱ pytest bidsschematools/tests/test_validator.py
=========================================================== test session starts ============================================================
platform linux -- Python 3.11.8, pytest-7.4.4, pluggy-1.4.0
rootdir: /home/chymera/src/bids-specification/tools/schemacode
configfile: pyproject.toml
plugins: mock-3.12.0, pyfakefs-5.3.5, rerunfailures-14.0, anyio-4.2.0, pkgcore-0.12.24
collected 20 items

bidsschematools/tests/test_validator.py ....................                                                                         [100%]

============================================================ 20 passed in 4.34s ============================================================
[deco]~/src/bids-specification/tools/schemacode ❱ ag rawdata -l0 | xargs -0 sed -i -e "s/rawdata/tralaladata/g"
[deco]~/src/bids-specification/tools/schemacode ❱ pytest bidsschematools/tests/test_validator.py
=========================================================== test session starts ============================================================
platform linux -- Python 3.11.8, pytest-7.4.4, pluggy-1.4.0
rootdir: /home/chymera/src/bids-specification/tools/schemacode
configfile: pyproject.toml
plugins: mock-3.12.0, pyfakefs-5.3.5, rerunfailures-14.0, anyio-4.2.0, pkgcore-0.12.24
collected 20 items

bidsschematools/tests/test_validator.py ....................                                                                         [100%]

============================================================ 20 passed in 5.20s ============================================================
[deco]~/src/bids-specification/tools/schemacode ❱ git --no-pager diff
diff --git a/tools/schemacode/bidsschematools/tests/data/expected_bids_validator_xs_write.log b/tools/schemacode/bidsschematools/tests/data/expected_bids_validator_xs_write.log
index 4d410c3c..a8533d90 100644
--- a/tools/schemacode/bidsschematools/tests/data/expected_bids_validator_xs_write.log
+++ b/tools/schemacode/bidsschematools/tests/data/expected_bids_validator_xs_write.log
@@ -3,5 +3,5 @@ SUMMARY:
 0 out of 1 files were successfully validated, using the following regular expressions:
 	- `.*?/sub-(?P<subject>[0-9a-zA-Z]+)/(|ses-(?P<session>[0-9a-zA-Z]+)/)anat/sub-(?P=subject)(|_ses-(?P=session))(|_acq-(?P<acquisition>[0-9a-zA-Z]+))(|_ce-(?P<ceagent>[0-9a-zA-Z]+))(|_rec-(?P<reconstruction>[0-9a-zA-Z]+))(|_run-(?P<run>[0-9a-zA-Z]+))(|_part-(?P<part>(mag|phase|real|imag)))_(T1w|T2w|PDw|T2starw|FLAIR|inplaneT1|inplaneT2|PDT2|angio|T2star)\.(nii.gz|nii|json)$`
 The following files were not matched by any regex schema entry:
-	* `/home/chymera/.data2/datalad/000026/rawdata/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz
+	* `/home/chymera/.data2/datalad/000026/tralaladata/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz
 The following mandatory regex schema entries did not match any files:
diff --git a/tools/schemacode/bidsschematools/tests/test_validator.py b/tools/schemacode/bidsschematools/tests/test_validator.py
index fd808fb7..37ae0423 100644
--- a/tools/schemacode/bidsschematools/tests/test_validator.py
+++ b/tools/schemacode/bidsschematools/tests/test_validator.py
@@ -64,11 +64,11 @@ def test_write_report(tmp_path):
     ]
     validation_result["path_tracking"] = [
         "/home/chymera/.data2/datalad/000026/"
-        "rawdata/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz"
+        "tralaladata/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz"
     ]
     validation_result["path_listing"] = [
         "/home/chymera/.data2/datalad/000026/"
-        "rawdata/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz"
+        "tralaladata/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz"
     ]

     report_path = tmp_path / "output_bids_validator_xs_write.log"
diff --git a/tools/schemacode/bidsschematools/validator.py b/tools/schemacode/bidsschematools/validator.py
index 34621a8c..5a73bca3 100644
--- a/tools/schemacode/bidsschematools/validator.py
+++ b/tools/schemacode/bidsschematools/validator.py
@@ -594,7 +594,7 @@ def validate_bids(
     ::

         from bidsschematools import validator
-        bids_paths = '~/.data2/datalad/000026/rawdata'
+        bids_paths = '~/.data2/datalad/000026/tralaladata'
         validator.validate_bids(bids_paths)

     Notes

edit: @yarikoptic took liberty to wrap long example/diff into <details></details>

@yarikoptic
Copy link
Collaborator Author

If so, yes:

"yes" as "it was a random choice and could be anything else", correct?

Then we should also fix the test if we decide that rawdata is part of the specification/legit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
consistency Spec is (potentially) inconsistent
Projects
None yet
4 participants