[FIX] Move rawdata/ into sourcedata/raw in alternative structure example, clarify on naming of datasets themselves #1741

Merged
merged 10 commits into from
Apr 25, 2024
53 changes: 29 additions & 24 deletions src/common-principles.md
@@ -97,6 +97,12 @@ and/or files (like `events.tsv`) are fully omitted *when they are unavailable or
instead of specified with an `n/a` value, or included as an empty file
(for example an empty `events.tsv` file with only the headers included).

## Dataset naming

BIDS does not prescribe a particular naming scheme for directories containing individual BIDS datasets.
However, it is RECOMMENDED to use a short descriptive name that reflects the content of the dataset, avoid spaces in the name, and use hyphens to separate words.
BIDS datasets embedded within a larger BIDS dataset MAY follow some convention (see for example [Storage of derived datasets](#storage-of-derived-datasets)).
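
As a rough illustration of the naming recommendation above, the sketch below flags whitespace in a dataset directory name and proposes a hyphen-separated alternative. The helper names `has_whitespace` and `suggest_dataset_dirname` are hypothetical and are not part of BIDS or of any BIDS tool; the recommendation itself is advisory, so this is only one possible way to check it.

```python
import re


def has_whitespace(name: str) -> bool:
    """Return True if the directory name contains any whitespace character."""
    return re.search(r"\s", name) is not None


def suggest_dataset_dirname(name: str) -> str:
    """Hypothetical helper: replace each run of whitespace with one hyphen."""
    return re.sub(r"\s+", "-", name.strip())


print(has_whitespace("my study 2024"))           # True
print(suggest_dataset_dirname("my study 2024"))  # my-study-2024
print(has_whitespace("my_project-1"))            # False
```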

## Filesystem structure

Data for each subject are placed in subdirectories named "`sub-<label>`",
@@ -248,9 +254,10 @@ recommending a particular naming scheme for including different types of
source data (such as the raw event logs or parameter files, before conversion to BIDS).
However, in the case that these data are to be included:

1. These data MUST be kept in separate `sourcedata` directory with a similar
directory structure as presented below for the BIDS-managed data. For example:
`sourcedata/sub-01/ses-pre/func/sub-01_ses-pre_task-rest_bold.dicom.tgz` or
1. These data MUST be kept in a separate `sourcedata` directory.
BIDS does not prescribe a particular naming scheme for source data.
However, it is RECOMMENDED that source data follow BIDS naming conventions where possible.
For example: `sourcedata/sub-01/ses-pre/func/sub-01_ses-pre_task-rest_bold.dicom.tgz` or
`sourcedata/sub-01/ses-pre/func/MyEvent.sce`.

1. A README file SHOULD be found at the root of the `sourcedata` directory or the
@@ -267,41 +274,38 @@ A guide for using macros can be found at
-->
{{ MACROS___make_filetree_example(
{
"my_dataset-1": {
"sourcedata": "",
"...": "",
"rawdata": {
"dataset_description.json": "",
"participants.tsv": "",
"my_project-1": {
"sourcedata": {
"dicoms": {},
"raw": {
"sub-01": {},
"sub-02": {},
"...": "",
"dataset_description.json": "",
"...": "",
},
"derivatives": {
"pipeline_1": {},
"pipeline_2": {},
"...": "",
},
"..." : "",
},
"derivatives": {
"pipeline_1": {},
"pipeline_2": {},
"...": "",
}
}
}
) }}

In this example, where `sourcedata` and `derivatives` are not nested inside
`rawdata`, **only the `rawdata` subdirectory** needs to be a BIDS-compliant
dataset.
In this example, `sourcedata/dicoms` is not nested inside
`sourcedata/raw`, and among the `sourcedata/` subdirectories **only `sourcedata/raw`** is a BIDS-compliant dataset.
The subdirectories of `derivatives` MAY be BIDS-compliant derivatives datasets
(see [Non-compliant derivatives](#non-compliant-derivatives) for further discussion).
This specification does not prescribe anything about the contents of `sourcedata`
directories in the above example - nor does it prescribe the `sourcedata`,
`derivatives`, or `rawdata` directory names.
The above example is just a convention that can be useful for organizing raw,
source, and derived data while maintaining BIDS compliance of the raw data
directory. When using this convention it is RECOMMENDED to set the `SourceDatasets`
The above example is just a convention useful for organizing source, raw BIDS, and derived BIDS data while maintaining BIDS compliance of the raw data directory.
When using this convention it is RECOMMENDED to set the `SourceDatasets`
field in `dataset_description.json` of each subdirectory of `derivatives` to:

```JSON
{
"SourceDatasets": [ {"URL": "../../rawdata/"} ]
"SourceDatasets": [ {"URL": "../../sourcedata/raw/"} ]
}
```
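
To make the relative URL concrete, the following sketch resolves `../../sourcedata/raw/` against a derivatives subdirectory of the example tree. It assumes the URL is interpreted relative to the root of the derivative dataset (here `my_project-1/derivatives/pipeline_1`); the function name `resolve_source_datasets` is hypothetical, and the dictionary holds only the `SourceDatasets` fragment rather than a complete `dataset_description.json`.

```python
import posixpath

# Root of one derivative dataset in the example tree above.
derivative_root = "my_project-1/derivatives/pipeline_1"

# Only the fragment discussed here, not a complete dataset_description.json.
description = {"SourceDatasets": [{"URL": "../../sourcedata/raw/"}]}


def resolve_source_datasets(dataset_root: str, desc: dict) -> list[str]:
    """Resolve each relative SourceDatasets URL against the dataset root.

    Assumption: relative URLs are taken relative to the derivative
    dataset's own root directory.
    """
    return [
        posixpath.normpath(posixpath.join(dataset_root, entry["URL"]))
        for entry in desc.get("SourceDatasets", [])
    ]


print(resolve_source_datasets(derivative_root, description))
# ['my_project-1/sourcedata/raw']
```

Under that assumption, the two `..` components climb out of `pipeline_1` and `derivatives`, so the URL lands on the BIDS-compliant raw dataset in `sourcedata/raw`.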

@@ -380,6 +384,7 @@ Derivatives can be stored/distributed in two ways:
"sub-01": {},
"sub-02": {},
"...": "",
"dataset_description.json": "",
}
}
) }}
@@ -3,5 +3,5 @@ SUMMARY:
0 out of 1 files were successfully validated, using the following regular expressions:
- `.*?/sub-(?P<subject>[0-9a-zA-Z]+)/(|ses-(?P<session>[0-9a-zA-Z]+)/)anat/sub-(?P=subject)(|_ses-(?P=session))(|_acq-(?P<acquisition>[0-9a-zA-Z]+))(|_ce-(?P<ceagent>[0-9a-zA-Z]+))(|_rec-(?P<reconstruction>[0-9a-zA-Z]+))(|_run-(?P<run>[0-9a-zA-Z]+))(|_part-(?P<part>(mag|phase|real|imag)))_(T1w|T2w|PDw|T2starw|FLAIR|inplaneT1|inplaneT2|PDT2|angio|T2star)\.(nii.gz|nii|json)$`
The following files were not matched by any regex schema entry:
* `/home/chymera/.data2/datalad/000026/rawdata/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz
* `/home/chymera/.data2/datalad/000026/noncompliant/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz
The following mandatory regex schema entries did not match any files:
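
The report excerpt above can be reproduced in miniature with Python's `re` module. The sketch below compiles the regular expression exactly as it appears in the report and applies it to the unmatched path from the log, plus one hypothetical compliant path (`/data/sub-01/anat/sub-01_T1w.nii.gz`, not taken from the source). It only illustrates why the file is rejected and is not how `bidsschematools` runs its validation internally.

```python
import re

# Regular expression copied from the report, split across lines for readability.
ANAT_REGEX = re.compile(
    r".*?/sub-(?P<subject>[0-9a-zA-Z]+)/"
    r"(|ses-(?P<session>[0-9a-zA-Z]+)/)"
    r"anat/sub-(?P=subject)"
    r"(|_ses-(?P=session))"
    r"(|_acq-(?P<acquisition>[0-9a-zA-Z]+))"
    r"(|_ce-(?P<ceagent>[0-9a-zA-Z]+))"
    r"(|_rec-(?P<reconstruction>[0-9a-zA-Z]+))"
    r"(|_run-(?P<run>[0-9a-zA-Z]+))"
    r"(|_part-(?P<part>(mag|phase|real|imag)))"
    r"_(T1w|T2w|PDw|T2starw|FLAIR|inplaneT1|inplaneT2|PDT2|angio|T2star)"
    r"\.(nii.gz|nii|json)$"
)

# Path reported as unmatched in the log above.
unmatched = (
    "/home/chymera/.data2/datalad/000026/"
    "noncompliant/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz"
)
# Hypothetical compliant path, for contrast only.
compliant = "/data/sub-01/anat/sub-01_T1w.nii.gz"

print(ANAT_REGEX.match(unmatched))                  # None
print(ANAT_REGEX.match(compliant).group("subject")) # 01
```

The first path fails for several reasons at once: the file name carries a `ses-MRI` entity although no `ses-MRI/` directory is present, `flip` is not an entity this expression knows, and `VFA` is not among the listed suffixes.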
4 changes: 2 additions & 2 deletions tools/schemacode/bidsschematools/tests/test_validator.py
@@ -64,11 +64,11 @@ def test_write_report(tmp_path):
]
validation_result["path_tracking"] = [
"/home/chymera/.data2/datalad/000026/"
"rawdata/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz"
"noncompliant/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz"
]
validation_result["path_listing"] = [
"/home/chymera/.data2/datalad/000026/"
"rawdata/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz"
"noncompliant/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz"
]

report_path = tmp_path / "output_bids_validator_xs_write.log"
2 changes: 1 addition & 1 deletion tools/schemacode/bidsschematools/validator.py
@@ -594,7 +594,7 @@ def validate_bids(
::

from bidsschematools import validator
bids_paths = '~/.data2/datalad/000026/rawdata'
bids_paths = '~/.data2/datalad/000026/noncompliant'
validator.validate_bids(bids_paths)

Notes