Merge pull request #4 from Lestropie/entity_refined_definition
Common principles: More elaborate "entity" definition
Lestropie authored Apr 29, 2022
2 parents 6b81d67 + bb3f70e commit f3df02b
Showing 1 changed file with 103 additions and 65 deletions.
168 changes: 103 additions & 65 deletions src/02-common-principles.md
@@ -34,10 +34,8 @@ misunderstanding we clarify them here.

1. **Sample** - a sample pertaining to a subject such as tissue, primary cell
or cell-free sample.
The `sample-<label>` **entity** is used to distinguish between different
samples from the same subject.
The label MUST be unique per subject and is RECOMMENDED to be unique
throughout the dataset.
Sample labels MUST be unique within a subject and it is RECOMMENDED that
they be unique throughout the dataset.

1. **Data acquisition** - a continuous uninterrupted block of time during which
a brain scanning instrument was acquiring data according to particular
@@ -105,36 +103,68 @@ misunderstanding we clarify them here.
The modality may overlap with, but should not be confused with
the **data type**.

1. **Entity** - a portion of a file name, consisting of a **key** and corresponding
**value** separated by a hyphen. Supported entities are defined in
[Appendix IX](99-appendices/09-entities.md); further, whether they are allowed,
OPTIONAL or REQUIRED for various data files, as well as their relative ordering,
are defined in the Entity Table in [Appendix IV](99-appendices/04-entity-table.md).

1. **`<index>`** - a nonnegative integer, possibly prefixed with arbitrary number of
0s for consistent indentation, for example, it is `01` in `run-01` following
`run-<index>` specification.

1. **`<label>`** - an alphanumeric value, possibly prefixed with arbitrary
number of 0s for consistent indentation, for example, it is `rest` in `task-rest`
following `task-<label>` specification. Note that labels MUST not collide when
casing is ignored (see [Case collision intolerance](#case-collision-intolerance)).

1. **`suffix`** - an alphanumeric value, located after all **entities** and following
a final `_`, right before the **file extension**; for example, it is `eeg` in
`sub-05_task-matchingpennies_eeg.vhdr`.
1. **Suffix** - an alphanumeric string that forms part of a file name, located
after all [entities](#entities) and following a final `_`, right before the
**file extension**; for example, it is `eeg` in `sub-05_task-matchingpennies_eeg.vhdr`.

1. **File extension** - a portion of the file name after the left-most
period (`.`) preceded by any other alphanumeric. For example, `.gitignore` does
not have a file extension, but the file extension of `test.nii.gz` is `.nii.gz`.
Note that the left-most period is included in the file extension.

1. **DEPRECATED** - A "deprecated" entity or metadata field SHOULD NOT be used in the
1. **DEPRECATED** - A "deprecated" [entity](#entities) or metadata field SHOULD NOT be used in the
generation of new datasets.
It remains in the standard in order to preserve the interpretability of existing datasets.
Validating software SHOULD warn when deprecated practices are detected and provide a
suggestion for updating the dataset to preserve the curator's intent.
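
The file extension rule in the definitions above (the extension starts at the left-most period that is preceded by an alphanumeric character) can be sketched in Python as follows. The function name `split_extension` is hypothetical and not part of the specification; this is only an illustration of the rule as worded here.

```python
import re
from typing import Tuple


def split_extension(file_name: str) -> Tuple[str, str]:
    """Split a file name into (stem, extension).

    The extension starts at the left-most period that directly follows an
    alphanumeric character, and that period is included in the extension.
    """
    # The lookbehind skips a leading period, as in ".gitignore".
    match = re.search(r"(?<=[A-Za-z0-9])\.", file_name)
    if match is None:
        return file_name, ""
    return file_name[: match.start()], file_name[match.start():]


print(split_extension("test.nii.gz"))
print(split_extension(".gitignore"))
```

With the two examples from the definition, `test.nii.gz` yields the extension `.nii.gz`, while `.gitignore` yields no extension.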

## Entities

An "entity" is an attribute that can be associated with a file, contributing
to the identification of that file as a component of its file name in the
form of a hyphen-separated key-value pair.

Each entity has the following attributes:

1. *Name*: A comprehensive name describing the context of information
to be provided via the entity.

1. *Key*: A short string, typically a compression of the entity name,
which uniquely identifies the entity when part of a file name.

1. *Value type*: The requisite form of the value that gets specified
alongside the key whenever the entity appears in a file name.
For each entity, the value is of one of two possible types:

1. *Index*: A non-negative integer, potentially zero-padded for
consistent width.

1. *Label*: An alphanumeric string.
Note that labels MUST NOT collide when casing is ignored
(see [Case collision intolerance](#case-collision-intolerance)).

The entity *format* is a string that prescribes how the entity appears within
any given file name.
For a hypothetical entity with key "`key`", the format can be either
"`key-<index>`" or "`key-<label>`", depending on the value type of that entity.

An entity *instance* is the specific manifestation of an entity within the
name of a specific file, based on the format of the entity but with a value
that provides identifying information to the particular file in whose name
it appears.

Depending on context, any one of the entity name, key, format, or a specific
entity instance, may be referred to as simply an "entity".

"Subject", "session", "sample", "task", and "run" from the list of definitions
above are all examples of entities.
The comprehensive list of supported entities is defined in
[Appendix IX](99-appendices/09-entities.md);
further, whether each is OPTIONAL, REQUIRED, or MUST NOT be provided for
various data files, as well as their relative ordering in a file name, are
defined in the Entity Table in
[Appendix IV](99-appendices/04-entity-table.md).
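
As a rough illustration of the terms above, an entity instance such as `run-01` can be split into its key and value, and the value can be checked against the index form. This is a minimal sketch: the names `parse_entity_instance` and `is_valid_index` are hypothetical, and restricting keys to alphanumeric characters is an assumption made here. Whether a value is an index or a label is determined by the entity's definition, not by the value itself, so `01` may be a perfectly good label.

```python
import re
from typing import Tuple

# Assumption for this sketch: key and value are both alphanumeric strings
# joined by a single hyphen.
ENTITY_INSTANCE = re.compile(r"([A-Za-z0-9]+)-([A-Za-z0-9]+)")


def parse_entity_instance(text: str) -> Tuple[str, str]:
    """Return (key, value) for an entity instance such as 'run-01'."""
    match = ENTITY_INSTANCE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a key-value entity instance: {text!r}")
    return match.group(1), match.group(2)


def is_valid_index(value: str) -> bool:
    """True if the value is a non-negative integer, optionally zero-padded."""
    return re.fullmatch(r"[0-9]+", value) is not None


key, value = parse_entity_instance("run-01")
print(key, value, is_valid_index(value))
```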

## Compulsory, optional, and additional data and metadata

The following standard describes a way of arranging data and writing down
@@ -153,50 +183,58 @@ The solutions will change from case to case and publicly available datasets will
be reviewed to include common data types in the future releases of the BIDS
specification.

## File name structure

A file name consists of a chain of *entities*, a *suffix* and an *extension*.
Two prominent examples of entities are `subject` and `session`.

For a data file that was collected in a given `session` from a given
`subject`, the file name MUST begin with the string `sub-<label>_ses-<label>`.
If the `session` level is omitted in the folder structure, the file name MUST begin
with the string `sub-<label>`, without `ses-<label>`.

Note that `sub-<label>` corresponds to the `subject` entity because it has
the `sub-` "key" and `<label>` "value", where `<label>` would in a real data file
correspond to a unique identifier of that subject, such as `01`.
The same holds for the `session` entity with its `ses-` key and its `<label>`
value.

The extra session layer (at least one `/ses-<label>` subfolder) SHOULD
be added for all subjects if at least one subject in the dataset has more than
one session.
If a `/ses-<label>` subfolder is included as part of the directory hierarchy,
then the same [`ses-<label>`](./99-appendices/09-entities.md#ses)
entity MUST also be included as part of the file names themselves.
Acquisition time of session can
be defined in the [sessions file](03-modality-agnostic-files.md#sessions-file).

A chain of entities, followed by a suffix, connected by underscores (`_`)
produces a human readable file name, such as `sub-01_task-rest_eeg.edf`.
It is evident from the file name alone that the file contains resting state
data from subject `01`.
The suffix `eeg` and the extension `.edf` depend on the imaging modality and
the data format and indicate further details of the file's contents.

Entities within a file name MUST be unique.
For example, the following file name is not valid because it uses the `acq`
entity twice:
`sub-01_acq-laser_acq-uneven_electrodes.tsv`

In cases where entities duplicate metadata,
the presence of an entity should not be used as a replacement for
## Filesystem structure

Data for each subject are placed in sub-directories named "`sub-<label>`",
where string "`<label>`" is substituted with the unique identification
label of each subject.
Additional information on each participant MAY be provided in a
[participants file](03-modality-agnostic-files.md#participants-file)
in the root directory of the dataset.

If data for the subject were acquired across multiple sessions, then within
the subject directory reside sub-directories named "`ses-<label>`",
where string "`<label>`" is substituted with a unique identification
label for each session.
In datasets where at least one subject has more than one session, this
additional sub-directory layer SHOULD be added for all subjects in the dataset.
Additional information on each session MAY be provided in a
[sessions file](03-modality-agnostic-files.md#sessions-file)
within the subject directory.

Within the session sub-directory (or the subject sub-directory if no
session sub-directories are present) are sub-directories named according to
data type as defined above.
A data type directory SHOULD NOT be defined if there are no files to be placed
in that directory.
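
The directory nesting described above (subject, then an optional session layer, then data type) could be assembled as in the following sketch. The helper name `data_directory` and its parameters are hypothetical; the labels and data types in the usage lines are illustrative only.

```python
from pathlib import PurePosixPath
from typing import Optional


def data_directory(
    subject: str, datatype: str, session: Optional[str] = None
) -> PurePosixPath:
    """Build the directory, relative to the dataset root, for one data type."""
    parts = [f"sub-{subject}"]
    # The session layer is only present when the dataset uses sessions.
    if session is not None:
        parts.append(f"ses-{session}")
    parts.append(datatype)
    return PurePosixPath(*parts)


print(data_directory("01", "eeg"))
print(data_directory("01", "anat", session="pre"))
```

The sketch does not decide whether the session layer should be present; per the text above, that depends on whether any subject in the dataset has more than one session.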

## File names

A file name consists of a chain of *entity instances* and a *suffix*
all separated by underscores, and an *extension*.
This pattern forms file names that are both human- and machine-readable.
For instance, file "`sub-01_task-rest_eeg.edf`" contains instances of the
"subject" and "task" entities, making it evident from the file name alone that it
contains resting-state data from subject `01`;
the suffix `eeg` and extension `.edf` depend on the imaging modality and the data
format, and can therefore convey further details of the file's contents.

For a data file that was collected in a given session from a given
subject, the file name MUST begin with the string `sub-<label>_ses-<label>`.
Conversely, if the session level is omitted in the folder structure, the file
name MUST begin with the string `sub-<label>`, without `ses-<label>`.

Any given entity MUST NOT appear more than once in any file name. For example,
file name "`sub-01_acq-laser_acq-uneven_electrodes.tsv`" is invalid because
it uses the "acquisition" entity twice.
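
The file name pattern and the uniqueness rule above can be sketched as a small parser. This is an illustration under stated assumptions, not a validator: the function name `parse_file_name` is hypothetical, and the sketch does not check entity ordering, value types, or whether a given key is a supported entity.

```python
import re
from typing import Dict, Tuple


def parse_file_name(file_name: str) -> Tuple[Dict[str, str], str, str]:
    """Return (entities, suffix, extension) for a file name.

    Raises ValueError if a chunk is not a key-value pair or if an entity
    key appears more than once.
    """
    # Extension: starts at the left-most period preceded by an alphanumeric.
    match = re.search(r"(?<=[A-Za-z0-9])\.", file_name)
    if match is None:
        stem, extension = file_name, ""
    else:
        stem, extension = file_name[: match.start()], file_name[match.start():]

    # Everything before the final underscore is a chain of entity instances.
    *instances, suffix = stem.split("_")

    entities: Dict[str, str] = {}
    for instance in instances:
        key, hyphen, value = instance.partition("-")
        if not (key and hyphen and value):
            raise ValueError(f"not a key-value entity instance: {instance!r}")
        if key in entities:
            raise ValueError(f"entity key {key!r} appears more than once")
        entities[key] = value
    return entities, suffix, extension


print(parse_file_name("sub-01_task-rest_eeg.edf"))
```

Passing the invalid example from the text, `sub-01_acq-laser_acq-uneven_electrodes.tsv`, raises `ValueError` because the `acq` key occurs twice.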

In cases where an entity and a metadata field convey similar contextual
information, the presence of an entity should not be used as a replacement for
the corresponding metadata field.
For instance, in echo-planar imaging MRI,
the [`dir-<label>`](./99-appendices/09-entities.md#dir) entity MAY be used
For instance, in echo-planar imaging MRI, the
[`dir-<label>`](./99-appendices/09-entities.md#dir) entity MAY be used
to distinguish files with different phase-encoding directions,
but the file's `PhaseEncodingDirection` can only be specified as metadata.
but the file's `PhaseEncodingDirection` MUST be specified as metadata.

A summary of all entities in BIDS and the order in which they MUST be
specified is available in the [entity table](./99-appendices/04-entity-table.md)