Commit

[ENH] Render valid value restrictions in tables based on object definitions in schema (#921)

* Add function.

* Flatten anyOfs of enums.

* Render valid values in metadata tables.

* Add valid values to column tables.

* Drop coverage of certain types.

Dropped types include anyOfs, pattern strings, format strings, and arrays.

* Fix.

* Actually fix?

* Reincorporate changes.

* Run black.

* Remove "allowed values" from field descriptions.

Those values will automatically be rendered from the associated enums.

Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org>
tsalo and sappelhoff authored Feb 1, 2022
1 parent c0cacdc commit 51e9f1b
Showing 5 changed files with 110 additions and 43 deletions.
src/schema/objects/columns.yaml (8 changes: 3 additions & 5 deletions)
@@ -127,7 +127,7 @@ handedness:
 hemisphere:
   name: hemisphere
   description: |
-    The hemisphere in which the electrode is placed, one of `['L' or 'R']` (MUST be in upper-case).
+    The hemisphere in which the electrode is placed.
   type: string
   enum:
     - L
@@ -325,8 +325,6 @@ sample_type:
   description: |
     Biosample type defined by
     [ENCODE Biosample Type](https://www.encodeproject.org/profiles/biosample_type).
-    One of: `"cell line"`, `"in vitro differentiated cells"`, `"primary cell"`, `"cell-free sample"`,
-    `"cloning host"`, `"tissue"`, `"whole organisms"`, `"organoid"`, or `"technical sample"`.
   type: string
   enum:
     - cell line
@@ -410,8 +408,8 @@ status:
   name: status
   description: |
     Data quality observed on the channel.
-    Must be one of: `good`, `bad`, or `n/a` (when quality is unknown).
-    A channel is considered `bad` if its data quality is compromised by excessive noise.
+    A channel is considered `"bad"` if its data quality is compromised by excessive noise.
+    If quality is unknown, then a value of `"n/a"` may be used.
     Description of noise type SHOULD be provided in `[status_description]`.
   type: string
   enum:
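Since the `hemisphere` description no longer spells out `L` and `R`, the rendered table has to recover them from `enum`. A minimal Python sketch of that step, assuming the column definition has already been parsed from `columns.yaml` into a plain dict; `enum_sentence` is a hypothetical helper that mirrors only the string-enum branch of `describe_valid_values` in `utils.py`, and the `R` entry is assumed because the hunk shows only `- L`:

```python
def enum_sentence(definition):
    """Hypothetical helper: the string-enum branch of describe_valid_values, in isolation."""
    if definition.get("type") != "string" or "enum" not in definition:
        return ""
    # Enum entries may be plain strings or dicts; for a dict, only its first key is used
    values = [next(iter(v)) if isinstance(v, dict) else v for v in definition["enum"]]
    quoted = [f'`"{v}"`' for v in values]
    return f"Must be one of: {', '.join(quoted)}."


# Assumed shape of the `hemisphere` column once columns.yaml is parsed
hemisphere = {
    "name": "hemisphere",
    "description": "The hemisphere in which the electrode is placed.\n",
    "type": "string",
    "enum": ["L", "R"],
}

print(enum_sentence(hemisphere))  # Must be one of: `"L"`, `"R"`.
```

Because the values are quoted verbatim, the upper-case requirement from the removed sentence is still visible in the rendered output. A dict-shaped entry such as a hypothetical `{"L": "left"}` contributes only its key, matching the `list(v.keys())[0]` handling in `describe_valid_values`.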
src/schema/objects/metadata.yaml (58 changes: 20 additions & 38 deletions)
@@ -81,7 +81,6 @@ AnatomicalLandmarkCoordinateUnits:
   name: AnatomicalLandmarkCoordinateUnits
   description: |
     Units of the coordinates of `AnatomicalLandmarkCoordinateSystem`.
-    MUST be `"m"`, `"cm"`, or `"mm"`.
   type: string
   enum:
     - m
@@ -126,7 +125,7 @@ AnatomicalLandmarkCoordinates__mri:
 ArterialSpinLabelingType:
   name: ArterialSpinLabelingType
   description: |
-    `"CASL"`, `"PCASL"`, `"PASL"`.
+    The arterial spin labeling type.
   type: string
   enum:
     - CASL
@@ -311,8 +310,7 @@ BrainLocation:
 CASLType:
   name: CASLType
   description: |
-    Describes if a separate coil is used for labeling:
-    `"single-coil"` or `"double-coil"`.
+    Describes if a separate coil is used for labeling.
   type: string
   enum:
     - single-coil
@@ -425,8 +423,6 @@ ContrastBolusIngredient:
   name: ContrastBolusIngredient
   description: |
     Active ingredient of agent.
-    Values MUST be one of: `"IODINE"`, `"GADOLINIUM"`, `"CARBON DIOXIDE"`,
-    `"BARIUM"`, `"XENON"`.
     Corresponds to DICOM Tag 0018, 1048 `Contrast/Bolus Ingredient`.
   type: string
   enum:
@@ -456,7 +452,6 @@ DatasetType:
   name: DatasetType
   description: |
     The interpretation of the dataset.
-    MUST be one of `"raw"` or `"derivative"`.
     For backwards compatibility, the default value is `"raw"`.
   type: string
   enum:
@@ -560,7 +555,6 @@ DigitizedHeadPointsCoordinateUnits:
   name: DigitizedHeadPointsCoordinateUnits
   description: |
     Units of the coordinates of `DigitizedHeadPointsCoordinateSystem`.
-    MUST be `"m"`, `"cm"`, or `"mm"`.
   type: string
   enum:
     - m
@@ -651,7 +645,6 @@ EEGCoordinateUnits:
   name: EEGCoordinateUnits
   description: |
     Units of the coordinates of `EEGCoordinateSystem`.
-    MUST be `"m"`, `"cm"`, or `"mm"`.
   type: string
   enum:
     - m
@@ -839,7 +832,6 @@ FiducialsCoordinateUnits:
   description: |
     Units in which the coordinates that are listed in the field
     `FiducialsCoordinateSystem` are represented.
-    MUST be `"m"`, `"cm"`, or `"mm"`.
   type: string
   enum:
     - m
@@ -1083,7 +1075,6 @@ HeadCoilCoordinateUnits:
   name: HeadCoilCoordinateUnits
   description: |
     Units of the coordinates of `HeadCoilCoordinateSystem`.
-    MUST be `"m"`, `"cm"`, or `"mm"`.
   type: string
   enum:
     - m
@@ -1463,12 +1454,12 @@ M0Estimate:
 M0Type:
   name: M0Type
   description: |
-    Describes the presence of M0 information, as either:
-    `"Separate"` when a separate `*_m0scan.nii[.gz]` is present,
-    `"Included"` when an m0scan volume is contained within the current
-    `*_asl.nii[.gz]`,
-    `"Estimate"` when a single whole-brain M0 value is provided, or
-    `"Absent"` when no specific M0 information is present.
+    Describes the presence of M0 information.
+    `"Separate"` means that a separate `*_m0scan.nii[.gz]` is present.
+    `"Included"` means that an m0scan volume is contained within the current
+    `*_asl.nii[.gz]`.
+    `"Estimate"` means that a single whole-brain M0 value is provided.
+    `"Absent"` means that no specific M0 information is present.
   type: string
   enum:
     - Separate
@@ -1505,7 +1496,6 @@ MEGCoordinateUnits:
   name: MEGCoordinateUnits
   description: |
     Units of the coordinates of `MEGCoordinateSystem`.
-    MUST be `"m"`, `"cm"`, or `"mm"`.
   type: string
   enum:
     - m
@@ -1523,7 +1513,6 @@ MEGREFChannelCount:
 MRAcquisitionType:
   name: MRAcquisitionType
   description: |
-    Possible values: `"2D"` or `"3D"`.
     Type of sequence readout.
     Corresponds to DICOM Tag 0018, 0023 `MR Acquisition Type`.
   type: string
@@ -1564,10 +1553,9 @@ MTPulseShape:
   name: MTPulseShape
   description: |
     Shape of the magnetization transfer RF pulse waveform.
-    Accepted values: `"HARD"`, `"GAUSSIAN"`,
-    `"GAUSSHANN"` (gaussian pulse with Hanning window),
-    `"SINC"`, `"SINCHANN"` (sinc pulse with Hanning window),
-    `"SINCGAUSS"` (sinc pulse with Gaussian window), `"FERMI"`.
+    The value `"GAUSSHANN"` refers to a Gaussian pulse with a Hanning window.
+    The value `"SINCHANN"` refers to a sinc pulse with a Hanning window.
+    The value `"SINCGAUSS"` refers to a sinc pulse with a Gaussian window.
   type: string
   enum:
     - HARD
@@ -1807,8 +1795,7 @@ PASLType:
 PCASLType:
   name: PCASLType
   description: |
-    Type the gradient pulses used in the `"control"` condition:
-    `"balanced"` or `"unbalanced"`.
+    The type of gradient pulses used in the `"control"` condition.
   type: string
   enum:
     - balanced
@@ -1886,7 +1873,6 @@ PharmaceuticalName:
 PhaseEncodingDirection:
   name: PhaseEncodingDirection
   description: |
-    Possible values: `"i"`, `"j"`, `"k"`, `"i-"`, `"j-"`, `"k-"`.
     The letters `i`, `j`, `k` correspond to the first, second and third axis of
     the data in the NIFTI file.
     The polarity of the phase encoding is assumed to go from zero index to
@@ -2109,7 +2095,7 @@ RecordingDuration:
 RecordingType:
   name: RecordingType
   description: |
-    Defines whether the recording is `"continuous"`, `"discontinuous"` or
+    Defines whether the recording is `"continuous"`, `"discontinuous"`, or
     `"epoched"`, where `"epoched"` is limited to time windows about events of
     interest (for example, stimulus presentations or subject responses).
   type: string
@@ -2244,8 +2230,6 @@ SampleOrigin:
   name: SampleOrigin
   description: |
     Describes from which tissue the genetic information was extracted.
-    Values MUST be one of `"blood"`, `"saliva"`, `"brain"`, `"csf"`,
-    `"breast milk"`, `"bile"`, `"amniotic fluid"`, `"other biospecimen"`.
   type: string
   enum:
     - blood
@@ -2385,9 +2369,8 @@ SkullStripped:
 SliceEncodingDirection:
   name: SliceEncodingDirection
   description: |
-    Possible values: `"i"`, `"j"`, `"k"`, `"i-"`, `"j-"`, `"k-"`
-    (the axis of the NIfTI data along which slices were acquired,
-    and the direction in which `SliceTiming` is defined with respect to).
+    The axis of the NIfTI data along which slices were acquired,
+    and the direction in which `SliceTiming` is defined with respect to.
     `i`, `j`, `k` identifiers correspond to the first, second and third axis of
     the data in the NIfTI file.
     A `-` sign indicates that the contents of `SliceTiming` are defined in
@@ -2598,7 +2581,6 @@ SpoilingType:
   name: SpoilingType
   description: |
     Specifies which spoiling method(s) are used by a spoiled sequence.
-    Accepted values: `"RF"`, `"GRADIENT"` or `"COMBINED"`.
   type: string
   enum:
     - RF
@@ -2685,8 +2667,6 @@ TissueOrigin:
   name: TissueOrigin
   description: |
     Describes the type of tissue analyzed for `SampleOrigin` `brain`.
-    Values MUST be one of `"gray matter"`, `"white matter"`, `"csf"`,
-    `"meninges"`, `"macrovascular"` or `microvascular`.
   type: string
   enum:
     - gray matter
@@ -2767,8 +2747,10 @@ Type:
   name: Type
   description: |
     Short identifier of the mask.
-    Reserved values: `Brain` - brain mask, `Lesion` - lesion mask,
-    `Face` - face mask, `ROI` - ROI mask
+    The value `"Brain"` refers to a brain mask.
+    The value `"Lesion"` refers to a lesion mask.
+    The value `"Face"` refers to a face mask.
+    The value `"ROI"` refers to a region of interest mask.
   type: string
   enum:
     - Brain
@@ -2963,7 +2945,7 @@ iEEGCoordinateSystemDescription:
 iEEGCoordinateUnits:
   name: iEEGCoordinateUnits
   description: |
-    Units of the `*_electrodes.tsv`, MUST be `"m"`, `"mm"`, `"cm"` or `"pixels"`.
+    Units of the `*_electrodes.tsv`.
     MUST be `"pixels"` if `iEEGCoordinateSystem` is `Pixels`.
   type: string
   enum:
tools/schemacode/schemacode/render.py (12 changes: 12 additions & 0 deletions)
@@ -507,6 +507,12 @@ def make_metadata_table(schema, field_info, tablefmt="github"):
         type_string = utils.resolve_metadata_type(metadata_schema[field])
 
         description = metadata_schema[field]["description"] + " " + description_addendum
+
+        # Try to add info about valid values
+        valid_values_str = utils.describe_valid_values(metadata_schema[field])
+        if valid_values_str:
+            description += "\n\n\n\n" + valid_values_str
+
         # A backslash before a newline means continue a string
         description = description.replace("\\\n", "")
         # Two newlines should be respected
@@ -576,6 +582,12 @@ def make_columns_table(schema, column_info, tablefmt="github"):
         type_string = utils.resolve_metadata_type(column_schema[field])
 
         description = column_schema[field]["description"] + " " + description_addendum
+
+        # Try to add info about valid values
+        valid_values_str = utils.describe_valid_values(column_schema[field])
+        if valid_values_str:
+            description += "\n\n\n\n" + valid_values_str
+
         # A backslash before a newline means continue a string
         description = description.replace("\\\n", "")
         # Two newlines should be respected
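In both table builders the valid-values sentence is appended after four newlines, before the existing newline clean-up runs. The hunk stops at the comment `# Two newlines should be respected`, so the replacements after that comment in the sketch below are an assumption made for illustration: each `"\n\n"` becomes an HTML `<br>` and any leftover newline becomes a space. `build_cell` is a hypothetical name, not a function in `render.py`:

```python
def build_cell(definition, description_addendum, valid_values_str):
    """Hypothetical sketch of assembling one description cell of a rendered table."""
    description = definition["description"] + " " + description_addendum

    # Same step as the diff: separate the valid-values sentence with four newlines
    if valid_values_str:
        description += "\n\n\n\n" + valid_values_str

    # Same step as the diff: a backslash before a newline continues the string
    description = description.replace("\\\n", "")
    # ASSUMPTION (not shown in the hunk): newline pairs become <br>, single ones spaces
    description = description.replace("\n\n", "<br>")
    description = description.replace("\n", " ")
    return description


# YAML `|` block scalars keep a trailing newline, so the raw description ends in "\n"
status = {"description": "Data quality observed on the channel.\n"}
cell = build_cell(status, "", 'Must be one of: `"good"`, `"bad"`, `"n/a"`.')
print(cell)
# Data quality observed on the channel.  <br><br>Must be one of: `"good"`, `"bad"`, `"n/a"`.
```

Under that assumption the four newlines yield two `<br>` tags, which keeps the valid values visually separated from the field description while the whole cell stays on one line, as a Markdown table row requires.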
tools/schemacode/schemacode/schema.py (11 changes: 11 additions & 0 deletions)
@@ -37,6 +37,17 @@ def dereference_yaml(schema, struct):
 
         struct = {key: dereference_yaml(schema, val) for key, val in struct.items()}
 
+        # For the rare case of multiple sets of valid values (enums) from multiple references,
+        # anyOf is used. Here we try to flatten our anyOf of enums into a single enum list.
+        if "anyOf" in struct.keys():
+            if all("enum" in obj for obj in struct["anyOf"]):
+                all_enum = [v["enum"] for v in struct["anyOf"]]
+                all_enum = [item for sublist in all_enum for item in sublist]
+
+                struct.pop("anyOf")
+                struct["type"] = "string"
+                struct["enum"] = all_enum
+
     elif isinstance(struct, list):
         struct = [dereference_yaml(schema, item) for item in struct]
 
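The new block in `dereference_yaml` only rewrites an `anyOf` when every alternative carries an `enum`; the enum lists are concatenated in order and the field is forced to `type: string`. The sketch below isolates that rule in a hypothetical `flatten_enum_anyof` and runs it on made-up definitions (`ExampleField` and its values are placeholders, not BIDS fields):

```python
def flatten_enum_anyof(struct):
    """Hypothetical stand-alone version of the anyOf flattening in dereference_yaml."""
    struct = dict(struct)  # shallow copy so the caller's definition is not mutated
    if "anyOf" in struct and all("enum" in obj for obj in struct["anyOf"]):
        # Concatenate the enum lists in order; duplicates are kept, as in the diff
        all_enum = [item for obj in struct.pop("anyOf") for item in obj["enum"]]
        struct["type"] = "string"
        struct["enum"] = all_enum
    return struct


both_enums = {
    "name": "ExampleField",
    "anyOf": [{"type": "string", "enum": ["a", "b"]}, {"type": "string", "enum": ["c"]}],
}
mixed = {
    "name": "ExampleField",
    "anyOf": [{"type": "string", "enum": ["a"]}, {"type": "string", "pattern": "^x"}],
}

print(flatten_enum_anyof(both_enums))
# {'name': 'ExampleField', 'type': 'string', 'enum': ['a', 'b', 'c']}
print(flatten_enum_anyof(mixed) == mixed)  # True: a mixed anyOf is left alone
```

A mixed `anyOf` survives dereferencing unchanged, and `describe_valid_values` returns an empty string for any definition that still contains `anyOf`, so such fields get no valid-values sentence. Because nothing deduplicates the concatenated list, a value shared by two referenced enums would be listed twice.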
tools/schemacode/schemacode/utils.py (64 changes: 64 additions & 0 deletions)
@@ -197,3 +197,67 @@ def resolve_metadata_type(definition):
         string = "unknown"
 
     return string
+
+
+def describe_valid_values(definition):
+    """Build a sentence describing valid values for an object from its definition.
+
+    This only covers booleans, enums, integers, and numbers.
+    Currently uncovered are anyOfs, arrays, and objects.
+
+    Parameters
+    ----------
+    definition : :obj:`dict`
+        An object definition, following the BIDS schema object rules.
+
+    Returns
+    -------
+    :obj:`str`
+        A sentence describing valid values for the object.
+    """
+    description = ""
+    if "anyOf" in definition.keys():
+        return description
+
+    if definition["type"] == "boolean":
+        description = 'Must be one of: `"true"`, `"false"`.'
+
+    elif definition["type"] == "string":
+        if "enum" in definition.keys():
+            # Allow enums to be "objects" (dicts) or strings
+            enum_values = [
+                list(v.keys())[0] if isinstance(v, dict) else v for v in definition["enum"]
+            ]
+            enum_values = [f'`"{v}"`' for v in enum_values]
+            description = f"Must be one of: {', '.join(enum_values)}."
+
+    elif definition["type"] in ("integer", "number"):
+        if "minimum" in definition.keys():
+            minstr = f"greater than or equal to {definition['minimum']}"
+        elif "exclusiveMinimum" in definition.keys():
+            minstr = f"greater than {definition['exclusiveMinimum']}"
+        else:
+            minstr = ""
+
+        if "maximum" in definition.keys():
+            maxstr = f"less than or equal to {definition['maximum']}"
+        elif "exclusiveMaximum" in definition.keys():
+            maxstr = f"less than {definition['exclusiveMaximum']}"
+        else:
+            maxstr = ""
+
+        if minstr and maxstr:
+            minmaxstr = f"{minstr} and {maxstr}"
+        elif minstr:
+            minmaxstr = minstr
+        elif maxstr:
+            minmaxstr = maxstr
+        else:
+            minmaxstr = ""
+
+        if minmaxstr:
+            description = f"Must be a number {minmaxstr}."
+        else:
+            description = ""
+
+    return description
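For `integer` and `number` definitions the function reports at most one lower bound and one upper bound, preferring the inclusive keyword when a definition carries both an inclusive and an exclusive bound on the same side. A compact restatement of just that branch, under the hypothetical name `range_sentence`, makes the resulting sentences easy to check (the example definitions are made up):

```python
def range_sentence(definition):
    """Hypothetical restatement of the integer/number branch of describe_valid_values."""
    # Inclusive keywords win over exclusive ones, mirroring the if/elif order in the diff
    if "minimum" in definition:
        minstr = f"greater than or equal to {definition['minimum']}"
    elif "exclusiveMinimum" in definition:
        minstr = f"greater than {definition['exclusiveMinimum']}"
    else:
        minstr = ""

    if "maximum" in definition:
        maxstr = f"less than or equal to {definition['maximum']}"
    elif "exclusiveMaximum" in definition:
        maxstr = f"less than {definition['exclusiveMaximum']}"
    else:
        maxstr = ""

    # Join whichever bounds exist; with no bounds there is no sentence
    bounds = " and ".join(s for s in (minstr, maxstr) if s)
    return f"Must be a number {bounds}." if bounds else ""


print(range_sentence({"type": "number", "minimum": 0}))
# Must be a number greater than or equal to 0.
print(range_sentence({"type": "number", "exclusiveMinimum": 0, "maximum": 1}))
# Must be a number greater than 0 and less than or equal to 1.
print(repr(range_sentence({"type": "integer"})))
# ''
```

Integers are also described as "a number", so an `integer` field with `minimum: 1` would render as "Must be a number greater than or equal to 1." without stating that the value must be whole.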
