Follow BIDS-Derivatives convention for outputs #42

tsalo · 2020-08-08T16:51:08Z

Is your feature request related to a problem? Please describe.
BIDSifying rapidtide outputs will make it easier to integrate rapidtide with fMRIPrep and other BIDS Apps.

Describe the solution you'd like
Generated files could follow BIDS-Derivatives convention, both in terms of the filenames and the file formats. Additionally, it would be great if the documentation included an Outputs of rapidtide page. Tedana's version uses tables, while fMRIPrep's version is more expansive and text-oriented.

Describe alternatives you've considered
The documentation under "Usage" could be expanded instead of adding a new page just for describing workflow outputs. This might be better for rapidtide, given the sheer number of different workflows that are included in the package. In tedana and fMRIPrep's cases, they only support one or two workflows for which the outputs need documentation.

Additional context
This request stems from nipreps/fmriprep#1678 (comment).

The text was updated successfully, but these errors were encountered:

tsalo · 2020-08-11T17:26:02Z

Here are some initial thoughts on the files you outlined in nipreps/fmriprep#1678, based on bids-standard/bids-specification#519:

_lagregressors.nii.gz --> X_desc-rapidtide_timeseries.nii.gz
final regressor timecourse --> X_desc-rapidtide_timeseries.tsv (would conflict with the above)
_lagtimes.nii.gz --> Not sure
_lagstrengths.nii.gz --> Not sure
_lagsigma.nii.gz --> X_desc-rapidtide_std.nii.gz
_lagmask.nii.gz --> X_desc-rapidtide_mask.nii.gz

bbfrederick · 2020-10-20T18:37:01Z

I've made a stab at this. There is a new option "--bidsoutput" that should put pretty much everything into BIDS compatible files (the exception are some text files specifying the command that was run.)

tsalo · 2020-10-20T21:28:30Z

The code-base is so huge that I'm having some trouble hunting down all of the new outputs. Are they summarized anywhere (e.g., in the documentation, or listed as targets for a test)? If not, I could probably fetch the dev branch and run it on some data locally to figure out the final list.

One thing that I didn't see was associated json files. Those JSONs are optional, but they'd be pretty helpful, I think. Specifically, two fields that would be good to have are RawSources and Units. The RawSources would just be the inputs to the files, I think, and then if those inputs are actually not raw BIDS files (or are not final derivatives), then it would be on the tool (e.g., fMRIPrep) to chain the series of RawSources together to trace back to the actual BIDS files. Some of the files (like the lags) should be in seconds, so having Units there would be helpful. There are probably a few other common fields that are worth including, but I'd need to read through the outputs to figure that out.

bbfrederick · 2020-10-20T22:45:28Z

Sorry - I haven’t updated the documentation at all yet. Quickest way to check would be to run the second entry in the “testfmri” file in rapidtide/data/examples/src to generate bids output files. It should only take about 3 minutes to run. You are correct - I didn’t make any json sidecars for the niftis (all the tsv files have them) and probably should. I like your suggestion on raw sources and units. Blaise Any typos, strange phrasing, or whimsy are because autocorrect hates me.

…

On Oct 20, 2020, at 5:28 PM, Taylor Salo ***@***.***> wrote: The code-base is so huge that I'm having some trouble hunting down all of the new outputs. Are they summarized anywhere (e.g., in the documentation, or listed as targets for a test)? If not, I could probably fetch the dev branch and run it on some data locally to figure out the final list. One thing that I didn't see was associated json files. Those JSONs are optional, but they'd be pretty helpful, I think. Specifically, two fields that would be good to have are RawSources and Units. The RawSources would just be the inputs to the files, I think, and then if those inputs are actually not raw BIDS files (or are not final derivatives), then it would be on the tool (e.g., fMRIPrep) to chain the series of RawSources together to trace back to the actual BIDS files. Some of the files (like the lags) should be in seconds, so having Units there would be helpful. There are probably a few other common fields that are worth including, but I'd need to read through the outputs to figure that out. — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

bbfrederick · 2020-10-21T00:02:15Z

Ok, I've added json sidecars for all the nifti files ending in _map.nii.gz (which are the main output files). I put in RawSources and Units (although they are "None" for anything but lagtimes and lagsigma). I also added a CommandLineArgs, which shows you all the options used to invoke rapidtide.

tsalo · 2020-10-22T16:38:00Z

Based on that test, here are the files:

dgsr_bids_commandline.txt
dgsr_bids_desc-MTT_map.json
dgsr_bids_desc-MTT_map.nii.gz
dgsr_bids_desc-R2_map.json
dgsr_bids_desc-R2_map.nii.gz
dgsr_bids_desc-autocorr_regressor.json
dgsr_bids_desc-autocorr_regressor.tsv.gz
dgsr_bids_desc-corrdistdata_info.json
dgsr_bids_desc-corrdistdata_info.tsv.gz
dgsr_bids_desc-corrout_info.nii.gz
dgsr_bids_desc-failreason_map.json
dgsr_bids_desc-failreason_map.nii.gz
dgsr_bids_desc-fitNorm_map.json
dgsr_bids_desc-fitNorm_map.nii.gz
dgsr_bids_desc-fitR2_hist.json
dgsr_bids_desc-fitR2_hist.tsv.gz
dgsr_bids_desc-fitcoff_map.json
dgsr_bids_desc-fitcoff_map.nii.gz
dgsr_bids_desc-fitmask_map.json
dgsr_bids_desc-fitmask_map.nii.gz
dgsr_bids_desc-fitwindow_info.nii.gz
dgsr_bids_desc-fmrires_regressor.json
dgsr_bids_desc-fmrires_regressor.tsv.gz
dgsr_bids_desc-gaussout_info.nii.gz
dgsr_bids_desc-globallag_hist.json
dgsr_bids_desc-globallag_hist.tsv.gz
dgsr_bids_desc-globalmean_mask.nii.gz
dgsr_bids_desc-lagsigma_map.json
dgsr_bids_desc-lagsigma_map.nii.gz
dgsr_bids_desc-lagstrengths_map.json
dgsr_bids_desc-lagstrengths_map.nii.gz
dgsr_bids_desc-lagtimes_hist.json
dgsr_bids_desc-lagtimes_hist.tsv.gz
dgsr_bids_desc-lagtimes_map.json
dgsr_bids_desc-lagtimes_map.nii.gz
dgsr_bids_desc-lfofiltered_bold.nii.gz
dgsr_bids_desc-maxcorr_hist.json
dgsr_bids_desc-maxcorr_hist.tsv.gz
dgsr_bids_desc-meanvalue_map.json
dgsr_bids_desc-meanvalue_map.nii.gz
dgsr_bids_desc-nullsimfunc_hist.json
dgsr_bids_desc-nullsimfunc_hist.tsv.gz
dgsr_bids_desc-origres_regressor.json
dgsr_bids_desc-origres_regressor.tsv.gz
dgsr_bids_desc-plt0p001_mask.nii.gz
dgsr_bids_desc-plt0p005_mask.nii.gz
dgsr_bids_desc-plt0p010_mask.nii.gz
dgsr_bids_desc-plt0p050_mask.nii.gz
dgsr_bids_desc-processed_mask.nii.gz
dgsr_bids_desc-r2value_map.json
dgsr_bids_desc-r2value_map.nii.gz
dgsr_bids_desc-refine_mask.nii.gz
dgsr_bids_desc-refined_regressor.json
dgsr_bids_desc-refined_regressor.tsv.gz
dgsr_bids_desc-resampres_regressor.json
dgsr_bids_desc-resampres_regressor.tsv
dgsr_bids_desc-resampres_regressor.tsv.gz
dgsr_bids_desc-rvalue_map.json
dgsr_bids_desc-rvalue_map.nii.gz
dgsr_bids_desc-width_hist.json
dgsr_bids_desc-width_hist.tsv.gz
dgsr_bids_formattedcommandline.txt
dgsr_bids_memusage.csv
dgsr_bids_options.json
dgsr_bids_pass-1_desc-despeckle_mask.nii.gz
dgsr_bids_pass-2_desc-despeckle_mask.nii.gz
dgsr_bids_pass-3_desc-despeckle_mask.nii.gz
dgsr_bids_runtimings.txt

A few thoughts I have are:

_regressor --> _timeseries: This is best outlined in the open pull request for functional derivatives (BEP 012: Functional MRI derivatives bids-standard/bids-specification#519)
.tsv.gz --> .tsv: I believe that gzipped tsv files are standard for highly sampled time series, like physio data, but time series around the resolution of the TR don't generally get gzipped.
I'm not really sure what the difference between desc-rvalue_map and desc-R2_map is.
_desc-meanvalue_map --> _mean: The mean suffix is added in BEP 012: Functional MRI derivatives bids-standard/bids-specification#519.
"Units": "None" --> "Units": "arbitrary": This is a recent addition to the specification, since arbitrary units are not part of the standard unit systems used by the specification.
When SamplingFrequency is the same as the input fMRI data, you can use the special keyword "TR", which is added in that PR I mentioned above.
In the json files linked to tsv files, the columns should be at the top level (i.e., no "Columns" key). The way you have it set up is more like physio data, where the tsv wouldn't contain the column names at all, so they have to be set in the json. In rapidtide's case, those tsv files are more like fMRIPrep's confounds files (even the upsampled ones), so the column names are in the tsv and the jsons should be structured using the more basic tabular file rules.
desc-fitmask_map --> desc-fit_mask: You could include mask-specific metadata (see here), especially "Type" to give more info about what a "fit" mask corresponds to. Maybe something like "Brain voxels with sufficient model fit"? I'm not sure if sentences are recommended, since the reserved values are all single words.
desc-fitcoff_map --> desc-fitcoeff_map: I could be misunderstanding what this is, but I assumed it was parameter estimates from the model?
pass isn't a supported entity, but it seems necessary in this case. I would only recommend adopting a reserved keyword in this case to make it easier to grab the last pass programmatically. Something like pass-final, maybe?
desc-refine_mask and desc-refined_regressor: Should the mask maybe also be desc-refined?
I've messaged @\effigies about the possibility of using the res entity in time series files. If he says it fits with the prescribed usage, then that could clean up the naming of a lot of the resampled versions of time series here. Never mind, it's a no-go on using res, since res is specifically about spatial resolution. I think desc is the way to go for these, at the moment.
I think we could also take some cues from fitlins/BIDS-Models, which are still in flux, but could be useful for model fit-type stuff. Take a look here. Basically, for the fit coefficients and R^2 values, you could use _stat-effect_statmap and _stat-rSquare_statmap, respectively. There's the possibility that that will change, but I think it's a good starting point.
Finally, I unfortunately don't know what several of the maps are, so I don't have any useful feedback on those. Hopefully as I dig into things, I'll gain a better understanding of the outputs.

I hope this helps. I'm not the Derivatives expert among the BIDS maintainers, so I could be wrong about some of these recommendations.

bbfrederick · 2020-10-22T19:53:55Z

A few thoughts I have are:

Done
Done
This is a result of a bug in mapping internal variable names to file names. Fixed.
Done
This is not required, though, right? If one of the tsv files gets separated from the other files, I want to still know what the sample rate is.
I'll have to look at some examples to figure out what to do here.
Name changed. Not sure what I'll do about keywords.
Yes. Just sloppy naming on my part. Fixed.
Not sure what to do here. I think I'd rather combine the masks into one file.
Sloppy naming again. I'll go with the change you suggest, but the two things are related, but different.
Yes, a temporal resolution entity would be good. "sampletime"?
I'll try that.
The maps should be renamed to be more informative - these are historical names that I've used for quite some time, and need revision. My current thinking is:

CurrentName	Meaning	NewName
lagtimes	time of maximum correlation	maxtime_map
lagstrengths	maximum correlation value (R)	maxcorr_map (although now correlation is just one of a few similarity metrics)
lagsigma	halfwidth of correlation peak	maxwidth_map
R2	maximum correlation squared	maxcorrsq_map
MTT	estimate of mean transit time	MTT_map (this is the accepted term for what I'm trying to calculate)
fitmask	mask showing where the correlation fit succeeded	corrfit_mask

tsalo · 2020-10-23T17:02:52Z

This is not required, though, right? If one of the tsv files gets separated from the other files, I want to still know what the sample rate is.

Right, the "TR" keyword is entirely optional.

Yes, a temporal resolution entity would be good. "sampletime"?

I've opened an issue about this in the electrophysiology derivatives BEP repository (bids-standard/bep021#2), but, from what I've heard, they will probably be slow to respond. I think it's probably best to use the desc field to denote the resolution (as you're already doing), with the caveat that there may be a better way in the future.

The maps should be renamed to be more informative - these are historical names that I've used for quite some time, and need revision.

Personally, I think "lag time" is easier to interpret than "maximum correlation time" since it's how users will think of it, but I guess it's a matter of choosing between the interpretation and the actual value.

Ultimately, I think that MTT will be its own suffix (either _MTT, _mtt, or _MTTmap), since, as you said, it's an accepted term and an established output of certain things. I also noticed that the Units for this file are "arbitrary", but isn't this map in seconds?

One other question- is desc-lfofilterResult_bold.nii.gz the voxel-wise, lagged LFO confound regressor or the denoised BOLD data after confound regression? If it's the former, then I think a better name would be something like desc-lfoConfound_timeseries.nii.gz (you can have timeseries.nii.gz).

Other than that, it all looks very good to me. The _hist, _info, and _map suffixes are not supported, but I think you have found good alternatives that work nicely around the current limits of BIDS Derivatives.

bbfrederick · 2020-10-26T12:38:15Z

Personally, I think "lag time" is easier to interpret than "maximum correlation time" since it's how users will think of it, but I guess it's a matter of choosing between the interpretation and the actual value.

You may be right - let me think about that.

Ultimately, I think that MTT will be its own suffix (either _MTT, _mtt, or _MTTmap), since, as you said, it's an accepted term and an established output of certain things. I also noticed that the Units for this file are "arbitrary", but isn't this map in seconds?

Yes it is - fixed.

One other question- is desc-lfofilterResult_bold.nii.gz the voxel-wise, lagged LFO confound regressor or the denoised BOLD data after confound regression? If it's the former, then I think a better name would be something like desc-lfoConfound_timeseries.nii.gz (you can have timeseries.nii.gz).

It's the cleaned dataset. This was kind of confusing - I've changed the names a bit. now "desc-lfofilterCleaned.nii.gz" is the name of the denoised BOLD data after confound regression, and "desc-lfofilterRemoved.nii.gz" is the voxel-wise, lagged LFO confound regressor (this file is only saved if the "--nolimitoutput" flag is selected).

I'm working on a table for the documentation with clear explanations of what file is what. To get decent tables, I need to use markdown (can't figure out how to make nice tables in .rst). The file "candidate_usage.md" has the modifications so far.

tsalo · 2020-10-26T15:28:03Z

It's the cleaned dataset. This was kind of confusing - I've changed the names a bit. now "desc-lfofilterCleaned.nii.gz" is the name of the denoised BOLD data after confound regression, and "desc-lfofilterRemoved.nii.gz" is the voxel-wise, lagged LFO confound regressor (this file is only saved if the "--nolimitoutput" flag is selected).

Ah, that makes sense. Thank you for the clarification.

I'm working on a table for the documentation with clear explanations of what file is what. To get decent tables, I need to use markdown (can't figure out how to make nice tables in .rst). The file "candidate_usage.md" has the modifications so far.

Do these tables look alright? If so, I can help with the RST tables.

bbfrederick · 2020-10-26T15:35:45Z

Those look great. I’m happy to stay with .rst if I can figure out table formatting. The only real change I made to candidate_usage.md after conversion was to mess with the BIDS output tables. Blaise

…

On Oct 26, 2020, at 11:28 AM, Taylor Salo ***@***.***> wrote: It's the cleaned dataset. This was kind of confusing - I've changed the names a bit. now "desc-lfofilterCleaned.nii.gz" is the name of the denoised BOLD data after confound regression, and "desc-lfofilterRemoved.nii.gz" is the voxel-wise, lagged LFO confound regressor (this file is only saved if the "--nolimitoutput" flag is selected). Ah, that makes sense. Thank you for the clarification. I'm working on a table for the documentation with clear explanations of what file is what. To get decent tables, I need to use markdown (can't figure out how to make nice tables in .rst). The file "candidate_usage.md" has the modifications so far. Do these tables <https://tedana.readthedocs.io/en/latest/outputs.html#tedana-derivatives> look alright? If so, I can help with the RST tables. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#42 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAH5DU3J5DSEKMINQCPLUA3SMWIRJANCNFSM4PYXTVLQ>.

bbfrederick closed this as completed Oct 27, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Follow BIDS-Derivatives convention for outputs #42

Follow BIDS-Derivatives convention for outputs #42

tsalo commented Aug 8, 2020

tsalo commented Aug 11, 2020

bbfrederick commented Oct 20, 2020 •

edited

Loading

tsalo commented Oct 20, 2020

bbfrederick commented Oct 20, 2020 via email

bbfrederick commented Oct 21, 2020

tsalo commented Oct 22, 2020 •

edited

Loading

bbfrederick commented Oct 22, 2020 •

edited

Loading

tsalo commented Oct 23, 2020

bbfrederick commented Oct 26, 2020

tsalo commented Oct 26, 2020

bbfrederick commented Oct 26, 2020 via email

Follow BIDS-Derivatives convention for outputs #42

Follow BIDS-Derivatives convention for outputs #42

Comments

tsalo commented Aug 8, 2020

tsalo commented Aug 11, 2020

bbfrederick commented Oct 20, 2020 • edited Loading

tsalo commented Oct 20, 2020

bbfrederick commented Oct 20, 2020 via email

bbfrederick commented Oct 21, 2020

tsalo commented Oct 22, 2020 • edited Loading

bbfrederick commented Oct 22, 2020 • edited Loading

tsalo commented Oct 23, 2020

bbfrederick commented Oct 26, 2020

tsalo commented Oct 26, 2020

bbfrederick commented Oct 26, 2020 via email

bbfrederick commented Oct 20, 2020 •

edited

Loading

tsalo commented Oct 22, 2020 •

edited

Loading

bbfrederick commented Oct 22, 2020 •

edited

Loading