Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Follow BIDS-Derivatives convention for outputs #42

Closed
tsalo opened this issue Aug 8, 2020 · 11 comments
Closed

Follow BIDS-Derivatives convention for outputs #42

tsalo opened this issue Aug 8, 2020 · 11 comments

Comments

@tsalo
Copy link
Contributor

tsalo commented Aug 8, 2020

Is your feature request related to a problem? Please describe.
BIDSifying rapidtide outputs will make it easier to integrate rapidtide with fMRIPrep and other BIDS Apps.

Describe the solution you'd like
Generated files could follow BIDS-Derivatives convention, both in terms of the filenames and the file formats. Additionally, it would be great if the documentation included an Outputs of rapidtide page. Tedana's version uses tables, while fMRIPrep's version is more expansive and text-oriented.

Describe alternatives you've considered
The documentation under "Usage" could be expanded instead of adding a new page just for describing workflow outputs. This might be better for rapidtide, given the sheer number of different workflows that are included in the package. In tedana and fMRIPrep's cases, they only support one or two workflows for which the outputs need documentation.

Additional context
This request stems from nipreps/fmriprep#1678 (comment).

@tsalo
Copy link
Contributor Author

tsalo commented Aug 11, 2020

Here are some initial thoughts on the files you outlined in nipreps/fmriprep#1678, based on bids-standard/bids-specification#519:

  1. _lagregressors.nii.gz --> X_desc-rapidtide_timeseries.nii.gz
  2. final regressor timecourse --> X_desc-rapidtide_timeseries.tsv (would conflict with the above)
  3. _lagtimes.nii.gz --> Not sure
  4. _lagstrengths.nii.gz --> Not sure
  5. _lagsigma.nii.gz --> X_desc-rapidtide_std.nii.gz
  6. _lagmask.nii.gz --> X_desc-rapidtide_mask.nii.gz

@bbfrederick
Copy link
Owner

bbfrederick commented Oct 20, 2020

I've made a stab at this. There is a new option "--bidsoutput" that should put pretty much everything into BIDS compatible files (the exception are some text files specifying the command that was run.)

@tsalo
Copy link
Contributor Author

tsalo commented Oct 20, 2020

The code-base is so huge that I'm having some trouble hunting down all of the new outputs. Are they summarized anywhere (e.g., in the documentation, or listed as targets for a test)? If not, I could probably fetch the dev branch and run it on some data locally to figure out the final list.

One thing that I didn't see was associated json files. Those JSONs are optional, but they'd be pretty helpful, I think. Specifically, two fields that would be good to have are RawSources and Units. The RawSources would just be the inputs to the files, I think, and then if those inputs are actually not raw BIDS files (or are not final derivatives), then it would be on the tool (e.g., fMRIPrep) to chain the series of RawSources together to trace back to the actual BIDS files. Some of the files (like the lags) should be in seconds, so having Units there would be helpful. There are probably a few other common fields that are worth including, but I'd need to read through the outputs to figure that out.

@bbfrederick
Copy link
Owner

bbfrederick commented Oct 20, 2020 via email

@bbfrederick
Copy link
Owner

Ok, I've added json sidecars for all the nifti files ending in _map.nii.gz (which are the main output files). I put in RawSources and Units (although they are "None" for anything but lagtimes and lagsigma). I also added a CommandLineArgs, which shows you all the options used to invoke rapidtide.

@tsalo
Copy link
Contributor Author

tsalo commented Oct 22, 2020

Based on that test, here are the files:

dgsr_bids_commandline.txt
dgsr_bids_desc-MTT_map.json
dgsr_bids_desc-MTT_map.nii.gz
dgsr_bids_desc-R2_map.json
dgsr_bids_desc-R2_map.nii.gz
dgsr_bids_desc-autocorr_regressor.json
dgsr_bids_desc-autocorr_regressor.tsv.gz
dgsr_bids_desc-corrdistdata_info.json
dgsr_bids_desc-corrdistdata_info.tsv.gz
dgsr_bids_desc-corrout_info.nii.gz
dgsr_bids_desc-failreason_map.json
dgsr_bids_desc-failreason_map.nii.gz
dgsr_bids_desc-fitNorm_map.json
dgsr_bids_desc-fitNorm_map.nii.gz
dgsr_bids_desc-fitR2_hist.json
dgsr_bids_desc-fitR2_hist.tsv.gz
dgsr_bids_desc-fitcoff_map.json
dgsr_bids_desc-fitcoff_map.nii.gz
dgsr_bids_desc-fitmask_map.json
dgsr_bids_desc-fitmask_map.nii.gz
dgsr_bids_desc-fitwindow_info.nii.gz
dgsr_bids_desc-fmrires_regressor.json
dgsr_bids_desc-fmrires_regressor.tsv.gz
dgsr_bids_desc-gaussout_info.nii.gz
dgsr_bids_desc-globallag_hist.json
dgsr_bids_desc-globallag_hist.tsv.gz
dgsr_bids_desc-globalmean_mask.nii.gz
dgsr_bids_desc-lagsigma_map.json
dgsr_bids_desc-lagsigma_map.nii.gz
dgsr_bids_desc-lagstrengths_map.json
dgsr_bids_desc-lagstrengths_map.nii.gz
dgsr_bids_desc-lagtimes_hist.json
dgsr_bids_desc-lagtimes_hist.tsv.gz
dgsr_bids_desc-lagtimes_map.json
dgsr_bids_desc-lagtimes_map.nii.gz
dgsr_bids_desc-lfofiltered_bold.nii.gz
dgsr_bids_desc-maxcorr_hist.json
dgsr_bids_desc-maxcorr_hist.tsv.gz
dgsr_bids_desc-meanvalue_map.json
dgsr_bids_desc-meanvalue_map.nii.gz
dgsr_bids_desc-nullsimfunc_hist.json
dgsr_bids_desc-nullsimfunc_hist.tsv.gz
dgsr_bids_desc-origres_regressor.json
dgsr_bids_desc-origres_regressor.tsv.gz
dgsr_bids_desc-plt0p001_mask.nii.gz
dgsr_bids_desc-plt0p005_mask.nii.gz
dgsr_bids_desc-plt0p010_mask.nii.gz
dgsr_bids_desc-plt0p050_mask.nii.gz
dgsr_bids_desc-processed_mask.nii.gz
dgsr_bids_desc-r2value_map.json
dgsr_bids_desc-r2value_map.nii.gz
dgsr_bids_desc-refine_mask.nii.gz
dgsr_bids_desc-refined_regressor.json
dgsr_bids_desc-refined_regressor.tsv.gz
dgsr_bids_desc-resampres_regressor.json
dgsr_bids_desc-resampres_regressor.tsv
dgsr_bids_desc-resampres_regressor.tsv.gz
dgsr_bids_desc-rvalue_map.json
dgsr_bids_desc-rvalue_map.nii.gz
dgsr_bids_desc-width_hist.json
dgsr_bids_desc-width_hist.tsv.gz
dgsr_bids_formattedcommandline.txt
dgsr_bids_memusage.csv
dgsr_bids_options.json
dgsr_bids_pass-1_desc-despeckle_mask.nii.gz
dgsr_bids_pass-2_desc-despeckle_mask.nii.gz
dgsr_bids_pass-3_desc-despeckle_mask.nii.gz
dgsr_bids_runtimings.txt

A few thoughts I have are:

  1. _regressor --> _timeseries: This is best outlined in the open pull request for functional derivatives (BEP 012: Functional MRI derivatives bids-standard/bids-specification#519)
  2. .tsv.gz --> .tsv: I believe that gzipped tsv files are standard for highly sampled time series, like physio data, but time series around the resolution of the TR don't generally get gzipped.
  3. I'm not really sure what the difference between desc-rvalue_map and desc-R2_map is.
  4. _desc-meanvalue_map --> _mean: The mean suffix is added in BEP 012: Functional MRI derivatives bids-standard/bids-specification#519.
  5. "Units": "None" --> "Units": "arbitrary": This is a recent addition to the specification, since arbitrary units are not part of the standard unit systems used by the specification.
  6. When SamplingFrequency is the same as the input fMRI data, you can use the special keyword "TR", which is added in that PR I mentioned above.
  7. In the json files linked to tsv files, the columns should be at the top level (i.e., no "Columns" key). The way you have it set up is more like physio data, where the tsv wouldn't contain the column names at all, so they have to be set in the json. In rapidtide's case, those tsv files are more like fMRIPrep's confounds files (even the upsampled ones), so the column names are in the tsv and the jsons should be structured using the more basic tabular file rules.
  8. desc-fitmask_map --> desc-fit_mask: You could include mask-specific metadata (see here), especially "Type" to give more info about what a "fit" mask corresponds to. Maybe something like "Brain voxels with sufficient model fit"? I'm not sure if sentences are recommended, since the reserved values are all single words.
  9. desc-fitcoff_map --> desc-fitcoeff_map: I could be misunderstanding what this is, but I assumed it was parameter estimates from the model?
  10. pass isn't a supported entity, but it seems necessary in this case. I would only recommend adopting a reserved keyword in this case to make it easier to grab the last pass programmatically. Something like pass-final, maybe?
  11. desc-refine_mask and desc-refined_regressor: Should the mask maybe also be desc-refined?
  12. I've messaged @\effigies about the possibility of using the res entity in time series files. If he says it fits with the prescribed usage, then that could clean up the naming of a lot of the resampled versions of time series here. Never mind, it's a no-go on using res, since res is specifically about spatial resolution. I think desc is the way to go for these, at the moment.
  13. I think we could also take some cues from fitlins/BIDS-Models, which are still in flux, but could be useful for model fit-type stuff. Take a look here. Basically, for the fit coefficients and R^2 values, you could use _stat-effect_statmap and _stat-rSquare_statmap, respectively. There's the possibility that that will change, but I think it's a good starting point.
  14. Finally, I unfortunately don't know what several of the maps are, so I don't have any useful feedback on those. Hopefully as I dig into things, I'll gain a better understanding of the outputs.

I hope this helps. I'm not the Derivatives expert among the BIDS maintainers, so I could be wrong about some of these recommendations.

@bbfrederick
Copy link
Owner

bbfrederick commented Oct 22, 2020

A few thoughts I have are:

  1. Done
  2. Done
  3. This is a result of a bug in mapping internal variable names to file names. Fixed.
  4. Done
  5. This is not required, though, right? If one of the tsv files gets separated from the other files, I want to still know what the sample rate is.
  6. I'll have to look at some examples to figure out what to do here.
  7. Name changed. Not sure what I'll do about keywords.
  8. Yes. Just sloppy naming on my part. Fixed.
  9. Not sure what to do here. I think I'd rather combine the masks into one file.
  10. Sloppy naming again. I'll go with the change you suggest, but the two things are related, but different.
  11. Yes, a temporal resolution entity would be good. "sampletime"?
  12. I'll try that.
  13. The maps should be renamed to be more informative - these are historical names that I've used for quite some time, and need revision. My current thinking is:
CurrentName Meaning NewName
lagtimes time of maximum correlation maxtime_map
lagstrengths maximum correlation value (R) maxcorr_map (although now correlation is just one of a few similarity metrics)
lagsigma halfwidth of correlation peak maxwidth_map
R2 maximum correlation squared maxcorrsq_map
MTT estimate of mean transit time MTT_map (this is the accepted term for what I'm trying to calculate)
fitmask mask showing where the correlation fit succeeded corrfit_mask

@tsalo
Copy link
Contributor Author

tsalo commented Oct 23, 2020

This is not required, though, right? If one of the tsv files gets separated from the other files, I want to still know what the sample rate is.

Right, the "TR" keyword is entirely optional.

Yes, a temporal resolution entity would be good. "sampletime"?

I've opened an issue about this in the electrophysiology derivatives BEP repository (bids-standard/bep021#2), but, from what I've heard, they will probably be slow to respond. I think it's probably best to use the desc field to denote the resolution (as you're already doing), with the caveat that there may be a better way in the future.

The maps should be renamed to be more informative - these are historical names that I've used for quite some time, and need revision.

Personally, I think "lag time" is easier to interpret than "maximum correlation time" since it's how users will think of it, but I guess it's a matter of choosing between the interpretation and the actual value.

Ultimately, I think that MTT will be its own suffix (either _MTT, _mtt, or _MTTmap), since, as you said, it's an accepted term and an established output of certain things. I also noticed that the Units for this file are "arbitrary", but isn't this map in seconds?

One other question- is desc-lfofilterResult_bold.nii.gz the voxel-wise, lagged LFO confound regressor or the denoised BOLD data after confound regression? If it's the former, then I think a better name would be something like desc-lfoConfound_timeseries.nii.gz (you can have timeseries.nii.gz).

Other than that, it all looks very good to me. The _hist, _info, and _map suffixes are not supported, but I think you have found good alternatives that work nicely around the current limits of BIDS Derivatives.

@bbfrederick
Copy link
Owner

Personally, I think "lag time" is easier to interpret than "maximum correlation time" since it's how users will think of it, but I guess it's a matter of choosing between the interpretation and the actual value.

You may be right - let me think about that.

Ultimately, I think that MTT will be its own suffix (either _MTT, _mtt, or _MTTmap), since, as you said, it's an accepted term and an established output of certain things. I also noticed that the Units for this file are "arbitrary", but isn't this map in seconds?

Yes it is - fixed.

One other question- is desc-lfofilterResult_bold.nii.gz the voxel-wise, lagged LFO confound regressor or the denoised BOLD data after confound regression? If it's the former, then I think a better name would be something like desc-lfoConfound_timeseries.nii.gz (you can have timeseries.nii.gz).

It's the cleaned dataset. This was kind of confusing - I've changed the names a bit. now "desc-lfofilterCleaned.nii.gz" is the name of the denoised BOLD data after confound regression, and "desc-lfofilterRemoved.nii.gz" is the voxel-wise, lagged LFO confound regressor (this file is only saved if the "--nolimitoutput" flag is selected).

I'm working on a table for the documentation with clear explanations of what file is what. To get decent tables, I need to use markdown (can't figure out how to make nice tables in .rst). The file "candidate_usage.md" has the modifications so far.

@tsalo
Copy link
Contributor Author

tsalo commented Oct 26, 2020

It's the cleaned dataset. This was kind of confusing - I've changed the names a bit. now "desc-lfofilterCleaned.nii.gz" is the name of the denoised BOLD data after confound regression, and "desc-lfofilterRemoved.nii.gz" is the voxel-wise, lagged LFO confound regressor (this file is only saved if the "--nolimitoutput" flag is selected).

Ah, that makes sense. Thank you for the clarification.

I'm working on a table for the documentation with clear explanations of what file is what. To get decent tables, I need to use markdown (can't figure out how to make nice tables in .rst). The file "candidate_usage.md" has the modifications so far.

Do these tables look alright? If so, I can help with the RST tables.

@bbfrederick
Copy link
Owner

bbfrederick commented Oct 26, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants