Add multi-run workflow synchronized with BIDS dataset #219

tsalo · 2020-05-08T18:55:10Z

Closes #217. This enhancement adds several functions to do the following:

Load relevant (currently just BOLD) scans from BIDS dataset and determine acquisition time, duration, and generalized name for each.
Extract trigger onsets from BlueprintInput object.
Attempt to match physio trigger onsets to scan onsets.
Interpolate any missing physio trigger periods from available scans.
Identify onsets and offsets for physio trigger periods for each frequency in the physio object.
Slice the physio object into acquisition-specific BlueprintOutputs.
Save BlueprintOutput objects to BIDS files.
Generate a figure showing timing of scans and physio trigger periods before and after synchronization, for QC purposes.

Current Development/Testing Strategy

Requires a BIDS dataset and an associated physio file.
Do the following:

from phys2bids.conversion import workflow
from phys2bids.interfaces.acq import populate_physio_obj

bids_dir = '/path/to/bids/'
subject = '01'
session = '01'
physio_file = '/path/to/physio/file.acq'
physio = populate_physio_obj(physio_file, chtrig=3)
res = workflow(physio, bids_dir, sub=subject, ses=session)

Review output filenames and BlueprintOutput objects.

To Do

Proposed Changes

Add a completely (hopefully) unsupervised method for splitting multi-run physio files using onset-time and duration information from a converted BIDS dataset.

pvelasco · 2020-05-11T19:09:04Z

In dicoms, SeriesTime, InstanceCreationTime, and ContentTime are very similar, but AcquisitionTime is generally ~20 seconds earlier than the rest. We need to determine which time is closest to the scanner's trigger pulse.

From some documentation I found from Siemens on image reconstruction:

AcquisitionTime: The Time the acquisition of data started.

From an answer given by Chris Roden (dcm2niix):

Instance Creation Time 0008,0013 initially reflects the time an image leaves a parallel processing reconstructor, so it has a variable delay with regards to the acquisition time. Acquisition Time is more meaningful for data.

So, if the reconstruction involves something more than a straight FFT (i.e., GRAPPA, SENSE, etc.), that will explain the ~20 sec delay you find between the AcquisitionTime and the other times.

If I remember correctly, for Siemens scanners, the "scanner trigger" is sent at the time of the excitation pulse of the first slice of each (collected) volume. So the very first trigger will be sent just a few milliseconds before the first chunk of data for the first slice for the first volume is acquired (AcquisitionTime).

Regarding the time resolution (seconds vs. sub-second): I think the purpose of func_times.tsv and physio_times.tsv is to know which physio file (or chunk of a session-long one) corresponds to which functional file, isn't it?. For that, second-level resolution should be enough. Then, the fine alignment (the one requiring sub-second resolution) should be handled by looking at the first trigger signal in each physio file. That should be your "Time zero", according to BIDs.

tsalo · 2020-05-11T21:11:27Z

Thanks @pvelasco! I'm glad to know that AcquisitionTime is the right one, although it looks like heudiconv grabs acq_time from ContentTime instead of AcquisitionTime. It might be best to use AcquisitionTime instead, although I can follow up about that in nipy/heudiconv#447.

Regarding the time resolution (seconds vs. sub-second): I think the purpose of func_times.tsv and physio_times.tsv is to know which physio file (or chunk of a session-long one) corresponds to which functional file, isn't it?. For that, second-level resolution should be enough. Then, the fine alignment (the one requiring sub-second resolution) should be handled by looking at the first trigger signal in each physio file. That should be your "Time zero", according to BIDs.

Yes, the scan times are used to find the trigger periods that correspond to each functional file, but this approach won't work for single-scan physio files, because it relies on the exact times between physio trigger periods as well as those between scan onsets.

At the moment, my code finds the trigger signal that would have the shortest delay from the scan acquisition time, rather than the first trigger signal. My logic is that, since the triggers occur at variable delays, the one with the shortest delay is the one that is closest to the actual start of acquisition, and is thus the one we should build our timing information from. To find that time, we need scan times that are as accurate as possible. Well, actually we need relative onsets to be accurate. The absolute scan times are not used, because I didn't want to assume that the physio and scan computers' clocks were perfectly synchronized, and because the BioPac segment timestamps seem to only have second-level resolution.

Will need to drop before general use, but there is a plan in place. Just not sure how to implement it yet.

tsalo · 2020-05-12T15:30:47Z

The functions are now in a state that I can call them sequentially in a notebook (see here). I have also made some time series figures of the physio and functional timing from the results (see here for the notebook and below for the figures).

You might notice a few odd things below:

Some of the blue (physio) blocks extend past the red (scan) blocks. This is because my task failed to reset the trigger when it ended, so the trigger is only reset right before the next task starts, by that next task.
Some blue blocks that don't have corresponding red blocks. That's where scans were dropped. In this case, the first task was stopped and restarted, as was the second one.
It's hard to see here but the last block has a slight mismatch between the red and blue blocks. Our last task actually sends the trigger about 6 seconds late, which is something I was not aware of until I started working on this code!

pvelasco · 2020-05-18T15:36:14Z

phys2bids/conversion.py

+    from operator import itemgetter
+    from itertools import groupby
+    physio_data = bioread.read_file(physio_file)
+    trigger_channel = physio_data.channels[-1]


Suggestion: would you consider making your code more generic by allowing more flexibility in the selection of the trigger_channel?
Something like:

trigger_idx = [c.name.lower() for c in physio_data.channels].index('trigger') trigger_channel = physio_data.channels[trigger_idx]

or:

trigger_channel = [c for c in physio_data.channels if c.name.lower()=='trigger'][0]

(or maybe just 'trig' in c.name.lower(), in case people use abbreviations...)

Then, you could use the last channel as a fallback option.

One of the things on my to-do list is to use the native phys2bids classes. I don't know much about them at the moment, so that may take some time, but I believe that selecting the trigger channel is one of the possible parameters for the main physio container object.

That said, if your interest is more in making it easier to test/debug for the near future, I'm happy to make the trigger channel selection more flexible.

Update bids-multi-run.

RayStick · 2020-07-20T21:00:28Z

Hi all, I have read through what people have written but I have done so rather quickly (so as to catch up), so sorry if I misunderstand things.

  In general, I agree with @smoia - I think I will need to see this code in action to understand it properly. I also lean towards have an “old triggers” column and a new “corrected” triggers column, as it probably helps for troubleshooting.   In general, this PR looks like its adding some great functionality, but it may only be relevant for certain experimental and equipment set-ups? I will explain a bit about my experimental set-up, below, and you can let me know if you agree!

Rationale/Concept - I don’t think I can remember an occasion where I have had just a few missing triggers in my physiological recordings, however I have had occasions where I have not recorded any trigger information at all, alongside the physiological data (due to forgetting to plug the trigger connection in ... ) which is very annoying. From what I have picked up, I think I have quite a different set-up to you @tsalo. I don’t have any synchronization between task (stimuli presentation) and trigger recording alongside the physiological data. I just record all the volume triggers that the scanner sends during fMRI sequences, but this does not trigger anything else to happen. (Sometimes I get the the stimuli presentation to be triggered by the first volume trigger of a scan, but that is separate from the physiological recording equipment).

Implementation - I am struggling to work out how this would exactly apply for the type of data I collect, so I will explain what we do. We first turn on our physiological recording software a while before we start any scanning, as it takes time to attach all the peripheral devices to a person, and check the signals look reasonable. We then record continuously throughout the whole MRI session i.e. all physiological data from one MRI session is contained within one file. Within this recording of physiological data, we record all MRI volume triggers. Therefore the start time of the physiological recording file is not synched at all with any other start time (task, MRI scan). However, I understand that this code is going to look at the relative timings between MRI scans to work out the relative physiological timing blocks? That makes some sense to me, but does it therefore always assume you have the first trigger of an fMRI scan, to get the onset of this block correct? And then if there are any triggers missing in the middle it can fill them in just fine? Otherwise, I am struggling to work out how this works without having a trigger recorded at the onset of an fMRI scan.

tsalo · 2020-07-20T21:57:39Z

In general, this PR looks like its adding some great functionality, but it may only be relevant for certain experimental and equipment set-ups?

The ability of the workflow to correct trigger time series is only relevant for some setups, and the workflow will most definitely not work with single-run setups, but the general ability of the workflow to identify and label physio runs associated with specific imaging runs is broadly useful. Basically, you shouldn't have to feed in any information beyond a path to the BIDS dataset and the subject/session to get your physio data.

I am struggling to work out how this would exactly apply for the type of data I collect

Your trigger time series is going to be accurate (unlike mine), but it's also completely unlabeled. You still need to use a heuristic or something to identify which block of physio data corresponds to which imaging file in the BIDS dataset. This should automate that.

Therefore the start time of the physiological recording file is not synched at all with any other start time (task, MRI scan). However, I understand that this code is going to look at the relative timings between MRI scans to work out the relative physiological timing blocks? That makes some sense to me, but does it therefore always assume you have the first trigger of an fMRI scan, to get the onset of this block correct? And then if there are any triggers missing in the middle it can fill them in just fine? Otherwise, I am struggling to work out how this works without having a trigger recorded at the onset of an fMRI scan.

You need starts of physio trigger periods to generally match up with starts of fMRI scans for this to work, but a lot of the extra stuff I put into the workflow is to make sure it will work in cases where the first scan trigger or imaging data are missing.

Think of it like this. You have two arrays of "onsets" with the same scale (seconds) but different start points. You know the durations and "names" for the different onsets for one of the arrays, but not the other. Here's some pseudocode to illustrate:

physio_onsets = [1.5, 7, 12.2, 22, 27]  # from physio file
scan_onsets = [0.8, 6.3, 10.7, 20.5, 25.5]  # from BIDS dataset
scan_durations = [1, 2, 3, 4, 5]  # from BIDS dataset
scan_names = ['a', 'b', 'c', 'd', 'e']  # from BIDS dataset
# the times between the onsets at the same
delta_physio = np.diff(physio_onsets)  # [5.5., 5.2, 9.8, 5]
delta_scans = np.diff(scan_onsets)  # [5.5., 5.2, 9.8, 5]
delta_physio == delta_scans  # True
# So now we find what the scan onsets (and durations) would be
# in terms of the physio signal.

# I'll estimate the onset from just the first scan and physio block
# because we *know* neither is missing in this case
offset = scan_onsets[0] - physio_onsets[0]  # -0.7
new_scan_onsets = scan_onsets + offset
# now we loop through, take each new scan (onset : onset + duration)
# and name the file based on the scan name.

There were three things I noticed in my own data that complicated matters:

Physio trigger onsets happens after a delay, at variable amounts. The delay is a natural consequence of using the task to send the signal, and is generally quite small (except for a bug in one of my tasks). The variability is also quite small and also a consequence of using an intermediary.
- This does not impact your data, based on your setup.
- It's nice to be able to "correct" the trigger signal in these cases, since the task-relayed trigger signal is inaccurate.
Some physio trigger periods were missing, and thus the scans had no corresponding blocks to compare against. This happened sometimes when my tasks failed to send the signal to BioPac.
- This does not impact your data, based on your setup.
- It's nice to be able to "correct" the trigger signal in these cases as well, since the task-relayed trigger signal is, well, completely missing.
Some scans were "missing". Sometimes, you stop a scan early or don't convert it over to the BIDS dataset because it's just that terrible. Whatever the reason, the BIDS dataset won't always reflect all of the physio trigger periods.
- This impacts everyone's data, and is the main reason why we don't want to assume that the first physio trigger matches the first scan in the dataset.

To deal with these three things, my method is somewhat more complicated than the pseudocode above. Basically, I take the onset delta arrays for the physio and scan data and find the "offset" that minimizes the overall difference between the two arrays. In my data, that overall difference will be > 0. In your data (limited to cases where all scans were retained in the BIDS dataset), that difference should be 0.

Of course, the fewer scans there are and the more missing runs there are (both in physio and BIDS), the more likely the method is to be wrong, so that's why it outputs a figure as well.

RayStick · 2020-07-21T16:14:47Z

Thanks for your detailed reply.
Before I reply properly, I have a quick question - how do you define the physio_onsets from the physio file?

tsalo · 2020-07-21T16:28:27Z

I extract contiguous blocks of non-zero values in the trigger signal and identify the time of the first point in each block.

This step can be divorced from the main workflow code, though, in order to support your type of trigger signal. Per #219 (comment), one good route might be to binarize and "blockify" the trigger signal before the workflow, and retain that simplified trigger in a separate channel. Then the workflow could use the block-form trigger without touching the original trigger.

RayStick · 2020-07-22T21:40:20Z

I extract contiguous blocks of non-zero values in the trigger signal and identify the time of the first point in each block.

This step can be divorced from the main workflow code, though, in order to support your type of trigger signal. Per #219 (comment), one good route might be to binarize and "blockify" the trigger signal before the workflow, and retain that simplified trigger in a separate channel. Then the workflow could use the block-form trigger without touching the original trigger.

From re-reading your last comment, and some of the in-line comments, I think I follow the context and scope of this PR now. Thanks!

RayStick · 2020-10-08T14:28:50Z

Shall I wait for conflicts to be resolved before properly reviewing?

tsalo · 2020-10-08T15:35:44Z

@RayStick I've resolved the conflicts so it's good to go. Thanks!

smoia · 2020-10-08T20:48:21Z

@tsalo I'm sorry but I really can't see if there was something that I missed and I should have done for this PR. Is it "only" missing new reviews? If so, Maybe someone else could take the lead on this.

What is the final solution on the trigger channel? Are you rewriting it or adding another "corrected" or "binarised" trigger channel?

tsalo · 2020-10-08T20:51:44Z

@smoia See this thread: https://github.com/physiopy/phys2bids/pull/219/files#r455898174. Basically, the method will only work on trigger signals that are already block-ish. I didn't implement a solution for that in this PR, but I did open #278.

RayStick · 2020-10-09T22:37:40Z

I will take a look on Monday, though bear in mind I have not looked at the phys2bids code for a while now ... so I might be a bit slow to review this! If it needs to be merged sooner, I'd suggest someone else jump in as second reviewer, sorry!

smoia · 2020-10-10T07:47:23Z

Hey @RayStick ! Thank you, but I would wait a tad longer. This PR is not passing any test, and it depends on #278 .
In fact, I'd revert it to draft @tsalo .
There are other reviewes needed if you're eager to do one ;)

Docstring rules are way too aggressive. Also, E203 should be ignored. It's wrong.

codecov · 2020-10-10T16:50:07Z

Codecov Report

Merging #219 (bc7698a) into master (f587559) will decrease coverage by 18.34%.
The diff coverage is 37.57%.

@@             Coverage Diff             @@
##           master     #219       +/-   ##
===========================================
- Coverage   94.61%   76.26%   -18.35%     
===========================================
  Files           8       10        +2     
  Lines         836     1083      +247     
===========================================
+ Hits          791      826       +35     
- Misses         45      257      +212

Impacted Files	Coverage Δ
phys2bids/slice4phys.py	`42.04% <5.66%> (-55.10%)`	⬇️
phys2bids/conversion.py	`13.38% <13.38%> (ø)`
phys2bids/bids.py	`72.02% <57.60%> (-26.04%)`	⬇️
phys2bids/physio_obj.py	`91.75% <92.59%> (-1.14%)`	⬇️
phys2bids/phys2bids.py	`89.28% <100.00%> (+1.21%)`	⬆️
phys2bids/utils.py	`97.50% <100.00%> (-0.94%)`	⬇️
phys2bids/io.py
... and 3 more

tsalo added 6 commits May 8, 2020 13:45

Start working on BIDS dataset-based multi-run converter.

65f314b

Add some code. Not good code, but code.

69fecdb

Fix style issues.

7c169c5

Get synchronize_onsets basically working.

1b198c4

Add duration determination.

c0367c4

Minor refactor.

bd736e4

tsalo mentioned this pull request May 10, 2020

Retain sub-second resolution in acq_time column of scans tsv files nipy/heudiconv#447

Closed

tsalo mentioned this pull request May 11, 2020

Use AcquisitionTime for acq_time column of scans tsv files nipy/heudiconv#450

Closed

tsalo added 3 commits May 11, 2020 20:34

More documentation.

223adb1

Some more fixes.

7b262ba

Fix bugs and include echo keyword.

7191a3e

Will need to drop before general use, but there is a plan in place. Just not sure how to implement it yet.

tsalo mentioned this pull request May 13, 2020

Allow microseconds in scans file datetimes bids-standard/bids-specification#469

Closed

pvelasco reviewed May 18, 2020

View reviewed changes

More flexible trigger channel selection for debugging.

d15ad97

smoia mentioned this pull request Jun 10, 2020

Check that phys2bids is BIDS 1.3.0 compliant. #185

Closed

tsalo mentioned this pull request Jun 16, 2020

Improving split of multiple runs #232

Closed

smoia added the Minormod This PR generally closes an `Enhancement` issue. It increments the minor version (0.+1.0) label Jun 16, 2020

smoia mentioned this pull request Jun 16, 2020

Add the possibility to split multi-run physiological recordings #206

Merged

17 tasks

smoia self-assigned this Jun 18, 2020

tsalo added 2 commits June 18, 2020 11:59

More progress on multi-run conversion.

eb46a71

Merge remote-tracking branch 'physiopy/master' into enh/bids-multi-run

a5aad49

sangfrois mentioned this pull request Jun 24, 2020

User Testing of 2.1.0 #267

Closed

RayStick mentioned this pull request Jun 25, 2020

Deal with unwanted TRs or TRs with uneven spacing #268

Open

tsalo added 4 commits July 3, 2020 13:44

Merge pull request #1 from physiopy/master

8591332

Update bids-multi-run.

Workflow is working now.

9a5ce3b

Fix style issues.

26f8aae

Reorganize and improve documentation.

18f2831

tsalo added 4 commits July 19, 2020 16:46

Update.

17dfa07

Merge branch 'master' into enh/bids-multi-run

40d792f

Get trigger update and resampling working.

cd1dea5

Update requirements.

233bb11

This was referenced Jul 26, 2020

Calculate and retain "blockified" version of trigger in dedicated property #278

Open

Handling of noncontiguous segments in BioPac recordings #279

Open

smoia mentioned this pull request Sep 21, 2020

Auto-detect the trigger channel based on retrieved channel names #306

Merged

4 tasks

Merge branch 'master' into enh/bids-multi-run

8eaa407

tsalo added 4 commits October 10, 2020 11:06

Run black on files.

56f0ad0

Move test of function I'd renamed.

bb8166f

Fix style issues.

428dd9f

Docstring rules are way too aggressive. Also, E203 should be ignored. It's wrong.

Update conversion.py

bc7698a

smoia mentioned this pull request Oct 19, 2020

Going black (i.e. adopting an _opinionated_ python code formatter) #324

Closed

tsalo changed the title ~~[ENH] Add multi-run workflow synchronized with BIDS dataset~~ Add multi-run workflow synchronized with BIDS dataset Oct 26, 2020

smoia mentioned this pull request Dec 2, 2020

Q: is there a phys2bids overview slide to (re)use? #361

Closed

Merge branch 'master' into enh/bids-multi-run

2c7438f

tsalo mentioned this pull request Jan 2, 2021

Add multi-run workflow synchronized with BIDS dataset #374

Closed

18 tasks

tsalo closed this Jan 2, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add multi-run workflow synchronized with BIDS dataset #219

Add multi-run workflow synchronized with BIDS dataset #219

tsalo commented May 8, 2020 •

edited

Loading

pvelasco commented May 11, 2020

tsalo commented May 11, 2020

tsalo commented May 12, 2020

pvelasco May 18, 2020

tsalo May 18, 2020

RayStick commented Jul 20, 2020 •

edited

Loading

tsalo commented Jul 20, 2020

RayStick commented Jul 21, 2020

tsalo commented Jul 21, 2020

RayStick commented Jul 22, 2020 •

edited

Loading

RayStick commented Oct 8, 2020

tsalo commented Oct 8, 2020

smoia commented Oct 8, 2020

tsalo commented Oct 8, 2020

RayStick commented Oct 9, 2020

smoia commented Oct 10, 2020

codecov bot commented Oct 10, 2020 •

edited

Loading

Add multi-run workflow synchronized with BIDS dataset #219

Add multi-run workflow synchronized with BIDS dataset #219

Conversation

tsalo commented May 8, 2020 • edited Loading

Current Development/Testing Strategy

To Do

Proposed Changes

pvelasco commented May 11, 2020

tsalo commented May 11, 2020

tsalo commented May 12, 2020

pvelasco May 18, 2020

Choose a reason for hiding this comment

tsalo May 18, 2020

Choose a reason for hiding this comment

RayStick commented Jul 20, 2020 • edited Loading

tsalo commented Jul 20, 2020

RayStick commented Jul 21, 2020

tsalo commented Jul 21, 2020

RayStick commented Jul 22, 2020 • edited Loading

RayStick commented Oct 8, 2020

tsalo commented Oct 8, 2020

smoia commented Oct 8, 2020

tsalo commented Oct 8, 2020

RayStick commented Oct 9, 2020

smoia commented Oct 10, 2020

codecov bot commented Oct 10, 2020 • edited Loading

Codecov Report

tsalo commented May 8, 2020 •

edited

Loading

RayStick commented Jul 20, 2020 •

edited

Loading

RayStick commented Jul 22, 2020 •

edited

Loading

codecov bot commented Oct 10, 2020 •

edited

Loading