Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Clarify sample column values to be 0-based and possibly non-integer #1441

Closed

Conversation

effigies
Copy link
Collaborator

Based on discussion in bids-standard/bids-examples#356, it seems that defining the sample column of events.tsv as containing integer values may have been mistaken. This column was introduced in 1.2.0, when EEG and iEEG were added to the specification, and there does not appear to be consensus among contributors to those BEPs that this was intended to be an integer, with @arnodelorme stating:

Sample latencies can sometimes be fractional because they come from a different machine that has more resolution than the EEG sampling frequency. This is important for reaction time, for example. Even with EEG sampled at 250 Hz, you need to be able to determine reaction time with 1ms (1000 Hz equivalent sampling rate) precision.

If sample is not purely an index, then it is a value where the time unit is not seconds. Therefore it should be possible to say "0 samples from the start of the run", resolving issue #499 in favor of 0-based "indexing".

This PR is necessary in order to correctly validate events.tsv files that contain non-integer sample indices.

@effigies effigies added iEEG EEG Electroencephalography opinions wanted Please read and offer your opinion on this matter labels Mar 15, 2023
@codecov
Copy link

codecov bot commented Mar 15, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (8f21b45) 88.01% compared to head (cda30ec) 88.01%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1441   +/-   ##
=======================================
  Coverage   88.01%   88.01%           
=======================================
  Files          14       14           
  Lines        1268     1268           
=======================================
  Hits         1116     1116           
  Misses        152      152           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@arnodelorme
Copy link

arnodelorme commented Mar 16, 2023

We need fractional samples for EEG, MEG, and iEEG - and I think this update goes against this. We need fractional samples because events are often acquired with different hardware, which has a resolution higher than the EEG. For example, a typical EEG file can be sampled at 500 Hz, and events can be at 1000 Hz or more (this is useful because, in psychophysics, you need millisecond precision on events).

Most BIDS export tools will allow placing events from different software (presentation, Eprime) into the EEG event file.

@effigies
Copy link
Collaborator Author

Could you recommend alternative wording? I was aiming to allow fractional samples, but if it can be read as restricting them, we have a problem.

@sappelhoff
Copy link
Member

sappelhoff commented Mar 16, 2023

This is what I remember from the BEP development process:

  • Much of (almost all) ephys recording involves sending "event markers" (or "triggers") from the experiment presentation hardware (or some other hardware) to the recording hardware (typically an amplifier) that digitizes the neural signal (we have samples then)
  • The incoming event markers are then digitized together with the neural signal
  • As such, each event marker is associated with a specific, discrete sample -- it cannot lie in between samples

The sample metadata for events.tsv files was introduced together with the value metadata to cover exactly this situation --> these two metadata entries were supposed to duplicate (and make accessible) information that is present in most (almost all) ephys data formats: Event markers associated with discrete samples in the data files (=indices into the data).

If we make sample a zero-based and fractional metadata entry, I don't really see the difference to what we already cover with onset.

@sappelhoff sappelhoff added MEG Magnetoencephalography NIRS labels Mar 16, 2023
@sappelhoff sappelhoff added this to the 1.9.0 milestone Mar 16, 2023
@robertoostenveld
Copy link
Collaborator

The sample column also came up in the INCF/BEP032 meeting on Wednesday regarding animal electrophysiology data. Note that there we identified that "sample" appears to be used both for "data sample" (common terminology in EEG/iEEG/MEG but also in LFP) and for "biological sample" as used in BIDS for microscopy. But in this issue it is about "data samples".

Wednesday during the online meeting I was not able to find "sample" defined in the BIDS spec, but I did point to some examples that have it in their events.tsv. Here is one, here another.

I have used it myself in my own datasets, but was under the impression that sample in events.tsv was an additional column not specified by BIDS. Just like countdown_offset and value, it would be appropriate to document them in a corresponding events.json.

Given that I thought sample was my own free choice, I was a bit confused when I found this in the yaml file which I think is the reason for sample appearing in the glossary. I think this is now also the reason for this issue being discussed, i.e. to formally define sample. But is that needed? We can also leave it to the user (and hope that the events.json is specific about it being 0- or 1-indexed). It is redundant with the onset column anyway, right?

@VisLab
Copy link
Member

VisLab commented Mar 17, 2023 via email

@robertoostenveld
Copy link
Collaborator

Ah, thanks @sappelhoff. I now see that it is mentioned in the current spec regarding the task events.

In general I personally prefer the BIDS specification to be as small as possible, but as large as needed. I don't see the need for this being defined in the spec, as it can also be specified in the events.json. Hence I would not object if sample were removed rather than being listed as OPTIONAL. As there is no tight definition right now anyway, it would not hurt to make it completely the user's responsibility. The option of "Additional Columns" remains.

@sappelhoff
Copy link
Member

sappelhoff commented Mar 20, 2023

If we were to remove (effectively "demote") sample as a BIDS-defined column in events.tsv, that would mean that all BIDS datasets that have sample in events.tsv but no sample in an associated events.json will trigger the warning that an arbitrary column sample is not described in the JSON file.

Not sure whether that would classify as a backward incompatible change, as some dataset curators may have relied on using the definition of sample when preparing their datasets.

I am in favor of making the definition of sample more clear/strict for it to serve as a way to duplicate event marker information from ephys raw files (together with the accompanying value column, see my comment: #1441 (comment)). I am pretty sure that this was the original intent.

The only "wonky" thing about my proposal is that we have to define post-hoc whether sample is supposed to be zero or one indexed. Here I would vote for one-indexed, as we are not dealing with a "time-like" variable (like onset) but simply an index.


In other news: What happens if a dataset curator uses a pre-defined column (like sample) in their dataset, but supply a definition for sample in their JSON file that completely diverges from the BIDS definition? I don't think we have a section for this case in the specification, but it feels like we should. And I believe this should be NOT RECOMMENDED, however if it's being done, we could state that the "user override" is be authoritative.

Let's discuss this latter thing here:

@sappelhoff
Copy link
Member

After talking with @effigies yesterday, I am now also in favor of demoting sample and value from OPTIONAL columns to "arbitrary" columns. That is, to remove those two definitions from the BIDS specification and let each dataset-curator define these columns for their use case in the JSON files.

For these reasons:

  1. this will "only" raise a warning for old datasets that relied on sample and value definitions being in the BIDS spec
  2. neither sample nor value were particularly well defined to begin with, and value furthermore has an unpleasantly general
    name for the very specific use-case it was "defined" for

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
EEG Electroencephalography iEEG MEG Magnetoencephalography NIRS opinions wanted Please read and offer your opinion on this matter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants