Automatic recognition of channel contents #204

Open
smoia opened this issue Apr 5, 2020 · 11 comments
Labels
BrainHack This issue is suggested for BrainHack participants! Enhancement New feature or request

Comments

@smoia
Member

smoia commented Apr 5, 2020

Detailed Description

The content of a BIDS physiological file is described in its companion .json file, where an important entry contains the column headers.
At the moment, phys2bids fills that entry either using the name of the channels from the original file or using the list that the user provides with the option -chnames.
Look at the channels from our tutorial file:
[Image: tutorial_file]
It's quite easy for a human (with a little experience of physiological recordings) to make an educated guess about the content of each channel. We could write a function that makes that "educated guess" and suggests a column header for each channel to the user.

Context / Motivation

BIDS suggests specific column headers for these files, and it would be great if we could help standardise how channels are reported by adding such a function.

Possible Implementation

The first thing that comes to mind is some sort of pattern-recognition function, or a comparison with a database of physiological recordings.
Triggers are normally spiky (or blocky), pulses have quite a recognisable pattern, respiration another one. Chest expansion-based respiration recordings are normally smoother than others, while O2 and CO2 could have tidal shape responses and they are normally the inverse of each other.
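
As a rough illustration of the "inverse of each other" observation, here is a minimal sketch that flags two channels as a candidate O2/CO2 pair when they are strongly anti-correlated. The function names, the -0.8 threshold and the synthetic signals are assumptions made for the example, not part of phys2bids.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        raise ValueError("at least one channel is constant")
    return num / den


def looks_like_gas_pair(ch_a, ch_b, threshold=-0.8):
    """True if the channels are strongly anti-correlated.

    The threshold is a hypothetical value and would need tuning on real data.
    """
    return pearson(ch_a, ch_b) <= threshold


# Synthetic demo: 60 s sampled at 10 Hz (illustrative values only).
t = [i / 10 for i in range(600)]
co2 = [2.5 + 2.5 * math.sin(2 * math.pi * 0.25 * s) for s in t]
o2 = [18.5 - 2.5 * math.sin(2 * math.pi * 0.25 * s) for s in t]
pulse = [math.sin(2 * math.pi * 1.2 * s) for s in t]

print(looks_like_gas_pair(co2, o2))
print(looks_like_gas_pair(co2, pulse))
```

A plain correlation like this would be sensitive to any recording delay between the two gas channels, so a real implementation would probably need to search over a small lag first.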

@RayStick @BrightMG @CesarCaballeroGaudes and @rmarkello might have even better ideas!

@smoia smoia added Enhancement New feature or request BrainHack This issue is suggested for BrainHack participants! labels Apr 5, 2020
@smoia smoia added this to the The BrainWeb milestone Apr 5, 2020
@eurunuela
Collaborator

I think this should be pretty easy to do with a supervised machine learning algorithm (an SVM maybe?). I don't think much training data would be necessary to make it work.

I do have a concern though. How do you differentiate between CO2 and O2? Is this differentiation critical or is it okay if the algorithm incorrectly names these two?

@vinferrer
Collaborator

That's one issue. The other is where do we find a big dataset to train that SVM?

@eurunuela
Collaborator

We could try with the data we have.

@vinferrer
Collaborator

I don't think we have enough samples, but it could be a starting point

@smoia
Member Author

smoia commented Apr 6, 2020

Training an SVM could be an option. We have a lot of data in house that could serve as a training dataset for some types of physiological data, but we'll probably have to wait until we can make it public.

At the beginning, we could make a simple suggestion that separates triggers, pulses, and general respiration-based channels. In fact, the latest BIDS stable release suggests only three column headers. However, it is very important that there is no mistake in classification (CO2 and O2 are not equivalent, so we shouldn't treat them as such!), especially if we keep expanding the physiopy suite (#186), and especially since half of the contributors to phys2bids work quite a lot with CO2 recordings!

Why don't we wait to see if during the BrainWeb there is someone more experienced in machine learning or pattern recognition that could help us with their expertise?

@eurunuela
Collaborator

That's what I was thinking, to use the in house data to train the SVM.

Regarding the CO2 and O2, that's why I was pointing out that we should find a way of correctly differentiating them. To me the signals look pretty much complementary, meaning that an SVM algorithm may fail to correctly assign headers for this data.

Checking with people at the BrainWeb hackathon sounds good to me 😉

@RayStick
Member

RayStick commented Apr 6, 2020

A few quick thoughts:

  1. I have longer recordings (compared to the ones on OSF already) that could be provided at a later date, to train the SVM, if needed.

  2. Would this training only be implemented if there is no channel name information in the header files? Most of the software people use to record these physiological data does allow you to name channels, so in this case the training would not be needed? It would be good to have this option for the cases where there is no header info/channel names, of course.

  3. In principle, I think some pattern recognition approach would work: as Stefano explains, the different signals have noticeably different properties. As for the CO2 and O2: yes, they are the inverse of one another (in terms of shape); however, there can sometimes be a very slight recording offset due to how the gas analyzers work. Also, even though their patterns are very similar, their values will not be (whether they are measured in voltage, percent or mmHg), so if that could be taken into account alongside pattern recognition, it could be a way of distinguishing them. For example, min(CO2 channel) is always going to be smaller than min(O2 channel).
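
The min() comparison in point 3 could be sketched as below. This is only an illustration: the function name and the "CO2"/"O2" labels are hypothetical, the sample values are made up, and it assumes both channels are already known to be gas traces recorded in the same units.

```python
def label_gas_channels(ch_a, ch_b):
    """Label two gas channels, calling the one with the smaller minimum CO2.

    Assumes both channels are expressed in the same units (e.g. percent).
    Returns a tuple of labels in the order (ch_a, ch_b).
    """
    min_a, min_b = min(ch_a), min(ch_b)
    if min_a == min_b:
        raise ValueError("cannot tell the channels apart by their minima")
    return ("CO2", "O2") if min_a < min_b else ("O2", "CO2")


# Illustrative values in percent, not real recordings.
first = [20.9, 16.2, 20.7, 16.4]
second = [0.04, 4.8, 0.1, 4.6]
print(label_gas_channels(first, second))
```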

@smoia
Member Author

smoia commented Apr 6, 2020

I think that the training (if any) will take place offline, and the prediction on new data could be run on request. It's true that most software lets you name the channels, but sometimes the channels are preset in the software of a multi-user lab and don't have the right names.

@drombas
Contributor

drombas commented Feb 9, 2021

Hi, hoping this issue is still of interest!

I have done a quick frequency analysis. What I did to each signal:

  1. Subtract the mean (remove DC component)
  2. Compute the Fourier transform
  3. Take the squared modulus (to get the power)
  4. Divide it by its sum (to get a normalised power distribution)
  5. Find the frequency below which 95% of the power lies (plotted in red)
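
For anyone who wants to reproduce the five steps, a minimal NumPy sketch could look like this. The function name, the sampling rate and the synthetic sine waves are assumptions for the example; this is not the exact script used for the plots.

```python
import numpy as np


def freq_at_power_percentile(signal, fs, percentile=95.0):
    """Frequency (Hz) below which `percentile` percent of the power lies."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                          # 1. remove the DC component
    spectrum = np.fft.rfft(x)                 # 2. one-sided Fourier transform
    power = np.abs(spectrum) ** 2             # 3. squared modulus
    total = power.sum()
    if total == 0:
        raise ValueError("signal is constant, no spectral power")
    distribution = power / total              # 4. normalise to sum to 1
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    cumulative = np.cumsum(distribution)
    # 5. first frequency bin where the cumulative power reaches the target
    idx = np.searchsorted(cumulative, percentile / 100.0)
    return freqs[min(idx, freqs.size - 1)]


# Synthetic demo: 60 s at 100 Hz (illustrative signals only).
fs = 100.0
t = np.arange(6000) / fs
breathing = np.sin(2 * np.pi * 0.25 * t)
heartbeat = np.sin(2 * np.pi * 1.2 * t)
print(freq_at_power_percentile(breathing, fs))
print(freq_at_power_percentile(heartbeat, fs))
```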

Some ideas about the results:

  • Signals have different spectrum power distributions that could be used for classification.
  • CO2/O2
    • All their power is below 0.5 Hz (a much lower frequency band compared to ECG or trigger). It seems reasonable to assume that this range will vary little across subjects as it is related to the human breathing frequency.
    • Distinguishing CO2 from O2 would then require another step (comparing their amplitudes, for example, as @RayStick said).
  • ECG: the power is concentrated within the range [1 - 6] Hz. Again, this band is determined by the human cardiac pulse (with known limits), so we could classify signals with most of their power in that band as ECG.
  • Trigger:
    • It has the characteristic spectrum of a train of pulses. It seems feasible to come up with a metric that identifies the envelope shape.
    • If I remember correctly, the first peak of the spectrum (plotted in green) encodes the pulse period (TR), since its frequency is the inverse of that period. This could be useful since I think that currently this parameter is manually entered by the user.
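
To make the band idea concrete, here is a minimal sketch that labels a channel by the band holding most of its power. The band limits are the ones quoted in the list above; the function names, the 0.6 minimum fraction and the synthetic signals are assumptions that would need checking against real recordings.

```python
import numpy as np

# Band limits (Hz) taken from the observations above.
BANDS = {"respiratory": (0.0, 0.5), "cardiac": (1.0, 6.0)}


def band_fractions(signal, fs):
    """Fraction of the total spectral power falling in each band."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    total = power.sum()
    if total == 0:
        raise ValueError("signal is constant, no spectral power")
    return {
        name: power[(freqs >= lo) & (freqs <= hi)].sum() / total
        for name, (lo, hi) in BANDS.items()
    }


def guess_label(signal, fs, min_fraction=0.6):
    """Name of the dominant band, or "unknown" if no band dominates.

    `min_fraction` is a hypothetical cut-off, not a validated value.
    """
    fractions = band_fractions(signal, fs)
    best = max(fractions, key=fractions.get)
    return best if fractions[best] >= min_fraction else "unknown"


# Synthetic demo: 60 s at 100 Hz (illustrative signals only).
fs = 100.0
t = np.arange(6000) / fs
breathing = np.sin(2 * np.pi * 0.25 * t)
heartbeat = np.sin(2 * np.pi * 1.2 * t)
# Pulse train: 2-sample pulses every 1.5 s, spreading power over many harmonics.
trigger = np.tile(np.r_[5.0, 5.0, np.zeros(148)], 40)

for name, data in [("breathing", breathing), ("heartbeat", heartbeat), ("trigger", trigger)]:
    print(name, guess_label(data, fs))
```

The pulse train ends up in neither band because its power is spread over many harmonics, which is consistent with handling the trigger by a separate criterion.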

A few questions that come to my mind:

  • Are the signals in the tutorial the kind of input we always expect? I mean, are there any cases where the trigger or another signal might not be acquired, or where other kinds of physiological signals could be added? It is probably easier to assign four signals to four different categories than to assign a single signal to one out of N categories.

[Image: frequencyPlot]

@eurunuela
Collaborator

Thank you @drombas! Those are great ideas we can build on.

We already know where the power of the cardiac and respiratory spectra should fall. Those would be the easiest to find by just looking at the spectra. Also, between O2 and CO2, the former has a higher amplitude.

So, I would calculate the PSD of all the channels, then find the one with cardiac frequency (there goes one channel) and another channel (or two) with the respiratory frequency. Between the respiratory ones, we set the one with the highest amplitude to O2, and the other one to CO2.
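
A minimal sketch of that assignment, under a few assumptions: the spectrum is a plain FFT power spectrum rather than a proper PSD estimate, "highest amplitude" is read as the highest maximum value, the band limits (0.5 Hz and 1 to 6 Hz) are the ones mentioned earlier in this thread, and all names and labels are hypothetical.

```python
import numpy as np


def dominant_frequency(signal, fs):
    """Frequency (Hz) of the strongest spectral peak after removing the mean."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return freqs[np.argmax(power)]


def assign_channels(channels, fs, resp_max=0.5, card_band=(1.0, 6.0)):
    """Map each channel name to a guessed label.

    `channels` is a dict of name -> samples, all sampled at `fs` Hz.
    """
    labels, respiratory = {}, []
    for name, data in channels.items():
        f = dominant_frequency(data, fs)
        if card_band[0] <= f <= card_band[1]:
            labels[name] = "cardiac"
        elif 0 < f <= resp_max:
            respiratory.append(name)
        else:
            labels[name] = "unknown"
    if len(respiratory) == 2:
        # Assumption: same units, and the higher maximum belongs to O2.
        low, high = sorted(respiratory, key=lambda n: np.max(channels[n]))
        labels[low], labels[high] = "CO2", "O2"
    else:
        for name in respiratory:
            labels[name] = "respiratory"
    return labels


# Synthetic demo: 60 s at 100 Hz (illustrative signals only).
fs = 100.0
t = np.arange(6000) / fs
demo = {
    "ch1": 2.5 + 2.5 * np.sin(2 * np.pi * 0.25 * t),
    "ch2": 18.5 - 2.5 * np.sin(2 * np.pi * 0.25 * t),
    "ch3": np.sin(2 * np.pi * 1.2 * t),
    "ch4": np.tile(np.r_[5.0, 5.0, np.zeros(148)], 40),  # pulses every 1.5 s
}
print(assign_channels(demo, fs))
```

One caveat: a trigger with a repetition time longer than 2 s has its fundamental below 0.5 Hz and would be taken for a respiratory channel here, so the trigger should probably be identified before this step.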

Finally, if the trigger only takes two different values, i.e. it's binary, the mean should be much closer to the baseline than to the maximum value. Also, if it's binary and we split the samples at the average, we could check that the sum of all the values below the average is exactly the minimum value times the number of points below the average (and likewise for the values above the average and the maximum).
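
A minimal sketch of that binary check, with a small tolerance added so that a slightly noisy trigger still passes. The function names, the 1% tolerance and the demo signals are assumptions for illustration.

```python
import math


def looks_binary(samples, rel_tol=0.01):
    """True if the samples sit (almost) entirely on two levels.

    For an exactly binary signal, sum(low) == min * len(low) and
    sum(high) == max * len(high); here the mean of each group is compared
    to its extreme, within `rel_tol` of the signal range (hypothetical value).
    """
    lo, hi = min(samples), max(samples)
    if lo == hi:
        return False  # constant channel, not a trigger
    mean = sum(samples) / len(samples)
    low = [s for s in samples if s < mean]
    high = [s for s in samples if s >= mean]
    span = hi - lo
    low_dev = sum(low) / len(low) - lo
    high_dev = hi - sum(high) / len(high)
    return low_dev <= rel_tol * span and high_dev <= rel_tol * span


def looks_like_trigger(samples, rel_tol=0.01):
    """Binary signal whose mean is closer to the baseline than to the maximum."""
    lo, hi = min(samples), max(samples)
    mean = sum(samples) / len(samples)
    return looks_binary(samples, rel_tol) and (mean - lo) < (hi - mean)


# Illustrative signals: short pulses on a flat baseline, and a sine wave.
trigger = ([0.0] * 140 + [5.0] * 10) * 20
sine = [math.sin(2 * math.pi * i / 50) for i in range(3000)]
print(looks_like_trigger(trigger))
print(looks_like_trigger(sine))
```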

These may not be as fancy as doing an SVM but they could work and should be fairly simple to implement.

@drombas
Contributor

drombas commented Feb 19, 2021

I can't find the option to self-assign this issue but just to let you know I'm actively working on it.

Thanks @eurunuela for the suggestions! As you said I will probably go for a time-domain detection of the trigger based on its binary nature and a spectral-domain classification between cardiac and respiratory signals.


5 participants