Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Allow marking channels as bad in existing datasets #491

Merged
merged 43 commits into from
Sep 1, 2020

Conversation

hoechenberger
Copy link
Member

@hoechenberger hoechenberger commented Aug 6, 2020

PR Description

I've started working on something we've needed for a long time now: marking channels of an existing BIDS dataset as "bad". This is needed because we'd like to encourage colleagues to convert to BIDS ASAP after data acquisition. Now, during data inspection / processing, they might discover problematic channels, and currently the only way to mark them as bad in the BIDS data is by editing the channels.tsv file(s) – not good! Even worse, some may opt to read the BIDS data, and mark channels as bad only in the derivative data they produce. Therefore we've figured we'd need a way to make it convenient to alter / amend existing metadata.

This implementation is just a first draft, adding a write.mark_bad_channels() function. It will read the relevant channels.tsv file, mark the requested channels as bad, and write the altered metadata back to disk, replacing the existing file. I do realize that what I'm doing here can be further abstracted, e.g. to also allow users to mark bad channels as good; or to update other bits of the metadata as well. However I just wanted to move ahead for now with a concrete implementation that has actual real-world relevance for us and our colleagues. Totally open to rethink and refactor this later, and looking forward to your thoughts and suggestions!

WIP because I also want to add a command line interface, and I haven't run extensive tests so far.

cc @agramfort

Merge checklist

Maintainer, please confirm the following before merging:

  • All comments resolved
  • This is not your own PR
  • All CIs are happy
  • PR title starts with [MRG]
  • whats_new.rst is updated
  • PR description includes phrase "closes <#issue-number>"
  • Commit history does not contain any merge commits

@codecov-commenter
Copy link

codecov-commenter commented Aug 6, 2020

Codecov Report

Merging #491 into master will decrease coverage by 0.30%.
The diff coverage is 86.73%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #491      +/-   ##
==========================================
- Coverage   93.45%   93.14%   -0.31%     
==========================================
  Files          14       15       +1     
  Lines        2079     2175      +96     
==========================================
+ Hits         1943     2026      +83     
- Misses        136      149      +13     
Impacted Files Coverage Δ
mne_bids/commands/mne_bids_mark_bad_channels.py 86.36% <86.36%> (ø)
mne_bids/write.py 95.71% <87.03%> (-1.04%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 03da314...159bce8. Read the comment docs.

@sappelhoff
Copy link
Member

good idea, some initial remarks:

  • if in channels.tsv no status column is found, insert it
  • allow for providing a reason for why channel is bad? --> write status_description column

not sure about the API for my second point

@hoechenberger
Copy link
Member Author

  • if in channels.tsv no status column is found, insert it

Right. I assumed this column was REQUIRED but just checked the standard again and it is not. So I will insert it if not present and set all channels to "good", except for the ones to be marked as bad.

  • allow for providing a reason for why channel is bad? --> write status_description column

Good idea, will add.

@sappelhoff
Copy link
Member

sappelhoff commented Aug 6, 2020

So I will insert it if not present and set all channels to "good", except for the ones to be marked as bad.

I would set as n/a to be safe. n/a is always allowed and seems more appropriate in this case ... something like "we haven't looked"

edit: actually no. That might not be smart. forget what I wrote :-D

@hoechenberger
Copy link
Member Author

Forget what exactly? :)

@hoechenberger
Copy link
Member Author

If I read the specs correctly, only "good" and "bad" are allowed. I guess this kind of makes sense... anything not bad should be good :)

@sappelhoff
Copy link
Member

n/a is always allowed, but I figured that when people are explicitly marking some channels as BAD, then it's fair to assume they have checked the rest and evaluated them as GOOD.

I would put n/a when we are unsure whether or not a channel was screened.

@hoechenberger
Copy link
Member Author

I need advice on the API.

  • Because I wanted to make the description parameter optional,
  • and because I wanted to have it before the bids_basename and bids_root in the function signature,
  • and since we cannot do anything without bids_basename and bids_root,

I'm now setting bids_basename and bids_root to None by default and raise if they are None when the function is being invoked. This is kind of ugly, but I'm not sure how to do it better – it's a limitation in Python. Or am I missing something obvious here?

@hoechenberger
Copy link
Member Author

I'm now setting bids_basename and bids_root to None by default and raise if they are None when the function is being invoked. This is kind of ugly, but I'm not sure how to do it better – it's a limitation in Python. Or am I missing something obvious here?

We could use keyword-only arguments:

def mark_bad_channels(channels, *, descriptions=None, bids_basename,
                      bids_root, kind=None, verbose=True):

This would allow users to pass channels positionally or as a keyword, whichever they prefer; descriptions can be omitted or passed as a keyword; bids_basename and bids_root would be required, but would have to be specified as keywords, which is probably good practice here anyway.

Any objections?

@hoechenberger
Copy link
Member Author

Did it such that both channels and descriptions may be passed by position.

@hoechenberger hoechenberger changed the title WIP: Allow marking channels as bad in existing datasets Allow marking channels as bad in existing datasets Aug 6, 2020
@hoechenberger
Copy link
Member Author

Removed the WIP flag bc I've decided to add command-line support in a followup PR.

@hoechenberger
Copy link
Member Author

hoechenberger commented Aug 6, 2020

I started to wonder if maybe instead of having two arguments channels and descriptions, which are both iterables, we should only have one argument that is a dict in the form {ch_name: description}. This would put a stronger emphasis on the fact that users usually should add a description. Also it would make it easier to visually keep track of which description belongs to which channel, should users need to mark a larger number (>4 or so) as bad.

WDYT?

@hoechenberger
Copy link
Member Author

hoechenberger commented Aug 7, 2020

Following a suggestion by @agramfort, I have added an overwrite kwarg which allows users to specify whether they wish to only change the specified channels (overwrite=False, default), or whether to reset all channels, leaving only those passed to the function marked as bad, and marking all others as good. Even more, this now allows users to remove the "bad" status of all channels by passing channels=[], overwrite=True.

@hoechenberger
Copy link
Member Author

If CIs pass, this is good to go from my end. I know it's quite a bit of code, but maybe you can start your review by looking at the Examples section I've included in the docstring of the new function.

Copy link
Member

@agramfort agramfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also can you add command line?

see mne_mark_bad_channels from mne-C

mne_bids/tests/test_write.py Outdated Show resolved Hide resolved
mne_bids/write.py Outdated Show resolved Hide resolved
mne_bids/write.py Outdated Show resolved Hide resolved
mne_bids/write.py Outdated Show resolved Hide resolved
mne_bids/tests/test_write.py Outdated Show resolved Hide resolved
mne_bids/write.py Outdated Show resolved Hide resolved
mne_bids/write.py Show resolved Hide resolved
mne_bids/write.py Outdated Show resolved Hide resolved
@agramfort
Copy link
Member

agramfort commented Aug 7, 2020 via email

@adam2392
Copy link
Member

adam2392 commented Aug 7, 2020

^ what do we think about either making all the entity Params explicit (e.g. subject instead of sub)

Or making it not short hand in the make_bids_basename and Bidspath?

I’m in favor of #1.

@agramfort
Copy link
Member

agramfort commented Aug 7, 2020 via email

@hoechenberger
Copy link
Member Author

^ what do we think about either making all the entity Params explicit (e.g. subject instead of sub)

+1

@hoechenberger
Copy link
Member Author

also can you add command line?

Done in 285d0a0

@hoechenberger
Copy link
Member Author

Missing functionality in the CLI: passing no bad channels and overwrite=True to reset all channels to "good". Will look into this tomorrow.

@hoechenberger hoechenberger changed the title Allow marking channels as bad in existing datasets [MRG] Allow marking channels as bad in existing datasets Aug 7, 2020
Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked the API and examples in the docstr, and that looked very good to me. I think this will be very useful.

mne_bids/write.py Outdated Show resolved Hide resolved
mne_bids/write.py Outdated Show resolved Hide resolved
@hoechenberger
Copy link
Member Author

I think I have addressed all review comments and this should be good to merge.

@jasmainak
Copy link
Member

Pybids has it: https://github.com/bids-standard/pybids/blob/master/examples/pybids_tutorial.ipynb

the closest what we have in mne-bids is get_entity_vals. Not sure to what extent you want to duplicate pybids functionality. Their design is a bit too fancy for my taste ... there is (or used to be) a dependency grabbids which populated methods on demand based on which entities existed in the dataset. For example, if there is an entity called subject you will get a method called get_subjects. The details are a bit hazy in my mind but you have the keywords :)

@jasmainak
Copy link
Member

Now the question is -- 1) to what extent do you want to duplicate that functionality (cf also reviewer comments in mne-bids paper) rather than work with pybids folks, and 2) can you get something that does the job using regular expressions without employing something as fancy as pybids? Maybe it works only for ephys but that's fine for us

@agramfort
Copy link
Member

agramfort commented Aug 13, 2020 via email

@hoechenberger
Copy link
Member Author

Yes definitely let's look into implementing this ourselves or just borrowing the relevant bits from pybids, but let's try not to introduce this dependency

@jasmainak
Copy link
Member

yes but definitely look into what pybids does. It will inform you what users want.

@hoechenberger
Copy link
Member Author

yes but definitely look into what pybids does. It will inform you what users want.

Ok!

examples/mark_bad_channels.py Outdated Show resolved Hide resolved
mne_bids/commands/mne_bids_mark_bad_channels.py Outdated Show resolved Hide resolved
mne_bids/commands/mne_bids_mark_bad_channels.py Outdated Show resolved Hide resolved
mne_bids/commands/mne_bids_mark_bad_channels.py Outdated Show resolved Hide resolved
mne_bids/commands/mne_bids_mark_bad_channels.py Outdated Show resolved Hide resolved
mne_bids/tests/test_write.py Outdated Show resolved Hide resolved
mne_bids/write.py Show resolved Hide resolved
mne_bids/write.py Outdated Show resolved Hide resolved
# Update info['bads']
bads = _get_bads_from_tsv_data(tsv_data)
raw.info['bads'] = bads
# XXX (How) will this handle split files?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is your concern?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh it's very easy, I've never worked with split files before. So here (actually in the next line, not shown here by github) we're only updating the very first file -- raw.filenames[0]. In my current tests, there is always exactly ONE file in raw.filenames. If we have split files, it could be multiple. I'm not sure if we would have to iterate over all of them, or if only the first one carries a (meaningful) info dict, and updating that one is sufficient.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for a split file then sidecar files should be identical. It's basically one file split over different files. metadata should be therefore exactly the same.

mne_bids/write.py Outdated Show resolved Hide resolved
Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>
Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>
@hoechenberger hoechenberger changed the title Allow marking channels as bad in existing datasets [MRG] Allow marking channels as bad in existing datasets Sep 1, 2020
@hoechenberger
Copy link
Member Author

@agramfort Unless #491 (comment) needs more work, this is good to be merged once CI goes green.

@agramfort
Copy link
Member

agramfort commented Sep 1, 2020 via email

@agramfort agramfort merged commit b25ba5e into mne-tools:master Sep 1, 2020
@agramfort
Copy link
Member

Thx @hoechenberger

@hoechenberger hoechenberger deleted the mark-bads branch September 1, 2020 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants