Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MAINT] make raw_to_bids more modular #98

Closed
jasmainak opened this issue Sep 22, 2018 · 18 comments · Fixed by #106
Closed

[MAINT] make raw_to_bids more modular #98

jasmainak opened this issue Sep 22, 2018 · 18 comments · Fixed by #106
Labels

Comments

@jasmainak
Copy link
Member

As I complained in many other pull requests about raw_to_bids growing number of arguments, I am raising an issue about it. I think we should break down raw_to_bids into these functions (subject to discussion):

raw = mne_bids.io.read_raw(raw_fname, ...)   # with different arguments for reading filenames
raw_bids_fname = make_bids_filename(subject, session, task, acquisition)
write_raw_bids(raw, raw_bids_fname, events)

What do you think @teonbrooks @sappelhoff @monkeyman192 ?

@monkeyman192
Copy link
Contributor

will this even work?
There are quite a few different file names that need to be produced in quite a variety of ways, I feel like passing a single file name/path could make the code in the write_raw_bids quite messy.
One solution could be to do something like what I did a little while ago (but scrapped because it wasn't really a good solution) which was to make a class which could generate all the file paths are required.
If we did this again (eg. on that second line), then we could still have clean code and reduce the number of arguments, but it would mean we need to create a class to handle the file path generation.

@jasmainak
Copy link
Member Author

will this even work?
There are quite a few different file names that need to be produced in quite a variety of ways, I feel like passing a single file name/path could make the code in the write_raw_bids quite messy.

The way I think of this is that you could consider having a small private function that extracts session, task, acquisition etc. given the filename. The first line of write_raw_bids calls this function and the rest works as usual. Maybe I'm oversimplifying but that's the general idea I'm hinting towards.

One solution could be to do something like what I did a little while ago (but scrapped because it wasn't really a good solution) which was to make a class

you're going to have a tough time trying to convince @teonbrooks for a class :)

@jasmainak
Copy link
Member Author

jasmainak commented Sep 23, 2018

Also, with the current API, we are mixing up two different actions/things: one is a general raw reader and the other is converting + saving raw.info into BIDS.

@monkeyman192
Copy link
Contributor

The way I think of this is that you could consider having a small private function that extracts session, task, acquisition etc. given the filename

so you essentially construct a file name, then would have a function that would pretty much just pull it apart again to construct the actually required file paths. You might as well just pass the parameters grouped together in a dictionary or something and save yourself the effort of having to extract the info.

you're going to have a tough time trying to convince @teonbrooks for a class :)

I wasn't exactly proposing it again, just saying that I thought it may be needed if we wanted that kind of functionality.

@jasmainak
Copy link
Member Author

You might as well just pass the parameters grouped together in a dictionary or something and save yourself the effort of having to extract the info.

Yes, but I see this option to be more user friendly. I'm trying to get this closer to the idea of a write/save function where you only need to provide just the filename. It's something people are used to ...

@sappelhoff
Copy link
Member

I like @monkeyman192 's earlier idea of keeping raw_to_bids as it is and using modality specific dictionaries as input parameters to raw_to_bids.

Example:
eeg_reference and eeg_ground parameters get summarized by eeg_inputs, which is a dict: {'EEGReference': 'Cz', 'EEGGround': 'Fz'}

Like that we can maintain what's already working and address the issue of a growing number of parameters.

@jasmainak
Copy link
Member Author

jasmainak commented Sep 23, 2018

How would you document this? Can you show me an example docstring?

@sappelhoff
Copy link
Member

def raw_to_bids(subject_id, task, raw_file, output_path, session_id=None,
                run=None, kind='meg', events_data=None, event_id=None,
                meg_inputs=None, eeg_inputs=None, ieeg_inputs=None,
                config=None, overwrite=True, verbose=True):
    """Walk over a folder of files and create BIDS compatible folder.

    Parameters
    ----------
    subject_id : str
        The subject name in BIDS compatible format ('01', '02', etc.)
    task : str
        Name of the task the data is based on.
    raw_file : str | instance of mne.Raw
        The raw data. If a string, it is assumed to be the path to the raw data
        file. Otherwise it must be an instance of mne.Raw
    output_path : str
        The path of the BIDS compatible folder
    session_id : str | None
        The session name in BIDS compatible format.
    run : int | None
        The run number for this dataset.
    kind : str, one of ('meg', 'eeg', 'ieeg')
        The kind of data being converted. Defaults to "meg".
    events_data : str | array | None
        The events file. If a string, a path to the events file. If an array,
        the MNE events array (shape n_events, 3). If None, events will be
        inferred from the stim channel using `mne.find_events`.
    event_id : dict | None
        The event id dict used to create a 'trial_type' column in events.tsv
    meg_inputs : dict | None
        MEG-specific inputs necessary for BIDS formatting, required if kind=='meg'.
        Contains the following keys: hpi, electrode, hsp.
        hpi : None | str | list of str
            Marker points representing the location of the marker coils with
            respect to the MEG Sensors, or path to a marker file.
            If list, all of the markers will be averaged together.
        electrode : None | str
            Digitizer points representing the location of the fiducials and the
            marker coils with respect to the digitized head shape, or path to a
            file containing these points.
        hsp : None | str | array, shape = (n_points, 3)
            Digitizer head shape points, or path to head shape file. If more than
            10`000 points are in the head shape, they are automatically decimated.
    eeg_inputs : dict | None
        EEG-specific inputs necessary for BIDS formatting, required if kind=='meg'.
        Contains the following keys: eeg_ref, eeg_ground
        eeg_ref : str
            Description of the type of reference used and (when applicable) of
            location of the reference electrode. Defaults to None.
        eeg_gnd : str
            Description  of the location of the ground electrode. Defaults to None.
    ieeg_inputs : dict | None
        IEEG-specific inputs necessary for BIDS formatting, required if kind=='meg'.
        Contains the following keys: ieeg_ref, ieeg_ground

....


    config : str | None
        A path to the configuration file to use if the data is from a BTi
        system.
    overwrite : bool
        If the file already exists, whether to overwrite it.
    verbose : bool
        If verbose is True, this will print a snippet of the sidecar files. If
        False, no content will be printed.

In this simple example we already replaced 5 parameters with three ... and these three new parameters have a lot of capacity for more potential inputs

@jasmainak
Copy link
Member Author

umm but now you can't have default parameters anymore + your docstring is nested + your code is still as complicated as before ...

@jasmainak
Copy link
Member Author

To illustrate the issue of default parameters, if I do:

meg_inputs = dict(hpi=hpi, hsp=hsp)
raw_to_bids(..., meg_inputs, ...)

this won't work with existing code

@sappelhoff
Copy link
Member

okay fair enough, I kind of agree and I can't even find proper guidance on how to elegantly format such a docstring. It's a sign that this kind of API is not used broadly (and thus possibly a bad idea)

@monkeyman192
Copy link
Contributor

I feel like some of these problems wouldn't exist if the Raw object actually contained the file paths to the objects that were used to construct it (which is a point I have made before).
We have 3 extra arguments for MEG data for example just because we need to specify hpi, hsp and electrode so that this information can be converted to BIDS format correctly. If these paths were stored in the raw object then we wouldn't have a problem.
This would however require the function to operate solely on raw objects (or I guess for .fif/.ds (and probably other) files you could pass just the file, but then it becomes exception based), and would require reworking of stuff in MNE, but It would alleviate some of the problems if the raw data actually contained all the information about what was used to construct it.

Also, if you really wanted to allow for backward compatibility whilst still allowing a new dictionary like meg_inputs you could add a (very dirty) **kwargs to the args to catch anything that isn't otherwise specified.

But I would also raise the point of, well how many people do we think currently use the code, and would it not be better to try and sort out having a more robust way for the function to have arguments now before we release it and it grows more? If we just leave it and then further arguments are needed, if more people are using it it becomes even more annoying to change down the line.

@jasmainak
Copy link
Member Author

We can't have kwargs. It's evil :) Matplotlib uses it, and it's very hard to find documentation. Matplotlib still struggles with documentation to this day ...

@jasmainak
Copy link
Member Author

This is a blocker for our release. I'll try to flesh out a pull request sometime this week.

@teonbrooks
Copy link
Member

ok @jasmainak, i will then become Mainak for the review ;) KISS YAGNI!

@jasmainak
Copy link
Member Author

yes I need that :) I'm getting old reviewing code and replying to emails ...

@teonbrooks
Copy link
Member

teonbrooks commented Oct 3, 2018

@jasmainak still waiting for you to close #73 😂

@jasmainak
Copy link
Member Author

alright I started it, hope to finish it later today :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants