Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REF] Modularize io #692

Merged
merged 42 commits into from
Apr 1, 2021
Merged

[REF] Modularize io #692

merged 42 commits into from
Apr 1, 2021

Conversation

jbteves
Copy link
Collaborator

@jbteves jbteves commented Feb 24, 2021

Closes None .

Changes proposed in this pull request:

  • Switches outdir from a function argument to an io global
  • Adds --prefix option to prepend a name to the output, also an io global
  • Adds --convention option to specify bids or kundu output, with bids the default
  • Modifies smoke test for test_io to match these changes

Note: the current solution I have is a little wonky for naming the output files, but I think it should be okay. split at the end denotes something that has formatting according to echo-number, which is something that split_ts requires in order to function properly in the way I wrote this. This is a draft PR so I'm happy to make loads of modifications, but I think the spirit of this implementation should work to solve the problem of varying naming conventions over time while retainining backwards compatibility.

This current draft does not allow backwards compatibility to keep .json files instead of .tsv files.

Joshua Teves and others added 8 commits February 23, 2021 16:18
… modularize-io

Experienced numerous conflicts; this is probably an incomplete patch.

Conflicts:
      tedana/decomposition/pca.py
      tedana/io.py
      tedana/workflows/tedana.py

Co-authored-by: Taylor Salo <tsalo006@fiu.edu>
@jbteves
Copy link
Collaborator Author

jbteves commented Feb 24, 2021

Hm, an interesting problem I'm running into is that when I run
pytest test_t2smap.py it actually passes, but if I run pytest . in the test directory, that same file fails. I have no idea why that's happening, but it seems to be related to directory creation of some kind.

@jbteves jbteves changed the title Modularize io [REF] Modularize io Feb 24, 2021
@jbteves
Copy link
Collaborator Author

jbteves commented Feb 24, 2021

I'm still going to take a look at what's causing this strange behavior for the tests, but @tsalo if you take a look at the table and the function that uses it to generate filenames, I think you'll get the gist of what this PR does and you can decide if you like the implementation or not. Maybe @dowdlelt could also take a quick look?

tedana/io.py Outdated Show resolved Hide resolved
@dowdlelt
Copy link
Collaborator

In general I think this looks great, and it is what I was imagining - a system ask minimal changes from the user (specific convention, if you want), but is flexible enough that code remains legible without a litany of if statements. solid 👍 from me.

@handwerkerd handwerkerd self-requested a review March 9, 2021 18:21
@handwerkerd
Copy link
Member

As @dowdlelt notes, the current output names are "kundu" and "bids" I think this is doubly problematic because "kundu" isn't particularly informative and "bids" assumes the bids derivatives outputs won't ever change (i.e. we won't end up with bids_v1, bids_v2, etc).

My proposal is to have 3ish options: "orig"(I'm ok with kundu), "bids" and "version X". "bids" will always output the most recent version of the bids derivatives output. "version X" will output whatever was the current output for the given version number. That is any version number less than or equal to 0.0.9 would be identical to "orig", but if bids changes in the future, using the same version number as was used for an older execution will produce matching outputted file names.

I suggest using only released versions. That is, if v0.0.9+44 is different from v0.0.9+45, that difference will be lost with time. We'll only commit to saving file name changes that made it into a unique release.

To do this, the if clause at https://github.com/jbteves/tedana/blob/a5490a053d4f0cd5d7031e8a9d95f4488d53c931/tedana/io.py#L124-L128 would need to become slightly more complex.
In io.py, img_table would need to have a new element for each table index, which might get messy, if there are too many version changes. I think we can punt on that for now. If there are >3 file name versions being stored, we can change img_table to a structure where elements could be accessed by descriptive key and column label rather than a descriptive key and an index number.

Semi-related, particularly if a non-default --convention is being used, that information should be mentioned in the tedana output.

Thoughts?

@handwerkerd
Copy link
Member

The json & tsv files always use bids file naming conventions. I previously suggested that we stick BIDS conventions for file type (i.e. json vs tsv), but we do allow for the file name to chance with the options. Do others have opinions on this?

@jbteves
Copy link
Collaborator Author

jbteves commented Mar 9, 2021

That all makes sense. I would propose the following, then:

  1. Make a clarifying comment that convention will be overridden by the workflow functions.
  2. Make the existing img_table a nested dict, wherein we have the image type as the outer key and the naming convention as an inner key
  3. Make the bids inner key point to the latest bids version, while adding a name indicating the current bids derivatives version (will need @tsalo's input on that).
  4. I realized that I never added table naming; I'll do something similar to the work on the img_table where we generate a table name.

Apparently I forgot to hit "comment" on this when I wrote it earlier this afternoon.

I think we should have a table of table-names so that we can switch conventions for the names of tables as well. It would be weird to only do images.

@codecov
Copy link

codecov bot commented Mar 12, 2021

Codecov Report

Merging #692 (1b61d7f) into bids-derivatives (cc9296b) will increase coverage by 0.02%.
The diff coverage is 96.96%.

Impacted file tree graph

@@                 Coverage Diff                  @@
##           bids-derivatives     #692      +/-   ##
====================================================
+ Coverage             93.58%   93.60%   +0.02%     
====================================================
  Files                    26       27       +1     
  Lines                  2041     2096      +55     
====================================================
+ Hits                   1910     1962      +52     
- Misses                  131      134       +3     
Impacted Files Coverage Δ
tedana/__init__.py 100.00% <ø> (ø)
tedana/metrics/kundu_fit.py 95.85% <ø> (-0.03%) ⬇️
tedana/io.py 95.68% <94.52%> (-1.41%) ⬇️
tedana/constants.py 100.00% <100.00%> (ø)
tedana/decomposition/pca.py 87.09% <100.00%> (-0.14%) ⬇️
tedana/gscontrol.py 100.00% <100.00%> (ø)
tedana/reporting/html_report.py 100.00% <100.00%> (ø)
tedana/workflows/t2smap.py 94.25% <100.00%> (+0.35%) ⬆️
tedana/workflows/tedana.py 89.93% <100.00%> (+0.26%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cc9296b...1b61d7f. Read the comment docs.

@jbteves jbteves marked this pull request as ready for review March 15, 2021 18:09
@jbteves jbteves requested review from dowdlelt and tsalo March 15, 2021 18:09
@jbteves
Copy link
Collaborator Author

jbteves commented Mar 15, 2021

Tests are passing locally and so did the linter, so this is ready for review. I'm pretty open to making minor changes on the function names. In a future PR, I will add functionality to add the entire file manifest to dataset_description.json so that you can easily query the dataset even with prefixes.

@handwerkerd
Copy link
Member

It's crashing when I'm trying to run with --convention kundu Which is what is currently listed in the help instructions

2 voxels in user-defined mask do not have good signal. Removing voxels from mask.
Traceback (most recent call last):
  File "/opt/anaconda3/bin/tedana", line 33, in <module>
    sys.exit(load_entry_point('tedana', 'console_scripts', 'tedana')())
  File "/Users/handwerkerd/code/tedana_community/master/tedana/tedana/workflows/tedana.py", line 816, in _main
    tedana_workflow(**kwargs)
  File "/Users/handwerkerd/code/tedana_community/master/tedana/tedana/workflows/tedana.py", line 504, in tedana_workflow
    io.filewrite(masksum, 'adaptive mask', ref_img)
  File "/Users/handwerkerd/code/tedana_community/master/tedana/tedana/io.py", line 463, in filewrite
    root = gen_img_name(img_type, echo=echo)
  File "/Users/handwerkerd/code/tedana_community/master/tedana/tedana/io.py", line 68, in gen_img_name
    format_string = img_table[img_type][convention]
KeyError: 'kundu'

It looks to me like there are a few issues. The main one is that, in constants.py, the options are now allowed_conventions = ('orig', 'bidsv1.5.0') but the input options are kundu and bids I think the plan was to allow some conversation for convention names (i.e. bids & kundu conversion to specific version numbers, but I don't see where that's happening.
Also, io.py has a nice check_convention function, but I'm not sure if it's every actually called, which is why the crash isn't more descriptive.

@handwerkerd handwerkerd added enhancement issues describing possible enhancements to the project impact: medium Improves code/documentation functionality for some users output-change priority: high issues that would be really helpful if they were fixed already refactoring issues proposing/requesting changes to the code which do not impact behavior and removed refactoring issues proposing/requesting changes to the code which do not impact behavior labels Mar 22, 2021
tedana/config/img_table.json Outdated Show resolved Hide resolved
@jbteves jbteves requested a review from tsalo March 22, 2021 23:38
tedana/io.py Outdated Show resolved Hide resolved
@jbteves jbteves requested a review from tsalo March 23, 2021 18:23
@jbteves
Copy link
Collaborator Author

jbteves commented Mar 23, 2021

@tsalo the one thing that seems unresolved is API documentation, because the entire module-level docstring gets rendered on RTD. I feel like we need to retain the documentation so that things are clear to future developers (read: us 6 months from now), but obviously the rendering is unwieldy. Any thoughts on the best approach there?

Additionally, it would make the rendering much larger, but I could put the elements that are rendering on one line (mostly function names in various categories) as separate lines instead.

@handwerkerd
Copy link
Member

@tsalo, This seems fairly close to done. Do you want any changes here before merging with #691 or do you feel comfortable merging with 691 & then having a general review before merging with Main?

@tsalo
Copy link
Member

tsalo commented Mar 26, 2021

I will try to review later today and/or over the weekend. I mostly want to take a deeper look at the global variables before I approve.

Copy link
Member

@tsalo tsalo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving now. We can evaluate some stuff (e.g., globals, jsons, and large docstrings) further within the main PR.

@jbteves jbteves merged commit 2b9a7ab into ME-ICA:bids-derivatives Apr 1, 2021
@jbteves jbteves deleted the modularize-io branch April 1, 2021 18:57
@jbteves jbteves added breaking change WIll make a non-trivial change to outputs and removed output-change labels Apr 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change WIll make a non-trivial change to outputs enhancement issues describing possible enhancements to the project impact: medium Improves code/documentation functionality for some users priority: high issues that would be really helpful if they were fixed already
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants