Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Print optimal number of maPCA components and plot optimization curves #839

Merged
merged 17 commits into from
May 17, 2022

Conversation

eurunuela
Copy link
Collaborator

@eurunuela eurunuela commented Feb 8, 2022

Closes #834 .

Changes proposed in this pull request:

  • Gets the optimal number of components and optimization curves for all three criteria in maPCA.
  • Plots the optimization curves of all three criteria in maPCA.
  • Prints the optimal number of components given by all three criteria to the logs.
  • Print and plot optimal number of components given by 90% and 95% of variance explained when maPCA returns enough components.

Edit: this draft PR will fail until #47 and #48 are merged in maPCA.

@eurunuela
Copy link
Collaborator Author

@handwerkerd I don't know what you want the PCA_cross_component_metrics file to contain exactly, so I am willing to leave that to you. What do you think of the changes I've made so far in this PR?

@handwerkerd
Copy link
Member

I was hoping to read this carefully today, but it's not going to happen. I'll try to give a closer read soon. When I was discussing, PCA_cross_component_metrics it was in reference to cross_component_metrics that I've added to the component selection class in #756 Here is one example where it is used:

outputs["kappa_elbow_kundu"] = kappa_elbow_kundu(
component_table, selector.n_echos
)
selector.cross_component_metrics["kappa_elbow_kundu"] = outputs[
"kappa_elbow_kundu"
]

The basic idea is to have a dictionary for all values that are calculated based on metrics across components (i.e. the kappa and rho elbows). The complements the component table, where each metric has a value for each component. In the current code, these cross component values are either in the log and nowhere else or not saved at all. Since you're criterion thresholds are the same general concept, I figured you could create a similar dictionary with a similar naming style.

@codecov
Copy link

codecov bot commented Feb 22, 2022

Codecov Report

Merging #839 (52ffad8) into main (d4406e4) will increase coverage by 0.16%.
The diff coverage is 98.64%.

@@            Coverage Diff             @@
##             main     #839      +/-   ##
==========================================
+ Coverage   93.15%   93.31%   +0.16%     
==========================================
  Files          27       27              
  Lines        2234     2304      +70     
==========================================
+ Hits         2081     2150      +69     
- Misses        153      154       +1     
Impacted Files Coverage Δ
tedana/io.py 94.03% <75.00%> (-0.36%) ⬇️
tedana/decomposition/pca.py 90.16% <100.00%> (+3.64%) ⬆️
tedana/reporting/__init__.py 100.00% <100.00%> (ø)
tedana/reporting/static_figures.py 98.79% <100.00%> (+0.29%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d4406e4...52ffad8. Read the comment docs.

@eurunuela
Copy link
Collaborator Author

The three-echo test should pass once we merge PR #50 in maPCA.

@eurunuela eurunuela marked this pull request as ready for review March 16, 2022 10:53
@eurunuela
Copy link
Collaborator Author

The test will pass once PR #50 is merged in maPCA.

@handwerkerd
Copy link
Member

I just tried to locally run the three-echo test with mapca after #50 was merged. The test is still failing in io.py with

Object of type ndarray is not JSON serializable
  File "[/Users/handwerkerd/code/tedana_community/me-ica/tedana/tedana/io.py]()", line 253, in save_json
    json.dump(data, fo, indent=4, sort_keys=True)
  File "[/Users/handwerkerd/code/tedana_community/me-ica/tedana/tedana/io.py]()", line 201, in save_file
    self.save_json(data, name)
  File "[/Users/handwerkerd/code/tedana_community/me-ica/tedana/tedana/decomposition/pca.py]()", line 322, in tedpca
    io_generator.save_file(mapca_results, "PCA cross component metrics json")
  File "[/Users/handwerkerd/code/tedana_community/me-ica/tedana/tedana/workflows/tedana.py]()", line 638, in tedana_workflow
    dd, n_components = decomposition.tedpca(
  File "[/Users/handwerkerd/code/tedana_community/me-ica/tedana/tedana/tests/test_integration.py]()", line 181, in test_integration_three_echo (Current frame)
    tedana_cli.tedana_workflow(

The four-echo test is passing so this isn't a universal issue. I'm not sure I have time to dig into this more right now, but wanted to share.

@tsalo
Copy link
Member

tsalo commented Mar 16, 2022

I think we need to automatically convert numpy arrays to lists in save_json.

voxel_comp_weights = ma_pca.u_
varex = ma_pca.explained_variance_
varex_norm = ma_pca.explained_variance_ratio_
comp_ts = ma_pca.components_.T
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like we're accessing private components of the object. Could we either link to some documentation about the returned object or elaborate on what we're accessing here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trailing underscore indicates that the attribute is estimated from the data (via fit), rather than that it's private. We adopted this convention from scikit-learn.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes sense. Sorry had a momentary flip in my brain, where I put the underscore on the wrong side. I do think it's worth a quick reference to the docs.

tedana/info.py Outdated
@@ -29,7 +29,7 @@

REQUIRES = [
"bokeh<2.3.0",
"mapca~=0.0.1",
"mapca>=0.0.2",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we'll need to resolve the conflict with main where we've moved everything into a different setup organization (in particular, setup.cfg in the install_requires section).

@@ -288,3 +288,144 @@ def comp_figures(ts, mask, comptable, mmix, io_generator, png_cmap):
compplot_name = os.path.join(io_generator.out_dir, "figures", plot_name)
plt.savefig(compplot_name)
plt.close()


def pca_results(criteria, n_components, all_varex, io_generator):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function has a large amount of repetition. Could we break it into smaller functions, parametrized perhaps by something like the input data and the label only, and if matplotlib requires this, the figure itself?

@jbteves
Copy link
Collaborator

jbteves commented May 9, 2022

Hi Eneko, I left some comments but didn't formally review. Let me know if you'd like for me to fix the merge conflicts and some of the refactors I mentioned as possibilities for you. However I do think it's worth considering some more documentation about how we're extracting these values from the mapca object.

@eurunuela
Copy link
Collaborator Author

Hi Eneko, I left some comments but didn't formally review. Let me know if you'd like for me to fix the merge conflicts and some of the refactors I mentioned as possibilities for you. However I do think it's worth considering some more documentation about how we're extracting these values from the mapca object.

Feel free to make the changes. I won't be able to work on this until I come back from ISMRM.

Thank you for the comments by the way!

Resolves conflicts:
    - info.py: deletes and moves dependency version to setup.cfg
    - cornell_three_echo_outputs: adds pca_criteria.png, keeps main
      outputs
@jbteves
Copy link
Collaborator

jbteves commented May 11, 2022

Hm, I merged the changes but it seems like some expected attribute is missing. It looks like CircleCI is just caching the last environment, so that it's not checking the versions in setup.cfg to update them appropriately, or perhaps some other error I'm missing. Any thoughts @eurunuela @tsalo ?

@jbteves
Copy link
Collaborator

jbteves commented May 13, 2022

@eurunuela I think you need to cut a new release with the changes so that we can pin that release.

@eurunuela
Copy link
Collaborator Author

@eurunuela I think you need to cut a new release with the changes so that we can pin that release.

Done!

jbteves
jbteves previously approved these changes May 16, 2022
Copy link
Collaborator

@jbteves jbteves left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @eurunuela !

@handwerkerd handwerkerd self-requested a review May 16, 2022 19:21
Copy link
Member

@handwerkerd handwerkerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM.
The one thing I notices is that a file called ./figures/pca_variance_explained.png It looks similar to the relevant file ./figures/pca_criteria.png but it just contains the vertical dashed lines without the criteria curves. My guess is that this was useful while working on the code, but isn't useful as a saved output. If I'm right, just don't save that file?

@eurunuela
Copy link
Collaborator Author

Mostly LGTM. The one thing I notices is that a file called ./figures/pca_variance_explained.png It looks similar to the relevant file ./figures/pca_criteria.png but it just contains the vertical dashed lines without the criteria curves. My guess is that this was useful while working on the code, but isn't useful as a saved output. If I'm right, just don't save that file?

I'm sorry @handwerkerd, that was a mistake on my end. Here's what the figure should look like:
pca_variance_explained

I'm committing the fix now.

@handwerkerd
Copy link
Member

Yay! That does look useful and I'm glad I noticed. In your sample figure above, I don't see a blue dashed line for AIC.

@eurunuela
Copy link
Collaborator Author

Yay! That does look useful and I'm glad I noticed. In your sample figure above, I don't see a blue dashed line for AIC.

That's because it's right behind the KIC line. The figure was generated with the 3 echo integration test by the way.

Copy link
Member

@handwerkerd handwerkerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@jbteves jbteves left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eurunuela eurunuela merged commit fc61dec into ME-ICA:main May 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make MA-PCA criteria curves more accessible to users
4 participants