Generate metrics from external regressors using F stats (#1064)

* Get required metrics from decision tree.
* Continue changes.
* More updates.
* Store necessary_metrics as a list.
* Update selection_nodes.py
* Update selection_utils.py
* Update across the package.
* Keep updating.
* Update tedana.py
* Add extra metrics to list.
* Update ica_reclassify.py
* Draft metric-based regressor correlations.
* Fix typo.
* Work on trees.
* Expand regular expressions in trees.
* Fix up the expansion.
* Really fix it though.
* Fix style issue.
* Added external regress integration test
* Got integration test with external regressors working
* Added F tests and options
* added corr_no_detrend.json
* updated names and reporting
* Run black.
* Address style issues.
* Try fixing test bugs.
* Update test_component_selector.py
* Update component_selector.py
* Use component table directly in selectcomps2use.
* Fix.
* Include generated metrics in necessary metrics.
* Update component_selector.py
* responding to feedback from tsalo
* Update component_selector.py
* Update test_component_selector.py
* fixed some testing failures
* fixed test_check_null_succeeds
* fixed ica_reclassify bug and selector_properties test
* ComponentSelector initialized before loading data
* fixed docstrings
* updated building decision tree docs
* using external regressors and most tests passing
* removed corr added tasks
* fit_model moved to stats
* removed and cleaned up external_regressors_config option
* Added task regressors and some tests. Now all in decision tree
* cleaning up decision tree json files
* removed mot12_csf.json changed task to signal
* fixed tests with task_keep signal
* Update tedana/metrics/external.py
  Co-authored-by: Taylor Salo <salot@pennmedicine.upenn.edu>
* Update tedana/metrics/_utils.py
  Co-authored-by: Taylor Salo <salot@pennmedicine.upenn.edu>
* Update tedana/metrics/collect.py
  Co-authored-by: Taylor Salo <salot@pennmedicine.upenn.edu>
* Update tedana/metrics/external.py
  Co-authored-by: Taylor Salo <salot@pennmedicine.upenn.edu>
* Update tedana/metrics/external.py
  Co-authored-by: Taylor Salo <salot@pennmedicine.upenn.edu>
* Responding to review comments
* reworded docstring
* Added type hints to external.py
* fixed external.py type hints
* type hints to _utils collect and component_selector
* type hints and doc improvements in selection_utils
* no expand_node recursion
* removed expand_nodes expand_node expand_dict
* docstring lines break on punctuation
* updating external tests and docs
* moved test data downloading to tests.utils.py and started test for fit_regressors
* fixed bug where task regressors retained in partial models
* matched testing external regressors to included mixing and fixed bugs
* Made single function for detrending regressors
* added tests for external fit_regressors and fix_mixing_to_regressors
* Full tests in test_external_metrics.py
* adding tests
* fixed extern regress validation warnings and added tests
* sorting set values for test outputs
* added to test_metrics
* Added docs to building_decision_trees.rst
* Added motion task decision tree flow chart
* made recommended change to external_regressor_config
* Finished documentation and renamed demo decision trees
* added link to example external regressors tsv file
* Apply suggestions from code review
  Fixed nuissance typos
  Co-authored-by: Taylor Salo <salot@pennmedicine.upenn.edu>
* Minor documentation edits

---------

Co-authored-by: Taylor Salo <tsalo006@fiu.edu>
Co-authored-by: Taylor Salo <salot@pennmedicine.upenn.edu>
Co-authored-by: Neha Reddy <nreddy@northwestern.edu>
ME-ICA · Aug 5, 2024 · c7df469
1 parent f3be8f9
commit c7df469
Showing 33 changed files with 3,200 additions and 279 deletions.
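The commit message above describes generating component metrics from external regressors with F statistics, including partial models. As a rough illustration of that idea, and not the code added in `tedana/metrics/external.py`, the sketch below fits one component time series to a full set of regressors and to a reduced set, then compares the two with an F test. The function names, the toy regressors, and the intercept-only baseline are all assumptions made for this example.

```python
import numpy as np
from scipy import stats


def sum_squared_error(y, design):
    """Residual sum of squares after an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ beta
    return float(residual @ residual)


def nested_f_test(y, full, reduced):
    """Compare a full design matrix with a nested, reduced one.

    Returns (F, p, R2), where R2 is the fraction of the reduced model's
    residual variance that the extra regressors in the full model explain.
    Assumes both design matrices have full column rank.
    """
    n_time = y.shape[0]
    df_num = full.shape[1] - reduced.shape[1]  # number of regressors tested
    df_den = n_time - full.shape[1]            # residual degrees of freedom
    sse_full = sum_squared_error(y, full)
    sse_reduced = sum_squared_error(y, reduced)
    f_stat = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
    p_value = float(stats.f.sf(f_stat, df_num, df_den))
    r2 = 1.0 - sse_full / sse_reduced
    return f_stat, p_value, r2


# Toy data (hypothetical): a component driven mostly by one motion regressor.
rng = np.random.default_rng(0)
n_time = 200
baseline = np.ones((n_time, 1))            # intercept-only baseline model
motion = rng.standard_normal((n_time, 2))  # stand-in motion regressors
csf = rng.standard_normal((n_time, 1))     # stand-in CSF regressor
component = 2.0 * motion[:, 0] + 0.5 * rng.standard_normal(n_time)

full = np.hstack([baseline, motion, csf])

# Full-model F test: do all nuisance regressors together beat the baseline?
f_full, p_full, r2_full = nested_f_test(component, full, baseline)

# Partial F test for motion: drop the motion columns and compare to the full model.
without_motion = np.hstack([baseline, csf])
f_motion, p_motion, _ = nested_f_test(component, full, without_motion)

print(f"full model:   F={f_full:.1f}, p={p_full:.3g}, R2={r2_full:.2f}")
print(f"motion model: F={f_motion:.1f}, p={p_motion:.3g}")
```

The partial test asks whether one group of regressors explains variance beyond what the remaining regressors already capture, which is the kind of question the motion and CSF decision nodes in the flow charts below pose.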
diff --git a/.gitignore b/.gitignore
@@ -6,6 +6,13 @@ docs/generated/
 .pytest_cache/
 .testing_data_cache/
 
+# For decision tree .tex flow charts do not archive intermediate files
+decision_tree*.aux
+decision_tree*.fdb_latexmk
+decision_tree*.fls
+decision_tree*.log
+decision_tree*.synctex.gz
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]

diff --git a/docs/_static/decision_tree_demo_external_regressors_motion_task_models.pdf b/docs/_static/decision_tree_demo_external_regressors_motion_task_models.pdf
diff --git a/docs/_static/decision_tree_demo_external_regressors_motion_task_models.png b/docs/_static/decision_tree_demo_external_regressors_motion_task_models.png
diff --git a/docs/_static/decision_tree_demo_external_regressors_motion_task_models.tex b/docs/_static/decision_tree_demo_external_regressors_motion_task_models.tex
@@ -0,0 +1,122 @@
+\documentclass[border=2pt]{standalone}
+\usepackage[utf8]{inputenc} % Declares UTF-8 input encoding
+\usepackage{tikz}
+\usepackage{helvet}
+\usetikzlibrary{shapes.geometric, arrows}
+\pagecolor{white}
+
+%-------------------------defining colorblind friendly colors
+% Using pale color scheme in Figure 6
+% by Paul Tol https://personal.sron.nl/~pault/
+\definecolor{cbblue}{HTML}{BBCCEE}
+\definecolor{cbcyan}{HTML}{CCEEFF}
+\definecolor{cbgreen}{HTML}{CCDDAA}
+\definecolor{cbyellow}{HTML}{EEEEBB}
+\definecolor{cbred}{HTML}{FFCCCC}
+\definecolor{cbgrey}{HTML}{DDDDDD}
+
+% -------------------------defining nodes
+\tikzstyle{input} = [trapezium, trapezium left angle =80, trapezium right angle = 100,
+minimum width= 3cm, minimum height=0.5cm, text centered, draw=black, fill=cbblue]
+\tikzstyle{process} = [rectangle, minimum width = 3cm, minimum height = 1cm,
+text centered, text width=4cm, draw=black, fill=cbgrey]
+\tikzstyle{decision} = [diamond, minimum width = 3cm, minimum height = 1cm,
+text centered, text width=3cm, draw=black, fill=cbcyan]
+\tikzstyle{changeclass} = [rectangle, rounded corners, minimum width=3cm, minimum height=1cm,
+text centered, draw = black, fill=cbyellow]
+\tikzstyle{reject} = [trapezium, trapezium left angle =80, trapezium right angle = 100,
+minimum width= 1cm, minimum height=0.5cm, text centered, draw=black, fill=cbred]
+\tikzstyle{accept} = [trapezium, trapezium left angle =80, trapezium right angle = 100,
+minimum width= 1cm, minimum height=0.5cm, text centered, draw=black, fill=cbgreen]
+
+% -------------------------defining connectors
+\tikzstyle{arrow} = [thick,->, >=stealth]
+\tikzstyle{line} = [thick,-,>=stealth]
+\begin{document}
+
+% ------------------------- tikz image (flow chart)
+\begin{tikzpicture}[node distance = 2cm]
+
+% ------------------------- nodes -------------------------
+% ----- node: 0
+\node(0)[input,label={90:\textbf{Demo Decision Tree with External Regressors, Motion, and Task Models}}, label={180:$node\ 0$}]{Set all components to unclassified};
+% ----- node: 1
+\node(1)[decision, below of=0,label={180:$node\ 1$}, yshift=-1.5cm]{$\rho$ $>$ $\kappa$};
+\node(rej1)[changeclass, right of=1, xshift=3cm, align=center]{Unlikely BOLD\\$\rightarrow$ Provisional reject};
+% ----- node: 2
+\node(2)[decision, below of=1,label={180:$node\ 2$}, label={[align=center] 315: voxel counts for signif fit\\of multi-echo data\\to $T_2$ or $S_0$ decay models}, yshift=-4.0cm]{$n \, FS_0 \, > \, n \, FT_2$ \& $n \,FT_2$ $>$ 0};
+\node(rej2)[changeclass, right of=2, xshift=3cm, align=center]{Unlikely BOLD\\$\rightarrow$ Provisional Reject};
+% ----- node: 3
+\node(3)[process, below of=2, label={180:$node\ 3$}, label={[align=center] 315: varex: variance explained\\by each component}, yshift=-2.0cm]{Calculate median(varex) across all components};
+% ----- node: 4
+\node(4)[decision, below of=3,label={180:$node\ 4$},label={[align=center] 315:DICE overlap between $T_2$ or $S_0$\\decay models and ICA component\\peak clusters}, yshift=-1.5cm]{dice $FS_0$ $>$ dice $FT_2$ \& varex $>$ median(varex)
+};
+\node(rej4)[changeclass, right of=4, xshift=3cm, align=center]{Unlikely BOLD\\$\rightarrow$ Provisional Reject};
+% ----- node: 5
+\node(5)[decision, below of=4,label={180:$node\ 5$}, label={[align=center] 315: $t$-statistic of $FT_2$ values\\in component peak clusters vs\\peak voxels outside of clusters}, yshift=-4.0cm]{ $0 \, >$ signal-noise \&  varex $>$ median(varex)};
+\node(rej5)[changeclass, right of=5, xshift=3cm, align=center]{Unlikely BOLD\\$\rightarrow$ Provisional Reject};
+% ----- node: 6
+\node(6)[process, below of=5, label={180:$node\ 6$}, label={0: Uses all components}, yshift=-2.0cm]{Calculate $\kappa$ elbow};
+% ----- node: 7
+\node(7)[process, below of=6, label={180:$node\ 7$}, label={[align=center] 0: Uses all components and subset\\of unclassified components}]{Calculate $\rho$ elbow\\(liberal method)};
+% ----- node: 8
+\node(8)[decision, below of=7,label={180:$node\ 8$}, yshift=-1.5cm]{$\kappa \geq \kappa$ elbow\\$\rho$ $<$ $\rho$ elbow};
+\node(chrej8)[changeclass, below of=8, xshift=0cm, yshift=-2cm]{Provisional reject};
+\node(chacc8)[changeclass, right of=8, xshift=3cm, yshift=0cm]{Provisional accept};
+% ----- node: 9
+\node(9)[decision, below of=chrej8,label={180:$node\ 9$},label={20: Accept even if $\rho < \rho\ elbow$},yshift=-1.5cm]{$\kappa > 2\rho$\\$\kappa \geq \kappa$ elbow};
+\node(chrej9)[changeclass, below of=9, xshift=0cm, yshift=-2cm]{Provisional reject};
+\node(chacc9)[changeclass, right of=9, xshift=3cm, yshift=0cm]{Provisional accept};
+% ----- node: 10
+\node(10)[decision, below of=chacc9,label={150:$node\ 10$},label={[align=center] 310: Reject if\\fits external\\nuisance\\regressors},yshift=-2cm]{F test for\\Nuisance Regressors\\$p_{Full} \leq 0.05$\\$R^2_{Full} \geq 0.5$};
+\node(chrej10)[changeclass, below of=10, xshift=0cm, yshift=-2cm, align=center]{External regressors\\$\rightarrow$Provisional reject};
+% ----- node: 11
+\node(11)[decision, left of=chrej10,label={180:$node\ 11$},xshift=-3cm]{Partial F test for\\Motion Regressors\\$p_{Full} \leq 0.05$\\$R^2_{Full} \geq 0.5$\\$p_{Motion} \leq 0.05$};
+\node(chtag11)[changeclass, below of=11, xshift=0cm, yshift=-2cm, align=center]{Tag:\\Fits motion\\external regressors};
+% ----- node: 12
+\node(12)[decision, below of=chrej10,label={150:$node\ 12$},yshift=-2cm]{Partial F test for\\CSF Regressors\\$p_{Full} \leq 0.05$\\$R^2_{Full} \geq 0.5$\\$p_{CSF} \leq 0.05$};
+\node(chtag12)[changeclass, below of=12, xshift=0cm, yshift=-2cm, align=center]{Tag:\\Fits CSF\\external regressors};
+% ----- node: 13
+\node(prej13)[changeclass, below of=chtag11, xshift=0cm, yshift=-0.5cm]{Provisional reject};
+\node(13)[decision, below of=prej13,label={180:$node\ 13$},label={[align=center] 335: If fits task and\\contains T2*, accept\\even if other criteria\\would have rejected},yshift=-2cm]{F test for\\Task Regressors\\$p_{Task} \leq 0.05$\\$R^2_{Task} \geq 0.5$\\$\kappa \geq \kappa$ elbow};
+\node(chacc13)[accept, right of=13,xshift=3cm, align=center]{Fits task\\$\rightarrow$Accept};
+% ----- node: 14
+\node(14)[decision, below of=13,label={180:$node\ 14$},label={[align=left] 335: Will accept the lowest\\variance components until\\1\% of total variance is\\accepted this way}, yshift=-3.5cm]{$if$ component variance $<0.1$};%--check in kundu
+\node(acc14)[accept, right of=14, xshift=2.5cm, align=center]{Low variance\\$\rightarrow$ Accept};
+% ----- node: 15
+\node(15)[accept, below of=14,label={180:$node\ 15$},yshift=-1.5cm, align=center]{Likely BOLD\\Change provisional accept\\$\rightarrow$Accept};
+% ----- node: 16
+\node(16)[reject, below of=15,label={180:$node\ 16$}, yshift=0cm, align=center]{Unlikely BOLD\\Change provisional reject\\$\rightarrow$Reject};
+
+% ------------------------- connections -------------------------
+% draw[x](origin)--node[anchor=position]{text}(destination);
+\draw[arrow](0)--(1);
+\draw[arrow](1)--node[anchor=south, right=0] {no} (2);
+\draw[arrow](1)--node[anchor=south] {yes} (rej1);
+\draw[arrow](2)--node[anchor=south, right=0] {no} (3);
+\draw[arrow](2)--node[anchor=south] {yes} (rej2);
+\draw[arrow](3)--(4);
+\draw[arrow](4)--node[anchor=south, right=0] {no} (5);
+\draw[arrow](4)--node[anchor=south] {yes} (rej4);
+\draw[arrow](5)--node[anchor=south, right=0] {no} (6);
+\draw[arrow](5)--node[anchor=south] {yes} (rej5);
+\draw[arrow](6)--(7);
+\draw[arrow](7)--(8);
+\draw[arrow](8)--node[anchor=south] {yes} (chacc8);
+\draw[arrow](8)--node[anchor=south, right=0] {no} (chrej8);
+\draw[arrow](chrej8)--(9);
+\draw[arrow](9)--node[anchor=south] {yes} (chacc9);
+\draw[arrow](9)--node[anchor=south, right=0] {no} (chrej9);
+\draw[arrow](chacc9)--(10);
+\draw[arrow](chrej9)--(10);
+\draw[arrow](10)--node[anchor=south, right=0] {yes} (chrej10);
+\draw[arrow](chrej10)--(11);
+\draw[arrow](11)--node[anchor=south, right=0] {yes} (chtag11);
+\draw[arrow](chrej10)--(12);
+\draw[arrow](12)--node[anchor=south, right=0] {yes} (chtag12);
+\draw[arrow](prej13)--(13);
+\draw[arrow](13)--node[anchor=south] {yes} (chacc13);
+\draw[arrow](13)--node[anchor=south, right=0] {no} (14);
+\draw[arrow](14)--node[anchor=south] {yes} (acc14);
+\end{tikzpicture}
+\end{document}
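Nodes 10 to 13 of the flow chart above apply fixed thresholds to the external regressor metrics. The sketch below restates those thresholds as plain Python so the order of operations is easier to follow. It is only an illustration: the `Component` fields and the function name are hypothetical, it does not use tedana's decision tree JSON format or selection functions, and it assumes the motion and CSF tags are only considered for components that already fit the full nuisance model.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical per-component metrics; field names are illustrative only."""

    kappa: float
    p_full: float    # p value of the full nuisance model F test
    r2_full: float   # R^2 of the full nuisance model
    p_motion: float  # p value of the partial F test for motion regressors
    p_csf: float     # p value of the partial F test for CSF regressors
    p_task: float    # p value of the task model F test
    r2_task: float   # R^2 of the task model
    classification: str = "Provisional accept"
    tags: list = field(default_factory=list)


def apply_external_regressor_nodes(comp, kappa_elbow, p_max=0.05, r2_min=0.5):
    """Apply the threshold checks drawn in nodes 10 to 13 of the flow chart."""
    # Node 10: the full nuisance model fits the component.
    fits_nuisance = comp.p_full <= p_max and comp.r2_full >= r2_min
    if fits_nuisance:
        comp.classification = "Provisional reject"
        comp.tags.append("External regressors")
        # Node 11: the motion regressors contribute significantly.
        if comp.p_motion <= p_max:
            comp.tags.append("Fits motion external regressors")
        # Node 12: the CSF regressors contribute significantly.
        if comp.p_csf <= p_max:
            comp.tags.append("Fits CSF external regressors")

    # Node 13: a task fit with kappa at or above the elbow overrides a rejection.
    fits_task = (
        comp.p_task <= p_max
        and comp.r2_task >= r2_min
        and comp.kappa >= kappa_elbow
    )
    if fits_task:
        comp.classification = "Accept"
        comp.tags.append("Fits task")
    return comp


motion_like = apply_external_regressor_nodes(
    Component(kappa=30, p_full=0.001, r2_full=0.7, p_motion=0.002,
              p_csf=0.4, p_task=0.6, r2_task=0.1),
    kappa_elbow=45,
)
task_like = apply_external_regressor_nodes(
    Component(kappa=80, p_full=0.01, r2_full=0.6, p_motion=0.03,
              p_csf=0.5, p_task=0.001, r2_task=0.8),
    kappa_elbow=45,
)
print(motion_like.classification, motion_like.tags)
print(task_like.classification, task_like.tags)
```

The second component shows the override drawn at node 13: it fits the nuisance model and would be provisionally rejected, but because it also fits the task model and its kappa is above the elbow, it ends up accepted.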
diff --git a/docs/_static/decision_tree_demo_external_regressors_single_model.pdf b/docs/_static/decision_tree_demo_external_regressors_single_model.pdf
diff --git a/docs/_static/decision_tree_demo_external_regressors_single_model.png b/docs/_static/decision_tree_demo_external_regressors_single_model.png
diff --git a/docs/_static/decision_tree_demo_external_regressors_single_model.tex b/docs/_static/decision_tree_demo_external_regressors_single_model.tex
@@ -0,0 +1,109 @@
+\documentclass[border=2pt]{standalone}
+\usepackage[utf8]{inputenc} % Declares UTF-8 input encoding
+\usepackage{tikz}
+\usepackage{helvet}
+\usetikzlibrary{shapes.geometric, arrows}
+\pagecolor{white}
+
+%-------------------------defining colorblind friendly colors
+% Using pale color scheme in Figure 6
+% by Paul Tol https://personal.sron.nl/~pault/
+\definecolor{cbblue}{HTML}{BBCCEE}
+\definecolor{cbcyan}{HTML}{CCEEFF}
+\definecolor{cbgreen}{HTML}{CCDDAA}
+\definecolor{cbyellow}{HTML}{EEEEBB}
+\definecolor{cbred}{HTML}{FFCCCC}
+\definecolor{cbgrey}{HTML}{DDDDDD}
+
+% -------------------------defining nodes
+\tikzstyle{input} = [trapezium, trapezium left angle =80, trapezium right angle = 100,
+minimum width= 3cm, minimum height=0.5cm, text centered, draw=black, fill=cbblue]
+\tikzstyle{process} = [rectangle, minimum width = 3cm, minimum height = 1cm,
+text centered, text width=4cm, draw=black, fill=cbgrey]
+\tikzstyle{decision} = [diamond, minimum width = 3cm, minimum height = 1cm,
+text centered, text width=3cm, draw=black, fill=cbcyan]
+\tikzstyle{changeclass} = [rectangle, rounded corners, minimum width=3cm, minimum height=1cm,
+text centered, draw = black, fill=cbyellow]
+\tikzstyle{reject} = [trapezium, trapezium left angle =80, trapezium right angle = 100,
+minimum width= 1cm, minimum height=0.5cm, text centered, draw=black, fill=cbred]
+\tikzstyle{accept} = [trapezium, trapezium left angle =80, trapezium right angle = 100,
+minimum width= 1cm, minimum height=0.5cm, text centered, draw=black, fill=cbgreen]
+
+% -------------------------defining connectors
+\tikzstyle{arrow} = [thick,->, >=stealth]
+\tikzstyle{line} = [thick,-,>=stealth]
+\begin{document}
+
+% ------------------------- tikz image (flow chart)
+\begin{tikzpicture}[node distance = 2cm]
+
+% ------------------------- nodes -------------------------
+% ----- node: 0
+\node(0)[input,label={90:\textbf{Demo Decision Tree. Single model with external regressors}}, label={180:$node\ 0$}]{Set all components to unclassified};
+% ----- node: 1
+\node(1)[decision, below of=0,label={180:$node\ 1$}, yshift=-1cm]{$\rho$ $>$ $\kappa$};
+\node(rej1)[reject, right of=1, xshift=2cm, align=center]{Unlikely BOLD\\$\rightarrow$ Reject};
+% ----- node: 2
+\node(2)[decision, below of=1,label={180:$node\ 2$}, label={[align=center] 315: voxel counts for signif fit\\of multi-echo data\\to $T_2$ or $S_0$ decay models}, yshift=-3.0cm]{$n \, FS_0 \, > \, n \, FT_2$ \& $n \,FT_2$ $>$ 0};
+\node(rej2)[reject, right of=2, xshift=2cm, align=center]{Unlikely BOLD\\$\rightarrow$ Reject};
+% ----- node: 3
+\node(3)[process, below of=2, label={180:$node\ 3$}, label={[align=center] 315: varex: variance explained\\by each component}, yshift=-1.5cm]{Calculate median(varex) across all components};
+% ----- node: 4
+\node(4)[decision, below of=3,label={180:$node\ 4$},label={[align=center] 315:DICE overlap between $T_2$ or $S_0$\\decay models and ICA component\\peak clusters}, yshift=-1.5cm]{dice $FS_0$ $>$ dice $FT_2$ \& varex $>$ median(varex)
+};
+\node(rej4)[reject, right of=4, xshift=2.5cm, align=center]{Unlikely BOLD\\$\rightarrow$ Reject};
+% ----- node: 5
+\node(5)[decision, below of=4,label={180:$node\ 5$}, label={[align=center] 315: $t$-statistic of $FT_2$ values\\in component peak clusters vs\\peak voxels outside of clusters}, yshift=-4.0cm]{ $0 \, >$ signal-noise \&  varex $>$ median(varex)};
+\node(rej5)[reject, right of=5, xshift=2.5cm, align=center]{Unlikely BOLD\\$\rightarrow$ Reject};
+% ----- node: 6
+\node(6)[process, below of=5, label={180:$node\ 6$}, label={0: Uses all components}, yshift=-2.0cm]{Calculate $\kappa$ elbow};
+% ----- node: 7
+\node(7)[process, below of=6, label={180:$node\ 7$}, label={[align=center] 0: Uses all components and subset\\of unclassified components}]{Calculate $\rho$ elbow\\(liberal method)};
+% ----- node: 7
+\node(8)[decision, below of=7,label={180:$node\ 8$}, yshift=-1.5cm]{$\kappa \geq \kappa$ elbow};
+\node(chrej8)[changeclass, below of=8, yshift=-1.5cm]{Provisional reject};
+\node(chacc8)[changeclass, right of=8, xshift=3cm, yshift=0cm]{Provisional accept};
+% ----- node: 8
+\node(9)[decision, below of=chacc8,label={170:$node\ 9$}, yshift=-1.5cm]{ $\rho$ $>$ $\rho$ elbow};
+\node(chrej9)[changeclass, below of=9, yshift=-1.5cm]{Provisional reject};
+% ----- node: 9
+\node(10)[decision, left of=chrej9,label={180:$node\ 10$},label={235: Accept even if $\rho < \rho\ elbow$},xshift=-3.5cm]{$\kappa \geq \kappa$ elbow\\$\kappa > 2\rho$  };
+\node(chacc10)[changeclass, below of=10, xshift=0cm, yshift=-1.5cm, align=center]{Provisional accept};
+% ----- node: 10
+\node(11)[decision, below of=chacc10,label={180:$node\ 11$}, xshift=0cm, yshift=-2cm]{External regressor\\nuisance model\\$p<0.05$\\$R^2>0.5$};%--check in kundu
+\node(chrej11)[changeclass, below of=11, xshift=0cm, yshift=-1.5cm, align=center]{Tag: External Regressors\\Provisional reject};
+% ----- node: 11
+\node(12)[decision, below of=chrej11,label={180:$node\ 11$},label={[align=left] 335: Will accept the lowest\\variance components until\\1\% of total variance is\\accepted this way}, yshift=-1.5cm]{$if$ component variance $<0.1$};%--check in kundu
+\node(acc12)[accept, right of=12, xshift=3cm, align=center]{Low variance\\$\rightarrow$ Accept};
+% ----- node: 12
+\node(13)[accept, below of=12,label={180:$node\ 12$},yshift=-1.5cm, align=center]{Likely BOLD\\Change provisional accept\\$\rightarrow$Accept};
+% ----- node: 13
+\node(14)[reject, below of=13,label={180:$node\ 13$}, yshift=0cm, align=center]{Unlikely BOLD\\Change provisional reject\\$\rightarrow$Reject};
+
+
+% ------------------------- connections -------------------------
+\draw[arrow](0)--(1);
+\draw[arrow](1)--node[anchor=south, right=0] {no} (2);
+\draw[arrow](1)--node[anchor=south] {yes} (rej1);
+\draw[arrow](2)--node[anchor=south, right=0] {no} (3);
+\draw[arrow](2)--node[anchor=south] {yes} (rej2);
+\draw[arrow](3)--(4);
+\draw[arrow](4)--node[anchor=south, right=0] {no} (5);
+\draw[arrow](4)--node[anchor=south] {yes} (rej4);
+\draw[arrow](5)--node[anchor=south, right=0] {no} (6);
+\draw[arrow](5)--node[anchor=south] {yes} (rej5);
+\draw[arrow](6)--(7);
+\draw[arrow](7)--(8);
+\draw[arrow](8)--node[anchor=south] {yes} (chacc8);
+\draw[arrow](8)--node[anchor=south, right=0] {no} (chrej8);
+\draw[arrow](chacc8)--(9);
+\draw[arrow](chrej8)--(9);
+\draw[arrow](9)--node[anchor=south, right=0] {yes} (chrej9);
+\draw[arrow](chrej9)--(10);
+\draw[arrow](10)--node[anchor=south, right=0] {yes} (chacc10);
+\draw[arrow](chacc10)--(11);
+\draw[arrow](11)--node[anchor=south, right=0] {yes} (chrej11);
+\draw[arrow](chrej11)--(12);
+\draw[arrow](12)--node[anchor=south] {yes} (acc12);
+\end{tikzpicture}
+\end{document}
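In the single-model flow chart above, the decision nodes labeled 8 through 11 run in sequence, and a later node can overwrite the provisional label set by an earlier one. The following sketch traces that sequence for one component. The function and argument names are hypothetical, and it assumes each node is evaluated for every component that reaches it; the actual tree definition may restrict a node to a subset of classifications.

```python
def classify_single_model(kappa, rho, kappa_elbow, rho_elbow, p_nuisance, r2_nuisance):
    """Trace the decision nodes labeled 8 to 11 for a single component."""
    tags = []

    # Node 8: kappa at or above the kappa elbow gives a provisional accept.
    label = "Provisional accept" if kappa >= kappa_elbow else "Provisional reject"

    # Node 9: rho above the rho elbow gives a provisional reject.
    if rho > rho_elbow:
        label = "Provisional reject"

    # Node 10: a high kappa that is more than twice rho is accepted
    # even though rho did not fall below the rho elbow.
    if kappa >= kappa_elbow and kappa > 2 * rho:
        label = "Provisional accept"

    # Node 11: a significant, well-fitting nuisance model gives a provisional reject.
    if p_nuisance < 0.05 and r2_nuisance > 0.5:
        label = "Provisional reject"
        tags.append("External Regressors")

    return label, tags


# Same kappa and rho values, differing only in how well the nuisance model fits.
clean = classify_single_model(50.0, 20.0, 40.0, 15.0, p_nuisance=0.5, r2_nuisance=0.1)
noisy = classify_single_model(50.0, 20.0, 40.0, 15.0, p_nuisance=0.01, r2_nuisance=0.8)
print(clean)
print(noisy)
```

Both example components pass the kappa and rho checks identically, so the only difference in their final provisional label comes from the external regressor node.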
diff --git a/docs/api.rst b/docs/api.rst
@@ -105,6 +105,7 @@ API
 
    tedana.metrics.collect
    tedana.metrics.dependence
+   tedana.metrics.external
 
 
 .. _api_selection_ref: