improved docs (ME-ICA#19)
* working on selector init documentation

* Breaking up outputs.rst

* partially updated output_file_descriptions.rst

* changed n_bold_comps to n_accepted_comps

* n_bold_comps to n_accepted_comps

* ComponentSelector.py API docs cleaned up

* selection_nodes decision_docs updated

* selection_nodes docstrings cleaned up

* Fixed a test for selection_nodes

* Updated faq for tedana_reclassify and tree options

* docstrings in tedica and other small updates

* Updated docstrings in selection_utils.py

* Update docs/output_file_descriptions.rst

* more doc updates

* fixed meica to v2.5 in docstrings

* docs building again

* more updates to building decision trees

Co-authored-by: Joshua Teves <jbtevespro@gmail.com>
handwerkerd and jbteves authored Dec 15, 2022
1 parent 9d6a487 commit 8c54a18
Showing 8 changed files with 88 additions and 55 deletions.
66 changes: 43 additions & 23 deletions docs/building decision trees.rst
@@ -2,23 +2,36 @@
Understanding and building a component selection process
########################################################

``tedana`` involves transforming data into components via ICA, and then calculating metrics for each component.
Each metric has one value per component that is stored in a comptable or component_table dataframe. This structure
is then passed to a "decision tree" through which a series of binary choices categorize each component as **accepted** or
**rejected**. The time series for the rejected components are regressed from the data in the final denoising step.

There are several decision trees that are included by default in ``tedana`` but users can also build their own.
This might be useful if one of the default decision trees needs to be slightly altered due to the nature
of a specific data set, if one has an idea for a new approach to multi-echo denoising, or if one wants to integrate
This guide is designed for users who want to better understand the mechanics
of the component selection process and people who are considering customizing
their own decision tree or contributing to ``tedana`` code. We have tried to
make this accessible with minimal jargon, but it is long. If you just want to
better understand what's in the outputs from ``tedana``, start with
`classification output descriptions`_.

``tedana`` involves transforming data into components, currently via ICA, and then
calculating metrics for each component. Each metric has one value per component that
is stored in a component_table dataframe. This structure is then passed to a
"decision tree" through which a series of binary choices categorizes each component
as **accepted** or **rejected**. The time series for the rejected components are
regressed from the data in the `final denoising step`_.
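The flow described above can be sketched in plain Python (a stand-in for tedana's pandas ``component_table``; the metric values and the single-comparison rule here are invented for illustration, not tedana's actual tree):

```python
# Hypothetical sketch: each metric has one value per component, stored as a
# column of the component_table, and a decision step makes a binary choice
# (accepted vs. rejected) per component.
component_table = {
    "kappa": [45.0, 12.0, 30.0],  # TE-dependence metric, one value per component
    "rho": [10.0, 40.0, 8.0],     # TE-independence metric
}

# One binary choice: components whose kappa exceeds rho lean toward acceptance.
classification = [
    "accepted" if k > r else "rejected"
    for k, r in zip(component_table["kappa"], component_table["rho"])
]
print(classification)  # ['accepted', 'rejected', 'accepted']
```

A real tree chains many such choices, and the rejected components' time series are then regressed from the data.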

There are a couple of decision trees that are included by default in ``tedana`` but
users can also build their own. This might be useful if one of the default decision
trees needs to be slightly altered due to the nature of a specific data set, if one has
an idea for a new approach to multi-echo denoising, or if one wants to integrate
non-multi-echo metrics into a single decision tree.

Note: We use two terminologies interchangeably. The whole process is called "component selection"
and much of the code uses variants of that phrase (i.e. the ComponentSelector class, selection_nodes for the functions used in selection).
Instructions for how to classify components is called a "decision tree" since each step in the selection
process branches components into different intermediate or final classifications
Note: We use two terminologies interchangeably. The whole process is called "component
selection" and much of the code uses variants of that phrase (i.e. the ComponentSelector
class, selection_nodes for the functions used in selection). We call the steps for how
to classify components a "decision tree" since each step in the selection process
branches components into different intermediate or final classifications.

.. contents:: :local:
.. _classification output descriptions: classification output descriptions.html
.. _final denoising step: denoising.html

.. contents:: :local:

******************************************
Expected outputs after component selection
@@ -72,11 +85,12 @@ New columns in the ``component_table`` (sometimes a stand alone variable ``compt

``used_metrics``:
Saved as a field in the ``tree`` json file
A list of the metrics that were used in the decision tree. This should
match ``necessary_metrics`` which was a predefined list of metrics that
a tree uses. If these don't match, a warning should appear. These might
be useful for future work so that a user can input a tree and metrics
would be calculated based on what's needed to execute the tree.
A list of the metrics that were used in the decision tree. Everything in
``used_metrics`` should be in either ``necessary_metrics`` or
``generated_metrics``. If a used metric isn't in either, a warning message
will appear. These may have an additional use for future work so that a
user can input a tree and metrics would be calculated based on what's
needed to execute the tree.
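The consistency check described above can be sketched as follows (a hedged illustration, not tedana's exact implementation; the metric names are examples):

```python
import warnings

# Metrics a tree declares up front (illustrative names).
necessary_metrics = {"kappa", "rho"}
generated_metrics = {"varex kappa ratio"}  # hypothetical generated metric
# Metrics the tree actually referenced while running.
used_metrics = {"kappa", "rho", "countsigFT2"}

# Any used metric not declared as necessary or generated triggers a warning.
undeclared = used_metrics - (necessary_metrics | generated_metrics)
if undeclared:
    warnings.warn(f"Metrics used but not declared by the tree: {sorted(undeclared)}")
```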

``classification_tags``:
Saved as a field in the ``tree`` json file
@@ -132,7 +146,7 @@ Defining a custom decision tree

Decision trees are stored in json files. The default trees are stored as part of the tedana code repository in ./resources/decision_trees.
The minimal tree, minimal.json, is a good example highlighting the structure and steps in a tree. It may be helpful
to look at that tree while reading this section. kundu.json should replicate the decision tree used in meica version 2.7,
to look at that tree while reading this section. kundu.json should replicate the decision tree used in MEICA version 2.5,
the predecessor to tedana. It is more complex, but also highlights additional possible functionality in decision trees.

A user can specify another decision tree and link to the tree location when tedana is executed with the ``--tree`` option. The format is
@@ -142,9 +156,9 @@ if violated, but more will just give a warning. If you are designing or editing

A decision tree can include two types of nodes or functions. All functions are currently in selection_nodes.py

- A decision function will use existing metrics and potentially change the classification of the components based on those metrics. By convention, all these functions should begin with "dec"
- A calculation function will take existing metrics and calculate a value across components to be used for classification, for example the kappa and rho elbows. By convention, all these functions should begin with "calc"
- Nothing prevents a function from both calculating new cross component values and applying those values in a decision step, but following this convention should hopefully make decision tree specifications easier to follow and interpret.
- A decision function will use existing metrics and potentially change the classification of the components based on those metrics. By convention, all these functions begin with "dec"
- A calculation function will take existing metrics and calculate a value across components to be used for classification, for example the kappa and rho elbows. By convention, all these functions begin with "calc"
- Nothing prevents a function from both calculating new cross component values and applying those values in a decision step, but following this convention should hopefully make decision tree specifications easier to follow and results easier to interpret.
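The "dec"/"calc" convention can be illustrated with two toy functions (names, signatures, and logic invented here for illustration; the real node functions live in selection_nodes.py and take different arguments):

```python
# A hypothetical "dec"-style node: uses existing metrics to change
# component classifications.
def dec_rho_above_threshold(classifications, rho_values, threshold):
    """Reject components whose rho exceeds a cross-component threshold."""
    return [
        "rejected" if rho > threshold else label
        for label, rho in zip(classifications, rho_values)
    ]

# A hypothetical "calc"-style node: derives a cross-component value
# (here simply the mean) to be used by a later decision step.
def calc_rho_threshold(rho_values):
    return sum(rho_values) / len(rho_values)

rhos = [5.0, 30.0, 10.0]
thresh = calc_rho_threshold(rhos)  # 15.0
labels = dec_rho_above_threshold(["accepted"] * 3, rhos, thresh)
print(labels)  # ['accepted', 'rejected', 'accepted']
```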

**Key expectations**

@@ -174,13 +188,19 @@ A decision tree can include two types of nodes or functions. All functions are c

**Decision node json structure**

There are 6 initial fields, necessary_metrics, intermediate_classification, and classification_tags, as described in the above section:
There are 7 initial fields. Three of them, necessary_metrics, intermediate_classification, and classification_tags, are described in the above section. The others are:

- "tree_id": a descriptive name for the tree that will be logged.
- "info": A brief description of the tree for info logging
- "report": A narrative description of the tree that could be used in report logging
- "refs": Publications that should be referenced when this tree is used

"generated_metrics" is an optional initial field. It lists metrics that are calculated as part of the decision tree.
This is used similarly to necessary_metrics except, since the decision tree starts before these metrics exist, it
won't raise an error when these metrics are not found. One might want to calculate a new metric if the metric uses
only a subset of the components based on previous classifications. This does make interpretation of results more
confusing, but, since this functionality was part of the kundu decision tree, it is included.

The "nodes" field is a list of elements where each element defines a node in the decision tree. There are several key fields for each of these nodes:

- "functionname": The exact function name in selection_nodes.py that will be called.
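Putting the fields together, a skeletal tree specification might look like the sketch below. The top-level field names follow the description above; the layout of each node entry (the "parameters" key and its contents) is an assumption and may not match tedana's exact schema:

```python
import json

# A minimal, hypothetical decision-tree specification (not a runnable
# tedana tree): initial descriptive fields plus a list of nodes, where each
# node names a function in selection_nodes.py via "functionname".
tree = {
    "tree_id": "demo_tree",
    "info": "A two-node demonstration tree",
    "report": "Components were classified with a minimal demonstration tree.",
    "refs": "",
    "necessary_metrics": ["kappa", "rho"],
    "intermediate_classification": ["provisionalaccept"],
    "classification_tags": ["Likely BOLD"],
    "nodes": [
        {"functionname": "calc_kappa_elbow",
         "parameters": {"decide_comps": "all"}},
        {"functionname": "manual_classify",
         "parameters": {"decide_comps": "all", "new_classification": "accepted"}},
    ],
}

# Decision trees are stored as json files, so the dict round-trips cleanly.
print(json.dumps(tree, indent=2).splitlines()[1])
```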
File renamed without changes.
49 changes: 30 additions & 19 deletions docs/faq.rst
@@ -47,6 +47,8 @@ Nevertheless, we have some code (thanks to Julio Peraza) that works for version

<script src="https://gist.github.com/tsalo/83828e0c1e9009f3cbd82caed888afba.js"></script>

.. _fMRIPrep: https://fmriprep.readthedocs.io

Warping scanner-space fMRIPrep outputs to standard space
========================================================

@@ -68,12 +70,13 @@ The standard space template in this example is "MNI152NLin2009cAsym", but will d
The TEDICA step may fail to converge if TEDPCA is either too strict
(i.e., there are too few components) or too lenient (there are too many).

In our experience, this may happen when preprocessing has not been applied to
the data, or when improper steps have been applied to the data (e.g., distortion
correction, rescaling, nuisance regression).
With updates to the ``tedana`` code, this issue is now rare, but it may happen
when preprocessing has not been applied to the data, or when improper steps have
been applied to the data (e.g. rescaling, nuisance regression).
If you are confident that your data have been preprocessed correctly prior to
applying tedana, and you encounter this problem, please submit a question to `NeuroStars`_.

.. _NeuroStars: https://neurostars.org

.. _manual classification:

@@ -136,24 +139,32 @@ can include additional criteria.
.. _make their own: building\ decision\ trees.html

*************************************************************************************
[tedana] Why isn't v3.2 of the component selection algorithm supported in ``tedana``?
[tedana] What different versions of this method exist?
*************************************************************************************

There is a lot of solid logic behind the updated version of the TEDICA component
selection algorithm, first added to the original ME-ICA codebase `here`_ by Dr. Prantik Kundu.
However, we (the ``tedana`` developers) have encountered certain difficulties
with this method (e.g., misclassified components) and the method itself has yet
to be validated in any papers, posters, etc., which is why we have chosen to archive
the v3.2 code, with the goal of revisiting it when ``tedana`` is more stable.

Anyone interested in using v3.2 may compile and install an earlier release (<=0.0.4) of ``tedana``.


.. _here: https://bitbucket.org/prantikk/me-ica/commits/906bd1f6db7041f88cd0efcac8a74074d673f4f5

.. _NeuroStars: https://neurostars.org
.. _fMRIPrep: https://fmriprep.readthedocs.io
.. _afni_proc.py: https://afni.nimh.nih.gov/pub/dist/doc/program_help/afni_proc.py.html
Dr. Prantik Kundu developed a multi-echo ICA (ME-ICA) denoising method and
`shared code on bitbucket`_ to allow others to use the method. A nearly identical
version of this code is `distributed with AFNI as MEICA v2.5 beta 11`_. Most early
publications that validated the MEICA method used variants of this code. That code
runs only on the now defunct python 2.7 and is not under active development.
``tedana``, when run with ``--tree kundu --tedpca kundu`` (or ``--tedpca kundu-stabilize``),
uses the same core algorithm as in MEICA v2.5. Since ICA is a nondeterministic
algorithm, and ``tedana`` and MEICA use different PCA and ICA code, the algorithm is
mostly the same, but the results will not be identical.

Prantik Kundu also worked on `MEICA v3.2`_ (also for python v2.7). The underlying ICA
step is very similar, but the component selection process was different. While this
new approach has potentially useful ideas, the early ``tedana`` developers experienced
non-trivial component misclassifications and there were no publications that
validated this method. That is why ``tedana`` replicated the established and validated
MEICA v2.5 method and also includes options to integrate additional component selection
methods. Recently Prantik has started to work on `MEICA v3.3`_ (for python >=v3.7) so
that this version of the selection process would again be possible to run.

.. _shared code on bitbucket: https://bitbucket.org/prantikk/me-ica/src/experimental
.. _distributed with AFNI as MEICA v2.5 beta 11: https://github.com/afni/afni/tree/master/src/pkundu
.. _MEICA v3.2: https://github.com/ME-ICA/me-ica/tree/53191a7e8838788acf837fdf7cb3026efadf49ac
.. _MEICA v3.3: https://github.com/ME-ICA/me-ica/tree/ME-ICA_v3.3.0


*******************************************************************
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -190,7 +190,7 @@ tedana is licensed under GNU Lesser General Public License version 2.1.

dependence_metrics
output_file_descriptions
component_table_descriptions
classification_output_descriptions


******************
12 changes: 6 additions & 6 deletions docs/outputs.rst
@@ -15,16 +15,16 @@ future processing. `descriptions of these output files are here`_.

.. _descriptions of these output files are here: output_file_descriptions.html

****************
Component tables
****************
*******************************************
Component tables and classification outputs
*******************************************

TEDPCA and TEDICA use component tables to track relevant metrics, component
classifications, and rationales behind classifications.
The component tables are stored as tsv files for BIDS-compatibility.
`Full descriptions of these outputs are here`_.
The component tables and additional information are stored as tsv and json files.
`A full description of these outputs is here`_.

.. _Full descriptions of these outputs are here: component_table_descriptions.html
.. _A full description of these outputs is here: classification_output_descriptions.html


*********************
8 changes: 4 additions & 4 deletions tedana/selection/selection_nodes.py
@@ -769,9 +769,9 @@ def calc_kappa_elbow(
Note
----
This function is currently hard coded for a specific way to calculate the kappa elbow
based on the method by Kundu in the MEICA v2.7 code. This uses the minimum of
based on the method by Kundu in the MEICA v2.5 code. This uses the minimum of
a kappa elbow calculation on all components and on a subset of kappa values below
a significance threshold. To get the same functionality as in MEICA v2.7,
a significance threshold. To get the same functionality as in MEICA v2.5,
decide_comps must be 'all'.
"""

@@ -881,8 +881,8 @@ def calc_rho_elbow(
Note
----
This script is currently hard coded for a specific way to calculate the rho elbow
based on the method by Kundu in the MEICA v2.7 code. To get the same functionality
in MEICA v2.7, decide_comps must be 'all' and subset_decide_comps must be
based on the method by Kundu in the MEICA v2.5 code. To get the same functionality
in MEICA v2.5, decide_comps must be 'all' and subset_decide_comps must be
'unclassified' See :obj:`tedana.selection.selection_utils.rho_elbow_kundu_liberal`
for a more detailed explanation of the difference between the kundu and liberal
options.
4 changes: 2 additions & 2 deletions tedana/selection/selection_utils.py
@@ -577,7 +577,7 @@ def getelbow(arr, return_val=False):
def kappa_elbow_kundu(component_table, n_echos, comps2use=None):
"""
Calculate an elbow for kappa using the approach originally in
Prantik Kundu's MEICA v2.7 code
Prantik Kundu's MEICA v2.5 code
Parameters
----------
@@ -649,7 +649,7 @@ def rho_elbow_kundu_liberal(
):
"""
Calculate an elbow for rho using the approach originally in
Prantik Kundu's MEICA v2.7 code and with a slightly more
Prantik Kundu's MEICA v2.5 code and with a slightly more
liberal threshold
Parameters
2 changes: 2 additions & 0 deletions tedana/workflows/tedana.py
@@ -144,6 +144,8 @@ def _get_parser():
"PCA decomposition with the mdl, kic and aic options "
"is based on a Moving Average (stationary Gaussian) "
"process and are ordered from most to least aggressive. "
"'kundu' or 'kundu-stabilize' are selection methods that "
"were distributed with MEICA. "
"Users may also provide a float from 0 to 1, "
"in which case components will be selected based on the "
"cumulative variance explained or an integer greater than 1"
