improved docs (ME-ICA#19)
* working on selector init documentation

* Breaking up outputs.rst

* partially updated output_file_descriptions.rst

* changed n_bold_comps to n_accepted_comps

* n_bold_comps to n_accepted_comps

* ComponentSelector.py API docs cleaned up

* selection_nodes decision_docs updated

* selection_nodes docstrings cleaned up

* Fixed a test for selection_nodes

* Updated faq for tedana_reclassify and tree options

* docstrings in tedica and other small updates

* Updated docstrings in selection_utils.py

* Update docs/output_file_descriptions.rst

* more doc updates

* fixed meica to v2.5 in docstrings

* docs building again

* more updates to building decision trees

Co-authored-by: Joshua Teves <jbtevespro@gmail.com>
handwerkerd and jbteves authored Dec 15, 2022
1 parent 9d6a487 commit 8c54a18
Showing 8 changed files with 88 additions and 55 deletions.
66 changes: 43 additions & 23 deletions docs/building decision trees.rst
@@ -2,23 +2,36 @@
Understanding and building a component selection process
########################################################

``tedana`` involves transforming data into components via ICA, and then calculating metrics for each component.
Each metric has one value per component that is stored in a comptable or component_table dataframe. This structure
is then passed to a "decision tree" through which a series of binary choices categorize each component as **accepted** or
**rejected**. The time series for the rejected components are regressed from the data in the final denoising step.

There are several decision trees that are included by default in ``tedana`` but users can also build their own.
This might be useful if one of the default decision trees needs to be slightly altered due to the nature
of a specific data set, if one has an idea for a new approach to multi-echo denoising, or if one wants to integrate
This guide is designed for users who want to better understand the mechanics
of the component selection process and people who are considering customizing
their own decision tree or contributing to ``tedana`` code. We have tried to
make this accessible with minimal jargon, but it is long. If you just want to
better understand what's in the outputs from ``tedana``, start with
`classification output descriptions`_.

``tedana`` involves transforming data into components, currently via ICA, and then
calculating metrics for each component. Each metric has one value per component that
is stored in a component_table dataframe. This structure is then passed to a
"decision tree" through which a series of binary choices categorizes each component
as **accepted** or **rejected**. The time series for the rejected components are
regressed from the data in the `final denoising step`_.
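The flow described above can be sketched in plain Python (a stand-in for tedana's pandas ``component_table``; the metric values and the single-comparison rule here are invented for illustration, not tedana's actual tree):

```python
# Hypothetical sketch: each metric has one value per component, stored as a
# column of the component_table, and a decision step makes a binary choice
# (accepted vs. rejected) per component.
component_table = {
    "kappa": [45.0, 12.0, 30.0],  # TE-dependence metric, one value per component
    "rho": [10.0, 40.0, 8.0],     # TE-independence metric
}

# One binary choice: components whose kappa exceeds rho lean toward acceptance.
classification = [
    "accepted" if k > r else "rejected"
    for k, r in zip(component_table["kappa"], component_table["rho"])
]
print(classification)  # ['accepted', 'rejected', 'accepted']
```

A real tree chains many such choices, and the rejected components' time series are then regressed from the data.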

There are a couple of decision trees that are included by default in ``tedana`` but
users can also build their own. This might be useful if one of the default decision
trees needs to be slightly altered due to the nature of a specific data set, if one has
an idea for a new approach to multi-echo denoising, or if one wants to integrate
non-multi-echo metrics into a single decision tree.

Note: We use two terminologies interchangeably. The whole process is called "component selection"
and much of the code uses variants of that phrase (i.e. the ComponentSelector class, selection_nodes for the functions used in selection).
Instructions for how to classify components is called a "decision tree" since each step in the selection
process branches components into different intermediate or final classifications
Note: We use two terminologies interchangeably. The whole process is called "component
selection" and much of the code uses variants of that phrase (i.e. the ComponentSelector
class, selection_nodes for the functions used in selection). We call the steps for how
to classify components a "decision tree" since each step in the selection process
branches components into different intermediate or final classifications.

.. contents:: :local:
.. _classification output descriptions: classification output descriptions.html
.. _final denoising step: denoising.html

.. contents:: :local:

******************************************
Expected outputs after component selection
@@ -72,11 +85,12 @@ New columns in the ``component_table`` (sometimes a stand alone variable ``compt

``used_metrics``:
Saved as a field in the ``tree`` json file
A list of the metrics that were used in the decision tree. This should
match ``necessary_metrics`` which was a predefined list of metrics that
a tree uses. If these don't match, a warning should appear. These might
be useful for future work so that a user can input a tree and metrics
would be calculated based on what's needed to execute the tree.
A list of the metrics that were used in the decision tree. Everything in
``used_metrics`` should be in either ``necessary_metrics`` or
``generated_metrics``. If a used metric isn't in either, a warning message
will appear. These may have an additional use for future work so that a
user can input a tree and metrics would be calculated based on what's
needed to execute the tree.
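The consistency check described above can be sketched as follows (a hedged illustration, not tedana's exact implementation; the metric names are examples):

```python
import warnings

# Metrics a tree declares up front (illustrative names).
necessary_metrics = {"kappa", "rho"}
generated_metrics = {"varex kappa ratio"}  # hypothetical generated metric
# Metrics the tree actually referenced while running.
used_metrics = {"kappa", "rho", "countsigFT2"}

# Any used metric not declared as necessary or generated triggers a warning.
undeclared = used_metrics - (necessary_metrics | generated_metrics)
if undeclared:
    warnings.warn(f"Metrics used but not declared by the tree: {sorted(undeclared)}")
```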

``classification_tags``:
Saved as a field in the ``tree`` json file
@@ -132,7 +146,7 @@ Defining a custom decision tree

Decision trees are stored in json files. The default trees are stored as part of the tedana code repository in ./resources/decision_trees.
The minimal tree, minimal.json, is a good example highlighting the structure and steps in a tree. It may be helpful
to look at that tree while reading this section. kundu.json should replicate the decision tree used in meica version 2.7,
to look at that tree while reading this section. kundu.json should replicate the decision tree used in MEICA version 2.5,
the predecessor to tedana. It is more complex, but also highlights additional possible functionality in decision trees.

A user can specify another decision tree and link to the tree location when tedana is executed with the ``--tree`` option. The format is
@@ -142,9 +156,9 @@ if violated, but more will just give a warning. If you are designing or editing

A decision tree can include two types of nodes or functions. All functions are currently in selection_nodes.py

- A decision function will use existing metrics and potentially change the classification of the components based on those metrics. By convention, all these functions should begin with "dec"
- A calculation function will take existing metrics and calculate a value across components to be used for classification, for example the kappa and rho elbows. By convention, all these functions should begin with "calc"
- Nothing prevents a function from both calculating new cross component values and applying those values in a decision step, but following this convention should hopefully make decision tree specifications easier to follow and interpret.
- A decision function will use existing metrics and potentially change the classification of the components based on those metrics. By convention, all these functions begin with "dec"
- A calculation function will take existing metrics and calculate a value across components to be used for classification, for example the kappa and rho elbows. By convention, all these functions begin with "calc"
- Nothing prevents a function from both calculating new cross component values and applying those values in a decision step, but following this convention should hopefully make decision tree specifications easier to follow and results easier to interpret.
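The "dec"/"calc" convention can be illustrated with two toy functions (names, signatures, and logic invented here for illustration; the real node functions live in selection_nodes.py and take different arguments):

```python
# A hypothetical "dec"-style node: uses existing metrics to change
# component classifications.
def dec_rho_above_threshold(classifications, rho_values, threshold):
    """Reject components whose rho exceeds a cross-component threshold."""
    return [
        "rejected" if rho > threshold else label
        for label, rho in zip(classifications, rho_values)
    ]

# A hypothetical "calc"-style node: derives a cross-component value
# (here simply the mean) to be used by a later decision step.
def calc_rho_threshold(rho_values):
    return sum(rho_values) / len(rho_values)

rhos = [5.0, 30.0, 10.0]
thresh = calc_rho_threshold(rhos)  # 15.0
labels = dec_rho_above_threshold(["accepted"] * 3, rhos, thresh)
print(labels)  # ['accepted', 'rejected', 'accepted']
```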

**Key expectations**

@@ -174,13 +188,19 @@ A decision tree can include two types of nodes or functions. All functions are c

**Decision node json structure**

There are 6 initial fields, necessary_metrics, intermediate_classification, and classification_tags, as described in the above section:
There are 7 initial fields. Three of them, necessary_metrics, intermediate_classification, and classification_tags, are described in the above section. The others are:

- "tree_id": a descriptive name for the tree that will be logged.
- "info": A brief description of the tree for info logging
- "report": A narrative description of the tree that could be used in report logging
- "refs": Publications that should be referenced when this tree is used

"generated_metrics" is an optional initial field. It lists metrics that are calculated as part of the decision tree.
This is used similarly to necessary_metrics except, since the decision tree starts before these metrics exist, it
won't raise an error when these metrics are not found. One might want to calculate a new metric if the metric uses
only a subset of the components based on previous classifications. This does make interpretation of results more
confusing, but, since this functionality was part of the kundu decision tree, it is included.

The "nodes" field is a list of elements where each element defines a node in the decision tree. There are several key fields for each of these nodes:

- "functionname": The exact function name in selection_nodes.py that will be called.
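Putting the fields together, a skeletal tree specification might look like the sketch below. The top-level field names follow the description above; the layout of each node entry (the "parameters" key and its contents) is an assumption and may not match tedana's exact schema:

```python
import json

# A minimal, hypothetical decision-tree specification (not a runnable
# tedana tree): initial descriptive fields plus a list of nodes, where each
# node names a function in selection_nodes.py via "functionname".
tree = {
    "tree_id": "demo_tree",
    "info": "A two-node demonstration tree",
    "report": "Components were classified with a minimal demonstration tree.",
    "refs": "",
    "necessary_metrics": ["kappa", "rho"],
    "intermediate_classification": ["provisionalaccept"],
    "classification_tags": ["Likely BOLD"],
    "nodes": [
        {"functionname": "calc_kappa_elbow",
         "parameters": {"decide_comps": "all"}},
        {"functionname": "manual_classify",
         "parameters": {"decide_comps": "all", "new_classification": "accepted"}},
    ],
}

# Decision trees are stored as json files, so the dict round-trips cleanly.
print(json.dumps(tree, indent=2).splitlines()[1])
```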
File renamed without changes.
49 changes: 30 additions & 19 deletions docs/faq.rst
@@ -47,6 +47,8 @@ Nevertheless, we have some code (thanks to Julio Peraza) that works for version

<script src="https://gist.github.com/tsalo/83828e0c1e9009f3cbd82caed888afba.js"></script>

.. _fMRIPrep: https://fmriprep.readthedocs.io

Warping scanner-space fMRIPrep outputs to standard space
========================================================

@@ -68,12 +70,13 @@ The standard space template in this example is "MNI152NLin2009cAsym", but will d
The TEDICA step may fail to converge if TEDPCA is either too strict
(i.e., there are too few components) or too lenient (there are too many).

In our experience, this may happen when preprocessing has not been applied to
the data, or when improper steps have been applied to the data (e.g., distortion
correction, rescaling, nuisance regression).
With updates to the ``tedana`` code, this issue is now rare, but it may happen
when preprocessing has not been applied to the data, or when improper steps have
been applied to the data (e.g. rescaling, nuisance regression).
If you are confident that your data have been preprocessed correctly prior to
applying tedana, and you encounter this problem, please submit a question to `NeuroStars`_.

.. _NeuroStars: https://neurostars.org

.. _manual classification:

@@ -136,24 +139,32 @@ can include additional criteria.
.. _make their own: building\ decision\ trees.html

*************************************************************************************
[tedana] Why isn't v3.2 of the component selection algorithm supported in ``tedana``?
[tedana] What different versions of this method exist?
*************************************************************************************

There is a lot of solid logic behind the updated version of the TEDICA component
selection algorithm, first added to the original ME-ICA codebase `here`_ by Dr. Prantik Kundu.
However, we (the ``tedana`` developers) have encountered certain difficulties
with this method (e.g., misclassified components) and the method itself has yet
to be validated in any papers, posters, etc., which is why we have chosen to archive
the v3.2 code, with the goal of revisiting it when ``tedana`` is more stable.

Anyone interested in using v3.2 may compile and install an earlier release (<=0.0.4) of ``tedana``.


.. _here: https://bitbucket.org/prantikk/me-ica/commits/906bd1f6db7041f88cd0efcac8a74074d673f4f5

.. _NeuroStars: https://neurostars.org
.. _fMRIPrep: https://fmriprep.readthedocs.io
.. _afni_proc.py: https://afni.nimh.nih.gov/pub/dist/doc/program_help/afni_proc.py.html
Dr. Prantik Kundu developed a multi-echo ICA (ME-ICA) denoising method and
`shared code on bitbucket`_ to allow others to use the method. A nearly identical
version of this code is `distributed with AFNI as MEICA v2.5 beta 11`_. Most early
publications that validated the MEICA method used variants of this code. That code
runs only on the now defunct python 2.7 and is not under active development.
``tedana``, when run with ``--tree kundu --tedpca kundu`` (or ``--tedpca kundu-stabilize``),
uses the same core algorithm as in MEICA v2.5. Since ICA is a nondeterministic
algorithm, and ``tedana`` and MEICA use different PCA and ICA code, the algorithm is
mostly the same, but the results will not be identical.

Prantik Kundu also worked on `MEICA v3.2`_ (also for python v2.7). The underlying ICA
step is very similar, but the component selection process was different. While this
new approach has potentially useful ideas, the early ``tedana`` developers experienced
non-trivial component misclassifications and there were no publications that
validated this method. That is why ``tedana`` replicated the established and validated
MEICA v2.5 method and also includes options to integrate additional component selection
methods. Recently Prantik has started to work on `MEICA v3.3`_ (for python >=v3.7) so
that this version of the selection process would again be possible to run.

.. _shared code on bitbucket: https://bitbucket.org/prantikk/me-ica/src/experimental
.. _distributed with AFNI as MEICA v2.5 beta 11: https://github.com/afni/afni/tree/master/src/pkundu
.. _MEICA v3.2: https://github.com/ME-ICA/me-ica/tree/53191a7e8838788acf837fdf7cb3026efadf49ac
.. _MEICA v3.3: https://github.com/ME-ICA/me-ica/tree/ME-ICA_v3.3.0


*******************************************************************
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -190,7 +190,7 @@ tedana is licensed under GNU Lesser General Public License version 2.1.

dependence_metrics
output_file_descriptions
component_table_descriptions
classification_output_descriptions


******************
12 changes: 6 additions & 6 deletions docs/outputs.rst
@@ -15,16 +15,16 @@ future processing. `descriptions of these output files are here`_.

.. _descriptions of these output files are here: output_file_descriptions.html

****************
Component tables
****************
*******************************************
Component tables and classification outputs
*******************************************

TEDPCA and TEDICA use component tables to track relevant metrics, component
classifications, and rationales behind classifications.
The component tables are stored as tsv files for BIDS-compatibility.
`Full descriptions of these outputs are here`_.
The component tables and additional information are stored as tsv and json files.
`A full description of these outputs is here`_.

.. _Full descriptions of these outputs are here: component_table_descriptions.html
.. _A full description of these outputs is here: classification_output_descriptions.html


*********************
8 changes: 4 additions & 4 deletions tedana/selection/selection_nodes.py
@@ -769,9 +769,9 @@ def calc_kappa_elbow(
Note
----
This function is currently hard coded for a specific way to calculate the kappa elbow
based on the method by Kundu in the MEICA v2.7 code. This uses the minimum of
based on the method by Kundu in the MEICA v2.5 code. This uses the minimum of
a kappa elbow calculation on all components and on a subset of kappa values below
a significance threshold. To get the same functionality as in MEICA v2.7,
a significance threshold. To get the same functionality as in MEICA v2.5,
decide_comps must be 'all'.
"""

@@ -881,8 +881,8 @@ def calc_rho_elbow(
Note
----
This script is currently hard coded for a specific way to calculate the rho elbow
based on the method by Kundu in the MEICA v2.7 code. To get the same functionality
in MEICA v2.7, decide_comps must be 'all' and subset_decide_comps must be
based on the method by Kundu in the MEICA v2.5 code. To get the same functionality
in MEICA v2.5, decide_comps must be 'all' and subset_decide_comps must be
'unclassified' See :obj:`tedana.selection.selection_utils.rho_elbow_kundu_liberal`
for a more detailed explanation of the difference between the kundu and liberal
options.
4 changes: 2 additions & 2 deletions tedana/selection/selection_utils.py
@@ -577,7 +577,7 @@ def getelbow(arr, return_val=False):
def kappa_elbow_kundu(component_table, n_echos, comps2use=None):
"""
Calculate an elbow for kappa using the approach originally in
Prantik Kundu's MEICA v2.7 code
Prantik Kundu's MEICA v2.5 code
Parameters
----------
@@ -649,7 +649,7 @@ def rho_elbow_kundu_liberal(
):
"""
Calculate an elbow for rho using the approach originally in
Prantik Kundu's MEICA v2.7 code and with a slightly more
Prantik Kundu's MEICA v2.5 code and with a slightly more
liberal threshold
Parameters
2 changes: 2 additions & 0 deletions tedana/workflows/tedana.py
@@ -144,6 +144,8 @@ def _get_parser():
"PCA decomposition with the mdl, kic and aic options "
"is based on a Moving Average (stationary Gaussian) "
"process and are ordered from most to least aggressive. "
"'kundu' or 'kundu-stabilize' are selection methods that "
"were distributed with MEICA. "
"Users may also provide a float from 0 to 1, "
"in which case components will be selected based on the "
"cumulative variance explained or an integer greater than 1"
