Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding robustica option to ICA decomposition to achieve consistent results #1013

Merged
merged 47 commits into from
Sep 23, 2024
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
Show all changes
47 commits
Select commit Hold shift + click to select a range
f4eaa3e
Add robustica method
BahmanTahayori Jul 31, 2023
b0cac3a
Incorporation of major comments regarding robustica addition
BahmanTahayori Dec 5, 2023
55c2ae4
Add robustica 0.1.3 to dependency list
BahmanTahayori Nov 1, 2023
cd55a3f
Multiple fixes to RobustICA addition from code review
BahmanTahayori Dec 5, 2023
2d9b007
Specify magic number fixed seed of 42 as a constant
BahmanTahayori Nov 29, 2023
09e565e
Merge remote-tracking branch 'upstream/main' into add_robustica_rsclean
BahmanTahayori Dec 5, 2023
fc5f9ea
Updated
BahmanTahayori Dec 5, 2023
4fc3043
Robustica Updates
BahmanTahayori Dec 6, 2023
a20ff57
Incorporating the third round of Robert E. Smith's comments
BahmanTahayori Dec 20, 2023
cc5e05d
Merge pull request #3 from BahmanTahayori/add_robustica_rsclean
BahmanTahayori Dec 20, 2023
a449fec
Merge branch 'ME-ICA:main' into main
BahmanTahayori Feb 9, 2024
78c8140
Enhance the "ica_method" description suggested by D. Handwerker
BahmanTahayori Feb 9, 2024
ac85e6a
Enhancing the "n_robust_runs" description suggested by D. Handwerkerd
BahmanTahayori Feb 9, 2024
979d026
RobustICA: Restructure code loop over robust methods (#4)
Lestropie Feb 11, 2024
71d8d4a
merging recent changes
BahmanTahayori Feb 21, 2024
cac38cd
Applied suggested changes
BahmanTahayori Feb 29, 2024
5fcf148
Fixing the conflict
BahmanTahayori Feb 29, 2024
b7d08e9
Merge branch 'ME-ICA:main' into main
BahmanTahayori Feb 29, 2024
a113423
Incorporating more comments
BahmanTahayori Mar 4, 2024
b60e9a6
Merge remote-tracking branch 'upstream/main'
Lestropie Apr 12, 2024
45c95ce
aligning adding-robustica with Main
handwerkerd Aug 13, 2024
8e6878f
Adding already requested changes
handwerkerd Aug 13, 2024
88fd148
fixed failing tests
handwerkerd Aug 16, 2024
a221e72
updated documentation in faq.rst
handwerkerd Aug 27, 2024
8622a9b
more documentation changes
handwerkerd Aug 29, 2024
419e9d4
Update docs/faq.rst
handwerkerd Aug 30, 2024
d29a91b
Update docs/faq.rst
handwerkerd Aug 30, 2024
efb712e
Aligning robustICA with current Main + (#5)
handwerkerd Aug 30, 2024
dee68ec
align with main
handwerkerd Sep 3, 2024
171a835
Merge branch 'ME-ICA:main' into adding-robustica
handwerkerd Sep 3, 2024
4b86fe2
fixed ica.py docstring error
handwerkerd Sep 3, 2024
c01ec51
added scikit-learn-extra to pyproject and changed ref name
handwerkerd Sep 3, 2024
cd50037
increment circleci version keys
handwerkerd Sep 3, 2024
54a4ac3
Merge branch 'main' into adding-robustica
BahmanTahayori Sep 10, 2024
9bb021a
Merge pull request #6 from handwerkerd/adding-robustica
BahmanTahayori Sep 10, 2024
cd29060
Merge branch 'ME-ICA:main' into main
BahmanTahayori Sep 10, 2024
2a868e3
Removing the scikit-learn-extra dependency
BahmanTahayori Sep 11, 2024
ed10ade
Updating pyproject.toml file
BahmanTahayori Sep 11, 2024
8a14277
Minor changes to make the help more readable
BahmanTahayori Sep 12, 2024
b793d83
Minor changes
BahmanTahayori Sep 12, 2024
1b1eb38
upgrading to robustica 0.1.4
BahmanTahayori Sep 17, 2024
87965f4
Update docs
BahmanTahayori Sep 17, 2024
8743aca
updating utils.py, toml file and the docs
BahmanTahayori Sep 18, 2024
42372e3
minor change to utils.py
BahmanTahayori Sep 18, 2024
3d88eb4
Merge branch 'main' into main
handwerkerd Sep 18, 2024
1aac94a
Incorporating Eneko's comments
BahmanTahayori Sep 20, 2024
2b0ef67
Added a warning when the clustering method is changed
BahmanTahayori Sep 20, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ dependencies = [
"nibabel>=2.5.1",
"nilearn>=0.7",
"numpy>=1.16",
"robustica>=0.1.3",
handwerkerd marked this conversation as resolved.
Show resolved Hide resolved
"pandas>=2.0",
"scikit-learn>=0.21",
"scipy>=1.2.0",
Expand Down
6 changes: 6 additions & 0 deletions tedana/config.py
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to open an issue so we can discuss where we move these defaults.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #1132.

Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
"""Setting default values for ICA decomposition."""
handwerkerd marked this conversation as resolved.
Show resolved Hide resolved
DEFAULT_ICA_METHOD = "robustica"
DEFAULT_N_ROBUST_RUNS = 30
DEFAULT_N_MAX_ITER = 500
DEFAULT_N_MAX_RESTART = 10
DEFAULT_SEED = 42
187 changes: 180 additions & 7 deletions tedana/decomposition/ica.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,25 +3,45 @@
import warnings

import numpy as np
from robustica import RobustICA
from scipy import stats
from sklearn.decomposition import FastICA

from tedana.config import (
DEFAULT_ICA_METHOD,
DEFAULT_N_MAX_ITER,
DEFAULT_N_MAX_RESTART,
DEFAULT_N_ROBUST_RUNS,
)

LGR = logging.getLogger("GENERAL")
RepLGR = logging.getLogger("REPORT")


def tedica(data, n_components, fixed_seed, maxit=500, maxrestart=10):
"""Perform ICA on ``data`` and return mixing matrix.
def tedica(
data,
n_components,
fixed_seed,
ica_method=DEFAULT_ICA_METHOD,
n_robust_runs=DEFAULT_N_ROBUST_RUNS,
maxit=DEFAULT_N_MAX_ITER,
maxrestart=DEFAULT_N_MAX_RESTART,
):
"""Perform ICA on `data` with the user selected ica method and returns mixing matrix.

Parameters
----------
data : (S x T) :obj:`numpy.ndarray`
Dimensionally reduced optimally combined functional data, where `S` is
samples and `T` is time
n_components : :obj:`int`
Number of components retained from PCA decomposition
Number of components retained from PCA decomposition.
fixed_seed : :obj:`int`
Seed for ensuring reproducibility of ICA results
Seed for ensuring reproducibility of ICA results.
ica_method : :obj: `str'
slected ICA method, can be fastica or robutica.
BahmanTahayori marked this conversation as resolved.
Show resolved Hide resolved
n_robust_runs : :obj: `int'
selected number of robust runs when robustica is used. Default is 30.
maxit : :obj:`int`, optional
Maximum number of iterations for ICA. Default is 500.
maxrestart : :obj:`int`, optional
Expand All @@ -37,16 +57,169 @@ def tedica(data, n_components, fixed_seed, maxit=500, maxrestart=10):
fixed_seed : :obj:`int`
Random seed from final decomposition.

Notes
-----
Uses `sklearn` implementation of FastICA for decomposition
"""
warnings.filterwarnings(action="ignore", module="scipy", message="^internal gelsd")
RepLGR.info(
"Independent component analysis was then used to "
"decompose the dimensionally reduced dataset."
)

ica_method = ica_method.lower()

if ica_method == "robustica":
mmix, fixed_seed = r_ica(
data,
n_components=n_components,
fixed_seed=fixed_seed,
n_robust_runs=n_robust_runs,
max_it=maxit,
)
elif ica_method == "fastica":
mmix, fixed_seed = f_ica(
data,
n_components=n_components,
fixed_seed=fixed_seed,
maxit=maxit,
maxrestart=maxrestart,
)
else:
raise ValueError("The selected ICA method is invalid!")

return mmix, fixed_seed


def r_ica(data, n_components, fixed_seed, n_robust_runs, max_it):
"""Perform robustica on `data` and returns mixing matrix.

Parameters
----------
data : (S x T) :obj:`numpy.ndarray`
Dimensionally reduced optimally combined functional data, where `S` is
samples and `T` is time
n_components : :obj:`int`
Number of components retained from PCA decomposition.
fixed_seed : :obj:`int`
Seed for ensuring reproducibility of ICA results.
n_robust_runs : :obj: `int'
selected number of robust runs when robustica is used. Default is 30.
maxit : :obj:`int`, optional
Maximum number of iterations for ICA. Default is 500.

Returns
-------
mmix : (T x C) :obj:`numpy.ndarray`
Z-scored mixing matrix for converting input data to component space,
where `C` is components and `T` is the same as in `data`
fixed_seed : :obj:`int`
Random seed from final decomposition.
"""
if n_robust_runs > 200:
LGR.warning(
"The selected n_robust_runs is a very big number! The process will take a long time!"
)

RepLGR.info("RobustICA package was used for ICA decomposition \\citep{Anglada2022}.")

if fixed_seed == -1:
fixed_seed = np.random.randint(low=1, high=1000)

try:
rica = RobustICA(
n_components=n_components,
robust_runs=n_robust_runs,
whiten="arbitrary-variance",
max_iter=max_it,
random_state=fixed_seed,
robust_dimreduce=False,
fun="logcosh",
robust_method="DBSCAN",
)

s, mmix = rica.fit_transform(data)
handwerkerd marked this conversation as resolved.
Show resolved Hide resolved
q = rica.evaluate_clustering(
rica.S_all, rica.clustering.labels_, rica.signs_, rica.orientation_
)

except:
handwerkerd marked this conversation as resolved.
Show resolved Hide resolved
rica = RobustICA(
n_components=n_components,
robust_runs=n_robust_runs,
whiten="arbitrary-variance",
max_iter=max_it,
random_state=fixed_seed,
robust_dimreduce=False,
fun="logcosh",
robust_method="AgglomerativeClustering",
)

s, mmix = rica.fit_transform(data)
q = rica.evaluate_clustering(
rica.S_all, rica.clustering.labels_, rica.signs_, rica.orientation_
)

iq = np.array(
np.mean(q[q["cluster_id"] >= 0].iq)
) # Excluding outliers (cluster -1) from the index quality calculation

if iq < 0.6:
LGR.warning(
f"The resultant mean Index Quality is low ({iq}). It is recommended to rerun the "
"process with a different seed."
)

mmix = mmix[
:, q["cluster_id"] >= 0
] # Excluding outliers (cluster -1) when calculating the mixing matrix
mmix = stats.zscore(mmix, axis=0)

LGR.info(
f"RobustICA with {n_robust_runs} robust runs and seed {fixed_seed} was used. "
f"The mean Index Quality is {iq}."
)

no_outliers = np.count_nonzero(rica.clustering.labels_ == -1)
if no_outliers:
LGR.info(
"The DBSCAN clustering algorithm detected outliers when clustering "
handwerkerd marked this conversation as resolved.
Show resolved Hide resolved
"components for different runs. These outliers are excluded when calculating "
"the index quality and the mixing matrix to maximise the robustness of the "
"decomposition."
)

return mmix, fixed_seed


def f_ica(data, n_components, fixed_seed, maxit, maxrestart):
"""Perform FastICA on `data` and returns mixing matrix.

Parameters
----------
data : (S x T) :obj:`numpy.ndarray`
Dimensionally reduced optimally combined functional data, where `S` is
samples and `T` is time
n_components : :obj:`int`
Number of components retained from PCA decomposition
fixed_seed : :obj:`int`
Seed for ensuring reproducibility of ICA results
maxit : :obj:`int`, optional
Maximum number of iterations for ICA. Default is 500.
maxrestart : :obj:`int`, optional
Maximum number of attempted decompositions to perform with different
random seeds. ICA will stop running if there is convergence prior to
reaching this limit. Default is 10.

Returns
-------
mmix : (T x C) :obj:`numpy.ndarray`
Z-scored mixing matrix for converting input data to component space,
where `C` is components and `T` is the same as in `data`
fixed_seed : :obj:`int`
Random seed from final decomposition.

Notes
-----
Uses `sklearn` implementation of FastICA for decomposition
"""
if fixed_seed == -1:
fixed_seed = np.random.randint(low=1, high=1000)

Expand Down
10 changes: 10 additions & 0 deletions tedana/resources/references.bib
Original file line number Diff line number Diff line change
Expand Up @@ -313,3 +313,13 @@ @Article{Hunter:2007
doi = {10.1109/MCSE.2007.55},
year = 2007
}

@Article{Anglada:2022,
Copy link
Collaborator

@dowdlelt dowdlelt Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be anglada2022?

Got the following near the end of a run

File "/Users/logan/local_code/tedana/tedana/reporting/html_report.py", line 41, in _cite2html
    first_author = bibliography.entries[key].persons["author"][0]
  File "/Users/logan/miniconda3/envs/py310/lib/python3.10/site-packages/pybtex/utils.py", line 163, in __getitem__
    return self._dict[key.lower()]
KeyError: 'anglada2022'

And I notice that all other files in the references.bib are in that lower format + no ':'. Maybe the case is handled, but seems like the colon not?

Or line 122 in ica.py should have the colon?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @dowdlelt. Yes, that is anglada2022. I am currently working on finalising this PR and all these issues will be resolved soon.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome news, looking forward to it!

Author = {Anglada-Girotto Miquel and Miravet-Verde Samuel and Serrano Luis and Head Sarah},
Title = {robustica: customizable robust independent component analysis},
Journal = {BMC Bioinformatics},
Volume = {23},
Number = {519},
doi = {10.1186/s12859-022-05043-9},
year = 2022
}
Loading