Software implementation of the tensor-tensor m-product framework [1]. The library currently contains the tubal QR and tSVDM decompositions, and the TCAM method for dimensionality reduction.
The mprod-package is hosted on the conda-forge channel:

```
conda install -c conda-forge mprod-package
```

It can also be installed from PyPI:

```
pip install mprod-package
```

See mprod-package's PyPI entry.
To install from source:

- Make sure that all dependencies listed in the `requirements.txt` file are installed.
- Clone the repository, then, from the package directory, run `pip install -e .`

The dependencies in `requirements.txt` are pinned to the exact versions used to test mprod-package locally; these packages were obtained from the conda-forge channel.
```python
import pandas as pd

file_path = "https://raw.githubusercontent.com/UriaMorP/" \
            "tcam_analysis_notebooks/main/Schirmer2018/Schirmer2018.tsv"

data_table = pd.read_csv(file_path, index_col=[0, 1], sep="\t",
                         dtype={'Week': int})
data_table = data_table.loc[:, data_table.median() > 1e-7]
data_table.rename(columns={k: f"Feature_{e+1}" for e, k in enumerate(data_table.columns)},
                  inplace=True)
data_table.shape
```

```python
%matplotlib inline
```
We are given a `pandas.DataFrame` of the data as below, with a 2-level index: the first level is the subject identifier (mouse, human, image) and the second level denotes the sample repetition identity, in this case the week of the experiment in which the sample was collected.
```python
display(data_table.iloc[:2, :2].round(3))
```
| SubjectID | Week | Feature_1 | Feature_2 |
|---|---|---|---|
| P_10343 | 0 | 0.001 | 0.023 |
| P_10343 | 4 | 0.020 | 0.000 |
We use the `table2tensor` helper function to transform a 2-level (multi-)indexed `pandas.DataFrame` into a 3rd-order tensor.
```python
from mprod import table2tensor

data_tensor, map1, map3 = table2tensor(data_table)
```
To inspect the `table2tensor` operation, we use the resulting "mode mappings" `map1` and `map3`, which associate each row of the input table with its coordinates in the resulting tensor.
In the following example, we use the mappings to extract the tensor coordinates corresponding to subject P_7218's sample from week 52:

```python
(data_tensor[map1['P_7218'], :, map3[52]] == data_table.loc[('P_7218', 52)].values).all()  # True
```
```python
from mprod.dimensionality_reduction import TCAM

tca = TCAM()
tca_trans = tca.fit_transform(data_tensor)
```
And that's all there is to it... Really!
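The result is an ordinary 2D array of factor scores with one row per subject (mode 1 of the input tensor). A minimal sanity check (a sketch; the exact number of factors depends on the data and the TCAM defaults):

```python
# One row per subject, one column per TCAM factor.
print(tca_trans.shape)                    # (n_subjects, n_factors)
print(len(map1) == tca_trans.shape[0])    # True: map1 keys are the subject IDs
```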
Note how similar the TCAM code is to what we would have written if we were to apply scikit-learn's PCA to the initial tabular data:
```python
from sklearn.decomposition import PCA

pca = PCA()
pca_trans = pca.fit_transform(data_table)
```
The similarity between TCAM's interface and that of scikit-learn's PCA is not coincidental. We did our best to make TCAM feel as familiar as possible and to keep it highly compatible with the existing Python ML ecosystem.
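For example, because `fit_transform` returns an ordinary score matrix, its output can be fed directly into downstream scikit-learn estimators. A minimal sketch (the choice of KMeans and of two clusters is arbitrary, purely for illustration):

```python
from sklearn.cluster import KMeans

# Cluster subjects by their TCAM factor scores, exactly as one would do with PCA scores.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
subject_clusters = kmeans.fit_predict(tca_trans)  # one cluster label per subject
```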
```python
tca_loadings = tca.mode2_loadings              # Obtain TCAM loadings
pca_loadings = pca.components_                 # Obtain PCA loadings

tca_var = tca.explained_variance_ratio_ * 100  # % explained variation per TCAM factor
pca_var = pca.explained_variance_ratio_ * 100  # % explained variation per principal component

tca_df = pd.DataFrame(tca_trans)               # Cast TCAM scores to a dataframe
tca_df.rename(index=dict(map(reversed, map1.items())),
              inplace=True)                    # Use the inverse of map1 to label each row
                                               # of the TCAM scores with its subject ID

pca_df = pd.DataFrame(pca_trans)               # Cast PCA scores to a dataframe
pca_df.index = data_table.index                # Annotate PC scores with sample names
```
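With the scores in dataframes and the explained variance ratios at hand, a factor map can be drawn with matplotlib (a minimal sketch, assuming the `%matplotlib inline` magic above is run in a Jupyter notebook):

```python
import matplotlib.pyplot as plt

# Scatter plot of the first two TCAM factors, one point per subject,
# with the percentage of explained variation on the axis labels.
fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(tca_df[0], tca_df[1])
ax.set_xlabel(f"TCAM factor 1 ({tca_var[0]:.1f}%)")
ax.set_ylabel(f"TCAM factor 2 ({tca_var[1]:.1f}%)")
plt.show()
```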
[1] Misha E. Kilmer, Lior Horesh, Haim Avron, and Elizabeth Newman. Tensor-tensor algebra for optimal representation and compression of multiway data. Proceedings of the National Academy of Sciences, 118(28):e2015851118, July 2021.