multiDA

High Dimensional Discriminant Analysis using Multiple Hypothesis Testing

Overview

multiDA is a Discriminant Analysis (DA) algorithm suited to high dimensional datasets, providing feature selection through multiple hypothesis testing. The algorithm has minimal tuning parameters, is easy to use, and offers improved speed compared to existing DA classifiers.

Publication to appear in JCGS. A preprint is available on arXiv.

This package is part of a suite of discriminant analysis packages we have authored for large-scale/complex datasets. See also our package genDA, a statistical ML method for Multi-distributional Discriminant Analysis using Generalised Linear Latent Variable Modelling.

Installation

# Install the development version from GitHub:
# install.packages("devtools")
devtools::install_github("sarahromanes/multiDA")

Usage

The following example trains the multiDA classifier using the SRBCT dataset, and finds the resubstitution error rate.

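library(multiDA)
# Class labels and feature matrix from the SRBCT dataset
y    <- SRBCT$y
X    <- SRBCT$X
# Fit the classifier
res  <- multiDA(X, y, penalty = "EBIC", equal.var = TRUE, set.options = "exhaustive")
# Predict on the training data; y.pred returns class labels
vals <- predict(res, newdata = X)$y.pred
# Resubstitution error rate: proportion of training samples misclassified
rser <- sum(vals != y) / length(y)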
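
The error-rate step above can be tried without the package. The following is a minimal sketch in base R only: `error_rate` is a hypothetical helper (not part of multiDA), and the labels are made-up toy data rather than SRBCT values. It computes the same quantity as `rser` and adds a confusion table to show which classes are being mixed up.

```r
# Hypothetical helper, not part of multiDA: proportion of mismatched labels.
error_rate <- function(pred, truth) {
  stopifnot(length(pred) == length(truth))
  # Compare as character so differing factor level sets do not cause an error
  mean(as.character(pred) != as.character(truth))
}

# Toy labels for illustration only (not taken from the SRBCT dataset)
truth <- factor(c("A", "B", "C", "A", "B", "C"))
pred  <- factor(c("A", "B", "C", "B", "B", "C"), levels = levels(truth))

err <- error_rate(pred, truth)

# Cross-tabulate predictions against the true classes
conf_mat <- table(predicted = pred, actual = truth)
print(err)
print(conf_mat)
```

Because the resubstitution error is measured on the same samples used for training, it tends to be optimistic; the same helper could be applied to predictions on held-out samples for a less biased estimate.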

A case study and an overview of the statistical processes behind multiDA are also available.

Authors

License

This project is licensed under the GPL-2 license.

Acknowledgements

I am grateful to everyone who has provided thoughtful and helpful comments to support me in building my first package, especially members of the Sydney University Statistical Bioinformatics group and the NUMBATS group at Monash University. You guys rock!