The R package distrom contains functions for fitting distributed
multinomial regression. The main function is dmr(), which takes a matrix
of covars and a matrix of multinomial counts as input. Independent
Poisson log regressions of the form counts ~ covars are then fit, one
for each column of counts (that is, for each multinomial category).
These regressions are estimated in parallel using the parallel and gamlr
packages, which allow for easy in-memory parallelization and
distribution across multiple machines. This parallelization is essential
for use cases such as text analysis, where the counts matrix consists of
many tokenized documents and can grow to billions of observations. In
the text analysis use case, token counts are modeled as arising from a
multinomial distribution that depends on the article attributes
contained in the covars matrix.
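The idea of fitting one independent Poisson log regression per count column, in parallel, can be illustrated with a minimal base R sketch. This is not the distrom implementation: it swaps in unpenalized glm() for the gamlr fits that dmr() uses, and the simulated data, the token names, and the helper fit_one() are all hypothetical.

```r
library(parallel)

set.seed(1)
n <- 200  # number of simulated documents

# Hypothetical document attributes
covars <- cbind(x1 = rnorm(n), x2 = rnorm(n))

# Simulated counts for three hypothetical tokens
counts <- cbind(
  tokenA = rpois(n, exp(0.5 + 0.3 * covars[, "x1"])),
  tokenB = rpois(n, exp(0.2 - 0.4 * covars[, "x2"])),
  tokenC = rpois(n, exp(1.0))
)

# Fit a Poisson regression with log link for count column j
# (plain glm here; dmr() itself relies on gamlr)
fit_one <- function(j, counts, covars) {
  coef(glm(counts[, j] ~ covars, family = poisson(link = "log")))
}

# Each column is fit independently, so the fits can run on separate workers
cl <- makeCluster(2)
coefs <- parSapply(cl, seq_len(ncol(counts)), fit_one,
                   counts = counts, covars = covars)
stopCluster(cl)

# One column of coefficients per token
colnames(coefs) <- colnames(counts)
print(coefs)
```

Because no fit depends on any other, the work scales by adding workers to the cluster rather than by changing the model code.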
To cite this package, use “Taddy (2015), Distributed Multinomial Regression, Annals of Applied Statistics”.
For a description of the functions in the distrom package, please read the reference manual: distrom manual.
For a detailed explanation of distributed multinomial regression and example use cases, see Taddy (2015), Distributed Multinomial Regression, Annals of Applied Statistics.
For information on the related gamlr package, please read the gamlr manual or visit the gamlr repository.
For information on the related textir package, please read the textir manual or visit the textir repository.
To install the stable version from CRAN:
install.packages("distrom")
To install the development version from GitHub:
# install.packages("remotes")
remotes::install_github("TaddyLab/distrom")
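Once the package is installed, a call to dmr() might look like the following sketch. It assumes the interface described in the reference manual, where dmr() takes a cluster from the parallel package followed by the covars and counts matrices, and coef() extracts the fitted coefficients. The simulated data and token names are hypothetical, and the reference manual remains the authority on arguments and return values.

```r
library(distrom)
library(parallel)

set.seed(1)
n <- 100  # number of simulated documents

# Hypothetical document attributes
covars <- matrix(rnorm(n * 2), n, 2,
                 dimnames = list(NULL, c("x1", "x2")))

# Simulated counts for three hypothetical tokens
counts <- matrix(rpois(n * 3, lambda = 5), n, 3,
                 dimnames = list(NULL, c("tokenA", "tokenB", "tokenC")))

# Start a small local cluster; fork on Unix-alikes, sockets elsewhere
cl <- makeCluster(2, type = ifelse(.Platform$OS.type == "unix",
                                   "FORK", "PSOCK"))

# Fit one Poisson regression per column of counts, in parallel
fits <- dmr(cl, covars, counts)

# Stop the cluster once the fits are done
stopCluster(cl)

# Coefficients, with one column per count category
B <- coef(fits)
print(dim(B))
```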