tmsamples

The goal of tmsamples is to simulate term frequency matrices, either document-term matrices (DTMs) or term co-occurrence matrices (TCMs), based on parameters from real or simulated probabilistic topic models.

Corpus simulation is a core part of my dissertation research. The basic idea: if we can simulate a corpus using the functional form of a topic model, and that corpus retains the gross statistical properties of human language, then we can use simulated corpora to derive rules and metrics for fitting well-specified topic models. Conversely, the same rules and metrics help us avoid pathologically misspecified models.

Installation

You can install the development version of tmsamples from GitHub with:

# install.packages("remotes") # if remotes is not already installed
remotes::install_github("tommyjones/tmsamples")

Example

This basic example simulates a document-term matrix from a topic model with 4 topics, 50 documents, and a vocabulary of 1,000 words:

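library(tmsamples)
#> Loading required package: Matrix

Nk <- 4    ## number of topics
Nd <- 50   ## number of documents
Nv <- 1000 ## vocabulary size

## Dirichlet prior over topics within documents
alpha <- rgamma(Nk, 0.5)

## Dirichlet prior over words within topics, shaped to follow Zipf's law
beta <- generate_zipf(vocab_size = Nv, magnitude = 500, zipf_par = 1.1)

## sample model parameters: theta (documents by topics) and phi (topics by words)
pars <- sample_parameters(alpha, beta, Nd)

## document lengths are Poisson distributed with a mean of 50 tokens
doc_lengths <- rpois(Nd, 50)

## sample a document-term matrix from the model parameters
dtm <- sample_documents(
  theta = pars$theta,
  phi = pars$phi,
  doc_lengths = doc_lengths,
  threads = 2 ## threads controls parallel computation
)
#> ==================================================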
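
For intuition, the steps in the example can be sketched in base R. This is a minimal sketch of an LDA-style generative process under stated assumptions, not the package's implementation: `rdirichlet()` is a hypothetical helper defined here, the Zipf-shaped `beta` is a hand-rolled stand-in for `generate_zipf()`, `alpha` is fixed rather than drawn from a gamma distribution, and the vocabulary is kept small so the dense matrices stay cheap.

```r
set.seed(42)

Nk <- 4   # number of topics
Nd <- 50  # number of documents
Nv <- 200 # vocabulary size (smaller than in the example above)

# Hypothetical helper: draw n samples from Dirichlet(a) by
# normalising independent gamma draws so each row sums to 1.
rdirichlet <- function(n, a) {
  g <- matrix(rgamma(n * length(a), shape = a), nrow = n, byrow = TRUE)
  g / rowSums(g)
}

# Assumed priors: a symmetric alpha and a Zipf-shaped beta.
alpha <- rep(0.5, Nk)
zipf <- 1 / seq_len(Nv)^1.1
beta <- 100 * zipf / sum(zipf)

phi <- rdirichlet(Nk, beta)    # topics by words
theta <- rdirichlet(Nd, alpha) # documents by topics

doc_lengths <- rpois(Nd, 50)

# For each document, mix the topic-word distributions by that document's
# topic proportions, then draw word counts from a multinomial. This
# marginalises over per-token topic assignments.
dtm <- matrix(0L, nrow = Nd, ncol = Nv)
for (d in seq_len(Nd)) {
  word_probs <- as.vector(theta[d, ] %*% phi)
  dtm[d, ] <- rmultinom(1, size = doc_lengths[d], prob = word_probs)
}

dim(dtm)
```

Each row of `dtm` sums to the corresponding entry of `doc_lengths`, and each row of `theta` and `phi` sums to 1, which makes for quick sanity checks on any simulated corpus.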

