Skip to content

Commit

Permalink
Update based on some recent developments (#379)
Browse files Browse the repository at this point in the history
* bookkeeping

* removed unnecessary call

* marking as const

* adding more consts

* additional safety measures

* parallelization working (#355)

* parallelization working

* updating tests

* styling

* updating test

* added info about parallelization

* correcting an error in docs

* update cran-comments

* updating comments

* updating news

* deleting submission file

* Adding support for sampling from prior (#360)

* added prior sampling

* updating news

* deleting submission file

* fixing the prior sampling

* styling

* added unit test for SMC starting from prior

* added test for sample_prior

* updating test after merging with TBB

* styling

* updating news

* fixed error in update_mallows.SMCMallows

* updated failing test and added namespace qualifier

* added long-running SMC test from prior

* Take care of item names properly (#363)

* fixing item name issue

* updating a unit test

* styling

* Can now deal with a single vector of input data (#364)

* added handling of vector data #361

* styling

* updating news

* updated set_priors function

* closes #370

* Added a gamma prior (#371)

* updated set_priors function

* closes #370

* Had forgot to implement the change... (#372)

* updated set_priors function

* closes #370

* fixed forgotten implementation

* Added lag option (#373)

* updated documentation

* adding references section

* more documentation updates

* updated set_priors function

* closes #370

* implemented user-defined lag

* added lag, closes #369

* styling

* ready for the change

* Resampling issue 365 (#376)

* updated set_priors function

* closes #370

* ready for the change

* implemented the resampler in cpp

* added various resampling options. tests are missing

* fixed numerical overflow problem and forgotten mcmc loop

* added a simple test for the resampler

* styling

* added missing memory header

* removed two tests where my M1 mac gives different results

* removed git conflict marker

* added a line shift

* removing const-ref from built-in types

* refactoring limits functions for pairwise augmentation

* increasing test strictness

* added some more work

* moving distance code into implementation file

* moved partition function code into cpp files

* added code for reproducing Liu et al 2019 review

* changed updating frequency for pkgdown. closes #380

* adding ignore to codecov

* fixing #381 (#382)

* Heatplot issue 381 (#383)

* fixing #381

* added unit test for #381

* updated news
  • Loading branch information
osorensen authored Feb 16, 2024
1 parent a1d3ee8 commit 8850531
Show file tree
Hide file tree
Showing 85 changed files with 1,784 additions and 478 deletions.
4 changes: 0 additions & 4 deletions .github/workflows/pkgdown.yaml
Original file line number Diff line number Diff line change
@@ -1,10 +1,6 @@
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
push:
branches: [master]
pull_request:
branches: [master]
release:
types: [published]
workflow_dispatch:
Expand Down
4 changes: 2 additions & 2 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Package: BayesMallows
Type: Package
Title: Bayesian Preference Learning with the Mallows Rank Model
Version: 2.0.1
Version: 2.0.1.9003
Authors@R: c(person("Oystein", "Sorensen",
email = "oystein.sorensen.1985@gmail.com",
role = c("aut", "cre"),
Expand Down Expand Up @@ -46,7 +46,7 @@ BugReports: https://github.com/ocbe-uio/BayesMallows/issues
License: GPL-3
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.2.3
RoxygenNote: 7.3.1
Depends: R (>= 3.5.0)
Imports: Rcpp (>= 1.0.0),
ggplot2 (>= 3.1.0),
Expand Down
4 changes: 4 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -6,12 +6,15 @@ S3method(compute_consensus,BayesMallows)
S3method(compute_consensus,SMCMallows)
S3method(compute_posterior_intervals,BayesMallows)
S3method(compute_posterior_intervals,SMCMallows)
S3method(generate_initial_ranking,BayesMallowsIntransitive)
S3method(generate_initial_ranking,BayesMallowsTransitiveClosure)
S3method(plot,BayesMallows)
S3method(plot,SMCMallows)
S3method(print,BayesMallows)
S3method(print,BayesMallowsMixtures)
S3method(print,SMCMallows)
S3method(update_mallows,BayesMallows)
S3method(update_mallows,BayesMallowsPriorSamples)
S3method(update_mallows,SMCMallows)
export(assess_convergence)
export(assign_cluster)
Expand All @@ -33,6 +36,7 @@ export(plot_elbow)
export(plot_top_k)
export(predict_top_k)
export(sample_mallows)
export(sample_prior)
export(set_compute_options)
export(set_initial_values)
export(set_model_options)
Expand Down
21 changes: 21 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,26 @@
# BayesMallows (development versions)

* Fixed a bug in heat_plot() when the model has been estimated with
rho_thinning > 1, causing the probabilities to be unnormalized. Issue #381.
Thanks to Marta Crispino for discovering the bug.
* Added stratified, systematic, and residual resampling to the sequential
Monte Carlo algorithm. These distributions should in general be preferred to
multinomial resampling, which was the only available option until now.
* The move step of the SMC algorithm now allows a user-defined lag for the
sampling of latent ranks, specified in the "latent_sampling_lag" argument
to set_smc_options().
* Prior for precision parameter alpha is now a gamma distribution. Until now
an exponential distribution has been assumed. Since the exponential is a special
case of the gamma with shape parameter equal to 1 (the default), this is not
a breaking change. However, it adds flexibility when it comes to specifying the prior.
* setup_rank_data() now accepts a single vector of rankings, silently converting a to matrix with a single row.
* Sequential Monte Carlo algorithm can now start from a sample from the prior
distribution, see the sample_prior() function for an example.
* Added support for parallelism under-the-hood with oneTBB.

# BayesMallows 2.0.1

* Edits to C++ code fixing memory leaks.
* Edits to unit tests which caused issues on CRAN.

# BayesMallows 2.0.0
Expand Down
15 changes: 2 additions & 13 deletions R/compute_mallows.R
Original file line number Diff line number Diff line change
Expand Up @@ -79,21 +79,10 @@ compute_mallows <- function(
validate_class(initial_values, "BayesMallowsInitialValues")
validate_rankings(data)
validate_preferences(data, model_options)
validate_rankings(data)
validate_initial_values(initial_values, data)

pfun_values <- tryCatch(
prepare_partition_function(model_options$metric, data$n_items),
error = function(e) {
if (is.null(pfun_estimate)) {
stop(
"Exact partition function not known. Please provide an ",
"estimate in argument pfun_estimate."
)
} else {
return(NULL)
}
}
)
pfun_values <- extract_pfun_values(model_options, data, pfun_estimate)

if (is.null(cl)) {
lapplyfun <- lapply
Expand Down
16 changes: 16 additions & 0 deletions R/estimate_partition_function.R
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,22 @@ estimate_partition_function <- function(
matrix(c(power, stats::lm(form, data = estimate)$coefficients), ncol = 2)
}

extract_pfun_values <- function(model_options, data, pfun_estimate) {
tryCatch(
prepare_partition_function(model_options$metric, data$n_items),
error = function(e) {
if (is.null(pfun_estimate)) {
stop(
"Exact partition function not known. Please provide an ",
"estimate in argument pfun_estimate."
)
} else {
return(NULL)
}
}
)
}

prepare_partition_function <- function(metric, n_items) {
if (metric %in% c("cayley", "hamming", "kendall")) {
return(NULL)
Expand Down
8 changes: 2 additions & 6 deletions R/generate_initial_ranking.R
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ generate_initial_ranking <- function(
UseMethod("generate_initial_ranking")
}

#' @export
generate_initial_ranking.BayesMallowsTransitiveClosure <- function(
preferences, cl = NULL, shuffle_unranked = FALSE, random = FALSE,
random_limit = 8L) {
Expand Down Expand Up @@ -34,12 +35,7 @@ generate_initial_ranking.BayesMallowsTransitiveClosure <- function(
}
}


.S3method(
"generate_initial_ranking", "BayesMallowsTransitiveClosure",
generate_initial_ranking.BayesMallowsTransitiveClosure
)

#' @export
generate_initial_ranking.BayesMallowsIntransitive <- function(
preferences, cl = NULL, shuffle_unranked = FALSE,
random = FALSE, random_limit = 8L) {
Expand Down
4 changes: 2 additions & 2 deletions R/heat_plot.R
Original file line number Diff line number Diff line change
Expand Up @@ -36,14 +36,14 @@ heat_plot <- function(model_fit, burnin = model_fit$burnin, ...) {
drop = FALSE
]
posterior_ranks$probability <- 1
posterior_ranks$iteration <- NULL

heatplot_data <- aggregate(posterior_ranks[, "probability", drop = FALSE],
by = list(
cluster = posterior_ranks$cluster,
item = posterior_ranks$item,
value = posterior_ranks$value
),
FUN = function(x) sum(x) / (model_fit$nmc - burnin)
FUN = function(x) sum(x) / length(unique(posterior_ranks$iteration))
)

heatplot_data$item <- factor(heatplot_data$item, levels = item_order)
Expand Down
30 changes: 30 additions & 0 deletions R/sample_prior.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
#' Sample from prior distribution
#'
#' Function to obtain samples from the prior distributions of the Bayesian
#' Mallows model. Intended to be given to [update_mallows()].
#'
#' @param n An integer specifying the number of samples to take.
#' @param n_items An integer specifying the number of items to be ranked.
#' @param priors An object of class "BayesMallowsPriors" returned from
#' [set_priors()].
#'
#' @return An object of class "BayesMallowsPriorSample", containing `n`
#' independent samples of \eqn{\alpha} and \eqn{\rho}.
#'
#' @export
#'
#' @family modeling
#' @example /inst/examples/sample_prior_example.R
sample_prior <- function(n, n_items, priors = set_priors()) {
validate_positive(n)
validate_positive(n_items)
ret <- list(
alpha = stats::rgamma(n, shape = priors$gamma, rate = priors$lambda),
rho = replicate(n, sample(n_items, n_items)),
priors = priors,
n_items = n_items,
items = seq_len(n_items)
)
class(ret) <- "BayesMallowsPriorSamples"
ret
}
12 changes: 10 additions & 2 deletions R/set_priors.R
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,13 @@
#' @description Set values related to the prior distributions for the Bayesian
#' Mallows model.
#'
#' @param gamma Strictly positive numeric value specifying the shape parameter
#' of the gamma prior distribution of \eqn{\alpha}. Defaults to `1`, thus
#' recovering the exponential prior distribution used by
#' \insertCite{vitelli2018}{BayesMallows}.
#'
#' @param lambda Strictly positive numeric value specifying the rate parameter
#' of the truncated exponential prior distribution of \eqn{\alpha}. Defaults
#' of the gamma prior distribution of \eqn{\alpha}. Defaults
#' to `0.001`. When `n_cluster > 1`, each mixture component \eqn{\alpha_{c}}
#' has the same prior distribution.
#'
Expand All @@ -26,10 +31,13 @@
#' [update_mallows()].
#' @export
#'
#' @references \insertAllCited{}
#'
#' @family preprocessing
#'
set_priors <- function(lambda = 0.001, psi = 10, kappa = c(1, 3)) {
set_priors <- function(gamma = 1, lambda = 0.001, psi = 10, kappa = c(1, 3)) {
stopifnot(length(kappa) == 2)
validate_positive(gamma)
validate_positive(lambda)
validate_integer(psi)
validate_positive(psi)
Expand Down
57 changes: 55 additions & 2 deletions R/set_smc_options.R
Original file line number Diff line number Diff line change
@@ -1,20 +1,73 @@
#' @title Set SMC compute options
#'
#' @description Sets the SMC compute options to be used in [update_mallows.BayesMallows()].
#' @description Sets the SMC compute options to be used in
#' [update_mallows.BayesMallows()].
#'
#' @param n_particles Integer specifying the number of particles.
#' @param mcmc_steps Number of MCMC steps to be applied in the resample-move
#' step.
#' @param resampler Character string defining the resampling method to use. One
#' of "stratified", "systematic", "residual", and "multinomial". Defaults to
#' "stratified". While multinomial resampling was used in
#' \insertCite{steinSequentialInferenceMallows2023;textual}{BayesMallows},
#' stratified, systematic, or residual resampling typically give lower Monte
#' Carlo error \insertCite{Douc2005,Hol2006,Naesseth2019}{BayesMallows}.
#' @param latent_sampling_lag Parameter specifying the number of timesteps to go
#' back when resampling the latent ranks in the move step. See Section 6.2.3
#' of \insertCite{Kantas2015}{BayesMallows} for details. The \eqn{L} in their
#' notation corresponds to `latent_sampling_lag`. See more under Details.
#' Defaults to `NA`, which means that all latent ranks from previous timesteps
#' are resampled. If set to `0`, no move step is applied to the latent ranks.
#'
#' @return An object of class "SMCOptions".
#'
#'
#' @section Lag parameter in move step:
#'
#' The parameter `latent_sampling_lag` corresponds to \eqn{L} in
#' \insertCite{Kantas2015}{BayesMallows}. Its use in this package is can be
#' explained in terms of Algorithm 12 in
#' \insertCite{steinSequentialInferenceMallows2023}{BayesMallows}. The
#' relevant line of the algorithm is:
#'
#' **for** \eqn{j = 1 : M_{t}} **do** \cr
#' **M-H step:** update \eqn{\tilde{\mathbf{R}}_{j}^{(i)}} with proposal
#' \eqn{\tilde{\mathbf{R}}_{j}' \sim q(\tilde{\mathbf{R}}_{j}^{(i)} |
#' \mathbf{R}_{j}, \boldsymbol{\rho}_{t}^{(i)}, \alpha_{t}^{(i)})}.\cr
#' **end**
#'
#' Let \eqn{L} denote the value of `latent_sampling_lag`. With this parameter,
#' we modify for algorithm so it becomes
#'
#' **for** \eqn{j = M_{t-L+1} : M_{t}} **do** \cr
#' **M-H step:** update \eqn{\tilde{\mathbf{R}}_{j}^{(i)}} with proposal
#' \eqn{\tilde{\mathbf{R}}_{j}' \sim q(\tilde{\mathbf{R}}_{j}^{(i)} |
#' \mathbf{R}_{j}, \boldsymbol{\rho}_{t}^{(i)}, \alpha_{t}^{(i)})}.\cr
#' **end**
#'
#' This means that with \eqn{L=0} no move step is performed on any latent
#' ranks, whereas \eqn{L=1} means that the move step is only applied to the
#' parameters entering at the given timestep. The default,
#' `latent_sampling_lag = NA` means that \eqn{L=t} at each timestep, and hence
#' all latent ranks are part of the move step at each timestep.
#'
#'
#' @export
#' @references \insertAllCited{}
#'
#' @family preprocessing
set_smc_options <- function(
n_particles = 1000,
mcmc_steps = 5) {
mcmc_steps = 5,
resampler = c("stratified", "systematic", "residual", "multinomial"),
latent_sampling_lag = NA_integer_) {
validate_integer(n_particles)
validate_integer(mcmc_steps)
if (!is.na(latent_sampling_lag)) validate_integer(latent_sampling_lag)
resampler <- match.arg(
resampler,
c("stratified", "systematic", "residual", "multinomial")
)

ret <- as.list(environment())
class(ret) <- "SMCOptions"
Expand Down
42 changes: 38 additions & 4 deletions R/setup_rank_data.R
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,9 @@
#' optional initial value of the rankings. If `rankings` has column names,
#' these are assumed to be the names of the items. `NA` values in rankings are
#' treated as missing data and automatically augmented; to change this
#' behavior, see the `na_action` argument to [set_model_options()].
#' behavior, see the `na_action` argument to [set_model_options()]. A vector
#' length `n_items` is silently converted to a matrix of length `1 x n_items`,
#' and names (if any), are used as column names.
#'
#' @param preferences A data frame with one row per pairwise comparison, and
#' columns `assessor`, `top_item`, and `bottom_item`. Each column contains the
Expand Down Expand Up @@ -80,6 +82,12 @@
#' when all possible orderings are computed, i.e., when `random=TRUE`.
#' Defaults to `8L`.
#'
#' @param timepoint Integer vector specifying the timepoint. Defaults to `NULL`,
#' which means that a vector of ones, one for each observation, is generated.
#' Used by [update_mallows()] to identify data with a given iteration of the
#' sequential Monte Carlo algorithm. If not `NULL`, must contain one integer
#' for each row in `rankings`.
#'
#' @note Setting `random=TRUE` means that all possible orderings of each
#' assessor's preferences are generated, and one of them is picked at random.
#' This can be useful when experiencing convergence issues, e.g., if the MCMC
Expand Down Expand Up @@ -126,7 +134,8 @@ setup_rank_data <- function(
cl = NULL,
shuffle_unranked = FALSE,
random = FALSE,
random_limit = 8L) {
random_limit = 8L,
timepoint = NULL) {
na_action <- match.arg(na_action, c("augment", "fail", "omit"))

if (is.null(rankings) && is.null(preferences)) {
Expand All @@ -138,10 +147,19 @@ setup_rank_data <- function(
if (na_action == "fail" && any(is.na(rankings))) {
stop("rankings matrix contains NA values")
}
if (!is.matrix(rankings)) {
rankings <- matrix(rankings,
nrow = 1,
dimnames = list(NULL, names(rankings))
)
}

if (na_action == "omit" && any(is.na(rankings))) {
keeps <- apply(rankings, 1, function(x) !any(is.na(x)))
print(paste("Omitting", sum(!keeps), "row(s) from rankings due to NA values"))
print(paste(
"Omitting", sum(!keeps),
"row(s) from rankings due to NA values"
))
rankings <- rankings[keeps, , drop = FALSE]
}
} else {
Expand All @@ -153,7 +171,10 @@ setup_rank_data <- function(
if (!is.null(observation_frequency)) {
validate_positive_vector(observation_frequency)
if (nrow(rankings) != length(observation_frequency)) {
stop("observation_frequency must be of same length as the number of rows in rankings")
stop(
"observation_frequency must be of same ",
"length as the number of rows in rankings"
)
}
} else {
observation_frequency <- rep(1, nrow(rankings))
Expand All @@ -164,6 +185,19 @@ setup_rank_data <- function(
stop("invalid permutations provided in rankings matrix")
}
n_items <- ncol(rankings)

if (!is.null(colnames(rankings))) {
items <- colnames(rankings)
} else {
items <- paste("Item", seq(from = 1, to = n_items, by = 1))
}

if (is.null(timepoint)) timepoint <- rep(1, nrow(rankings))
if (length(timepoint) != nrow(rankings)) {
stop("must have one timepoint per row")
}


constraints <- generate_constraints(preferences, n_items, cl)
consistent <- matrix(integer(0))

Expand Down
Loading

0 comments on commit 8850531

Please sign in to comment.