diff --git a/R/correctR.R b/R/correctR.R
index 6ce15a7..e3e30de 100644
--- a/R/correctR.R
+++ b/R/correctR.R
@@ -1,5 +1,6 @@
 #'
 #' @docType package
+#' @aliases correctR-package
 #' @name correctR
 #' @title Corrections For Correlated Test Statistics
 #'
diff --git a/README.md b/README.md
index 8d3a91e..81f0b29 100644
--- a/README.md
+++ b/README.md
@@ -21,7 +21,7 @@ install.packages("correctR")
 You can install the development version of `correctR` from GitHub:
 
 ``` r
-devtools::install_github("hendersontrent/correctR")
+devtools::install_github("hendersontrent/theft")
 ```
 
 ## General purpose
diff --git a/docs/404.html b/docs/404.html
index 5696584..0a33123 100644
--- a/docs/404.html
+++ b/docs/404.html
@@ -7,10 +7,10 @@
Page not found (404) • correctR
- - + + - +
License • correctRLicense • correctR
@@ -56,8 +60,12 @@
diff --git a/docs/LICENSE.html b/docs/LICENSE.html
index c819055..12eeb92 100644
--- a/docs/LICENSE.html
+++ b/docs/LICENSE.html
@@ -1,5 +1,9 @@
+<<<<<<< HEAD
+MIT License • correctRMIT License • correctR
@@ -73,8 +77,12 @@
diff --git a/docs/articles/correctR.html b/docs/articles/correctR.html
index caa9348..a72ab1e 100644
--- a/docs/articles/correctR.html
+++ b/docs/articles/correctR.html
@@ -8,10 +8,10 @@
Introduction to correctR • correctR
- - + + - +
Articles • correctRArticles • correctR
@@ -58,8 +62,12 @@

All vignettes

diff --git a/docs/authors.html b/docs/authors.html
index 5f0bb08..9d5622c 100644
--- a/docs/authors.html
+++ b/docs/authors.html
@@ -1,5 +1,9 @@
+<<<<<<< HEAD
+Authors and Citation • correctRAuthors and Citation • correctR
@@ -77,8 +81,12 @@

Citation

diff --git a/docs/deps/data-deps.txt b/docs/deps/data-deps.txt
index aef7eff..cd52e33 100644
--- a/docs/deps/data-deps.txt
+++ b/docs/deps/data-deps.txt
@@ -1,4 +1,4 @@
- - + +
diff --git a/docs/index.html b/docs/index.html
index 8fafb56..ffe332f 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -11,10 +11,10 @@ and presented in Bouckaert and Frank (2004) <doi:10.1007/978-3-540-24775-3_3>.">
Corrected Test Statistics for Comparing Machine Learning Models on Correlated Samples • correctR
- - + + - +
-

General purpose

-Often in machine learning, we want to compare the performance of
-different models to determine if one statistically outperforms another.
-However, the methods used (e.g., data resampling, k-fold cross-validation) to obtain
-these performance metrics (e.g., classification accuracy) violate the
-assumptions of traditional statistical tests such as a t-test. The purpose of these methods
-is to either aid generalisability of findings (i.e., through
-quantification of error as they produce multiple values for each model
-instead of just one) or to optimise model hyperparameters. This makes
-them invaluable, but unusable with traditional tests, as Dietterich (1998)
-found that the standard t-test
-underestimates the variance, therefore driving a high Type I error.
-correctR is a lightweight package that implements a small
-number of corrected test statistics for cases when samples are not
-independent (and therefore are correlated), such as in the case of
-resampling, k-fold
-cross-validation, and repeated k-fold cross-validation. These
-corrections were all originally proposed by Nadeau
-and Bengio (2003). Currently, only cases where two models are to be
-compared are supported.

+Often in machine learning, we want to compare the performance of different models to determine if one statistically outperforms another. However, the methods used (e.g., data resampling, k-fold cross-validation) to obtain these performance metrics (e.g., classification accuracy) violate the assumptions of traditional statistical tests such as a t-test. The purpose of these methods is to either aid generalisability of findings (i.e., through quantification of error as they produce multiple values for each model instead of just one) or to optimise model hyperparameters. This makes them invaluable, but unusable with traditional tests, as Dietterich (1998) found that the standard t-test underestimates the variance, therefore driving a high Type I error. correctR is a lightweight package that implements a small number of corrected test statistics for cases when samples are not independent (and therefore are correlated), such as in the case of resampling, k-fold cross-validation, and repeated k-fold cross-validation. These corrections were all originally proposed by Nadeau and Bengio (2003). Currently, only cases where two models are to be compared are supported.
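
Reviewer's sketch (not part of the diff): the paragraph above says the standard t-test underestimates the variance on resampled results. The base-R snippet below illustrates that point using the random subsampling correction as it is written in the vignette text indexed in `docs/search.json`, where the variance term is scaled by `1/n + n2/n1` instead of `1/n`. The function name `corrected_resampled_t` is hypothetical and this is not the package's own implementation; `n1 = 80` and `n2 = 20` are the train and test sizes used in the vignette example.

```r
# Hypothetical helper for illustration only; NOT the correctR implementation.
# x, y : performance values (e.g. accuracies) for two models, one per resample
# n1   : train set size, n2 : test set size
corrected_resampled_t <- function(x, y, n1, n2) {
  stopifnot(length(x) == length(y), length(x) > 1, n1 > 0, n2 > 0)
  d <- x - y                  # paired differences between the two models
  n <- length(d)              # number of resamples (not the sample size)
  sigma2 <- stats::var(d)     # sample variance of the differences

  # Standard paired t-statistic: treats the resamples as independent
  naive <- mean(d) / sqrt(sigma2 / n)

  # Corrected statistic: the extra n2 / n1 term inflates the variance
  # to account for the overlap between resampled train/test sets
  corrected <- mean(d) / sqrt((1 / n + n2 / n1) * sigma2)

  data.frame(naive = naive, corrected = corrected)
}

set.seed(123)
x <- stats::rnorm(30, mean = 0.6, sd = 0.1)
y <- stats::rnorm(30, mean = 0.4, sd = 0.1)
res <- corrected_resampled_t(x, y, n1 = 80, n2 = 20)
print(res)
```

Because `n2 / n1` is positive, the corrected statistic is always smaller in magnitude than the naive one, which is the mechanism by which the correction reins in the inflated Type I error. The sketch deliberately stops at the statistic and does not compute a p-value, since the sidedness of the test used by the package is not stated in this diff.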

Python version

-A Python version of correctR called
-correctipy is available at the GitHub
-repository.

+A Python version of correctR called correctipy is available at the GitHub repository.

@@ -168,8 +137,7 @@

Dev status

diff --git a/docs/pkgdown.js b/docs/pkgdown.js
index a1b8b6d..5fccd9c 100644
--- a/docs/pkgdown.js
+++ b/docs/pkgdown.js
@@ -70,7 +70,7 @@
 /* Search marking --------------------------*/
 var url = new URL(window.location.href);
 var toMark = url.searchParams.get("q");
- var mark = new Mark("div.col-md-9");
+ var mark = new Mark("main#main");
 if (toMark) {
 mark.mark(toMark, {
 accuracy: {
diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml
index 5c9773c..d3948e9 100644
--- a/docs/pkgdown.yml
+++ b/docs/pkgdown.yml
@@ -1,9 +1,9 @@
-pandoc: 2.19.2
-pkgdown: 2.0.2
+pandoc: 3.1.1
+pkgdown: 2.0.7
 pkgdown_sha: ~
 articles:
   correctR: correctR.html
-last_built: 2023-01-27T00:07Z
+last_built: 2023-08-20T10:11Z
 urls:
   reference: https://hendersontrent.github.io/correctR/reference
   article: https://hendersontrent.github.io/correctR/articles
diff --git a/docs/reference/correctR.html b/docs/reference/correctR.html
index b90e538..91922ab 100644
--- a/docs/reference/correctR.html
+++ b/docs/reference/correctR.html
@@ -1,5 +1,9 @@
+<<<<<<< HEAD
+Corrections For Correlated Test Statistics — correctR • correctRCorrections For Correlated Test Statistics — correctR • correctR
@@ -59,8 +63,12 @@
diff --git a/docs/reference/index.html b/docs/reference/index.html
index a314cc8..cfd6fe6 100644
--- a/docs/reference/index.html
+++ b/docs/reference/index.html
@@ -1,5 +1,9 @@
+<<<<<<< HEAD
+Function reference • correctRFunction reference • correctR
@@ -56,7 +60,7 @@

All functionscorrectR + correctR correctR-package
Corrections For Correlated Test Statistics
@@ -83,8 +87,12 @@

All functions +<<<<<<< HEAD +

Site built with pkgdown 2.0.7.

+=======

Site built with pkgdown 2.0.2.

+>>>>>>> b4de758c9fd2b61f632e58cff96f46b8d8e30d63
diff --git a/docs/reference/kfold_ttest.html b/docs/reference/kfold_ttest.html
index f341d1f..71c433a 100644
--- a/docs/reference/kfold_ttest.html
+++ b/docs/reference/kfold_ttest.html
@@ -1,5 +1,9 @@
+<<<<<<< HEAD
+Compute correlated t-statistic and p-value for k-fold cross-validated results — kfold_ttest • correctRCompute correlated t-statistic and p-value for k-fold cross-validated results — kfold_ttest • correctR
@@ -88,8 +92,12 @@

Author
diff --git a/docs/reference/repkfold_ttest.html b/docs/reference/repkfold_ttest.html
index 53f469a..a54a9f4 100644
--- a/docs/reference/repkfold_ttest.html
+++ b/docs/reference/repkfold_ttest.html
@@ -1,5 +1,9 @@
+<<<<<<< HEAD
+Compute correlated t-statistic and p-value for repeated k-fold cross-validated results — repkfold_ttest • correctRCompute correlated t-statistic and p-value for repeated k-fold cross-validated results — repkfold_ttest • correctR
@@ -90,8 +94,12 @@

Author
diff --git a/docs/reference/resampled_ttest.html b/docs/reference/resampled_ttest.html
index b018b6a..72f2214 100644
--- a/docs/reference/resampled_ttest.html
+++ b/docs/reference/resampled_ttest.html
@@ -1,5 +1,9 @@
+<<<<<<< HEAD
+Compute correlated t-statistic and p-value for resampled data — resampled_ttest • correctRCompute correlated t-statistic and p-value for resampled data — resampled_ttest • correctR
@@ -90,8 +94,12 @@

Author< diff --git a/docs/search.json b/docs/search.json index a5cd863..7a5761d 100644 --- a/docs/search.json +++ b/docs/search.json @@ -1 +1 @@ -[{"path":"https://hendersontrent.github.io/correctR/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2022 Trent Henderson Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://hendersontrent.github.io/correctR/articles/correctR.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Introduction to correctR","text":"correctR lightweight package implements small number corrected test statistics cases samples two machine learning model metrics (e.g., classification accuracy) independent (therefore correlated), case resampling \\(k\\)-fold cross-validation. 
demonstrate basic functionality using trivial examples following corrected tests currently implemented correctR: Random subsampling \\(k\\)-fold cross-validation Repeated \\(k\\)-fold cross-validation corrections originally proposed Nadeau Bengio (2003)1 additional representations Bouckaert Frank (2004)2.","code":""},{"path":"https://hendersontrent.github.io/correctR/articles/correctR.html","id":"random-subsampling-correction","dir":"Articles","previous_headings":"Introduction","what":"Random subsampling correction","title":"Introduction to correctR","text":"random subsampling, standard \\(t\\)-test inflates Type error used conjunction random subsampling due underestimation variance, found Dietterich (1998)3. Nadeau Bengio (2003) proposed solution (implement resampled_ttest correctR) form : \\[ t = \\frac{\\frac{1}{n} \\sum_{j=1}^{n}x_{j}}{\\sqrt{(\\frac{1}{n} + \\frac{n_{2}}{n_{1}})\\sigma^{2}}} \\] \\(n\\) number resamples (NOTE: \\(n\\) sample size), \\(n_{1}\\) number samples training data, \\(n_{2}\\) number samples test data. \\(\\sigma^{2}\\) variance estimate used standard paired \\(t\\)-test (simply \\(\\frac{\\sigma}{\\sqrt{n}}\\) denominator \\(n\\) sample size case).","code":""},{"path":"https://hendersontrent.github.io/correctR/articles/correctR.html","id":"k-fold-cross-validation-correction","dir":"Articles","previous_headings":"Introduction","what":"k-fold cross-validation correction","title":"Introduction to correctR","text":"alternate formulation random subsampling correction, devised terms unbiased estimator \\(\\rho\\), discussed Corani et al. (2016)4 implement kfold_tttest correctR: \\[ t = \\frac{\\frac{1}{n} \\sum_{j=1}^{n}x_{j}}{\\sqrt{(\\frac{1}{n} + \\frac{\\rho}{1-\\rho})\\sigma^{2}}} \\] \\(n\\) number resamples \\(\\rho = \\frac{1}{k}\\) \\(k\\) number folds \\(k\\)-fold cross-validation procedure. 
formulation stems fact Nadeau Bengio (2003) proved unbiased estimator, can approximated \\(\\rho = \\frac{1}{k}\\).","code":""},{"path":"https://hendersontrent.github.io/correctR/articles/correctR.html","id":"repeated-k-fold-cross-validation-correction","dir":"Articles","previous_headings":"Introduction","what":"Repeated k-fold cross-validation correction","title":"Introduction to correctR","text":"Repeated \\(k\\)-fold cross-validation complex previous case(s) now \\(r\\) repeats every fold \\(k\\). Bouckaert Frank (2004) present nice representation corrected test case implement repkfold_ttest correctR: \\[ t = \\frac{\\frac{1}{k \\cdot r} \\sum_{=1}^{k} \\sum_{j=1}^{r} x_{ij}}{\\sqrt{(\\frac{1}{k \\cdot r} + \\frac{n_{2}}{n_{1}})\\sigma^{2}}} \\]","code":""},{"path":"https://hendersontrent.github.io/correctR/articles/correctR.html","id":"setup","dir":"Articles","previous_headings":"","what":"Setup","title":"Introduction to correctR","text":"real world, proper results obtained fitting two models according one procedures outlined . simplicity , just going simulate three datasets can get package functionality cleaner easier. going assume classification context generate classification accuracy values. values purposefully egregious—going (case random subsampling) just fix train set sample size (n1) 80 test set sample size (n2) 20, assume (using data) \\(k\\)-fold cross-validation correction numbers obtained method. , values important , corrections going apply next crucial. case repeated \\(k\\)-fold cross-validation, take note column names. data.frame pass repkfold_ttest can four columns specified , must contain least four exact corresponding names. function explicitly searches . 
: \"model\" — contains label two models compare \"values\" — numerical values performance metric (.e., classification accuracy) \"k\" — fold values correspond \"r\" — repeat fold values correspond ","code":"set.seed(123) # For reproducibility # Data for random subsampling and k-fold cross-validation corrections x <- stats::rnorm(30, mean = 0.6, sd = 0.1) y <- stats::rnorm(30, mean = 0.4, sd = 0.1) # Data for repeated k-fold cross-validation correction tmp <- data.frame(model = rep(c(1, 2), each = 60), values = c(stats::rnorm(60, mean = 0.6, sd = 0.1), stats::rnorm(60, mean = 0.4, sd = 0.1)), k = rep(c(1, 1, 2, 2), times = 15), r = rep(c(1, 2), times = 30))"},{"path":"https://hendersontrent.github.io/correctR/articles/correctR.html","id":"package-functionality","dir":"Articles","previous_headings":"","what":"Package functionality","title":"Introduction to correctR","text":"can fit corrections one-line functions: functions return data.frame two named columns: \"statistic\" (\\(t\\)-statistic) \"p.value\" (associated \\(p\\)-value), meaning can easily integrated complex machine pipelines. example resampled.t.test:","code":"rss <- resampled_ttest(x = x, y = y, n = 30, n1 = 80, n2 = 20) # Random subsampling kcv <- kfold_ttest(x = x, y = y, n = 100, k = 30) # k-fold cross-validation rkcv <- repkfold_ttest(data = tmp, n1 = 80, n2 = 20, k = 2, r = 2) # Repeated k-fold cross-validation print(rss) ## statistic p.value ## 1 2.407318 0.01132991"},{"path":"https://hendersontrent.github.io/correctR/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Trent Henderson. Maintainer, author.","code":""},{"path":"https://hendersontrent.github.io/correctR/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Henderson T (2023). correctR: Corrected Test Statistics Comparing Machine Learning Models Correlated Samples. 
R package version 0.1.3, https://hendersontrent.github.io/correctR/.","code":"@Manual{, title = {correctR: Corrected Test Statistics for Comparing Machine Learning Models on Correlated Samples}, author = {Trent Henderson}, year = {2023}, note = {R package version 0.1.3}, url = {https://hendersontrent.github.io/correctR/}, }"},{"path":"https://hendersontrent.github.io/correctR/index.html","id":"correctr-","dir":"","previous_headings":"","what":"Corrected Test Statistics for Comparing Machine Learning Models on Correlated Samples","title":"Corrected Test Statistics for Comparing Machine Learning Models on Correlated Samples","text":"Corrected test statistics comparing machine learning models correlated samples","code":""},{"path":"https://hendersontrent.github.io/correctR/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Corrected Test Statistics for Comparing Machine Learning Models on Correlated Samples","text":"can install stable version correctR CRAN: can install development version correctR GitHub:","code":"install.packages(\"correctR\") devtools::install_github(\"hendersontrent/theft\")"},{"path":"https://hendersontrent.github.io/correctR/index.html","id":"general-purpose","dir":"","previous_headings":"","what":"General purpose","title":"Corrected Test Statistics for Comparing Machine Learning Models on Correlated Samples","text":"Often machine learning, want compare performance different models determine one statistically outperforms another. However, methods used (e.g., data resampling, k-fold cross-validation) obtain performance metrics (e.g., classification accuracy) violate assumptions traditional statistical tests t-test. purpose methods either aid generalisability findings (.e., quantification error produce multiple values model instead just one) optimise model hyperparameters. 
makes invaluable, unusable traditional tests, Dietterich (1998) found standard t-test underestimates variance, therefore driving high Type error. correctR lightweight package implements small number corrected test statistics cases samples independent (therefore correlated), case resampling, k-fold cross-validation, repeated k-fold cross-validation. corrections originally proposed Nadeau Bengio (2003). Currently, cases two models compared supported.","code":""},{"path":"https://hendersontrent.github.io/correctR/index.html","id":"python-version","dir":"","previous_headings":"","what":"Python version","title":"Corrected Test Statistics for Comparing Machine Learning Models on Correlated Samples","text":"Python version correctR called correctipy available GitHub repository.","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/correctR.html","id":null,"dir":"Reference","previous_headings":"","what":"Corrections For Correlated Test Statistics — correctR","title":"Corrections For Correlated Test Statistics — correctR","text":"Corrections Correlated Test Statistics","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/kfold_ttest.html","id":null,"dir":"Reference","previous_headings":"","what":"Compute correlated t-statistic and p-value for k-fold cross-validated results — kfold_ttest","title":"Compute correlated t-statistic and p-value for k-fold cross-validated results — kfold_ttest","text":"Compute correlated t-statistic p-value k-fold cross-validated results","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/kfold_ttest.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compute correlated t-statistic and p-value for k-fold cross-validated results — kfold_ttest","text":"","code":"kfold_ttest(x, y, n, k)"},{"path":"https://hendersontrent.github.io/correctR/reference/kfold_ttest.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute 
correlated t-statistic and p-value for k-fold cross-validated results — kfold_ttest","text":"x numeric vector values model y numeric vector values model B n integer denoting total sample size k integer denoting number folds used k-fold","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/kfold_ttest.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute correlated t-statistic and p-value for k-fold cross-validated results — kfold_ttest","text":"object class data.frame","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/kfold_ttest.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Compute correlated t-statistic and p-value for k-fold cross-validated results — kfold_ttest","text":"Nadeau, C., Bengio, Y. Inference Generalization Error. Machine Learning 52, (2003). Corani, G., Benavoli, ., Demsar, J., Mangili, F., Zaffalon, M. Statistical comparison classifiers Bayesian hierarchical modelling. 
Machine Learning, 106, (2017).","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/kfold_ttest.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Compute correlated t-statistic and p-value for k-fold cross-validated results — kfold_ttest","text":"Trent Henderson","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/repkfold_ttest.html","id":null,"dir":"Reference","previous_headings":"","what":"Compute correlated t-statistic and p-value for repeated k-fold cross-validated results — repkfold_ttest","title":"Compute correlated t-statistic and p-value for repeated k-fold cross-validated results — repkfold_ttest","text":"Compute correlated t-statistic p-value repeated k-fold cross-validated results","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/repkfold_ttest.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compute correlated t-statistic and p-value for repeated k-fold cross-validated results — repkfold_ttest","text":"","code":"repkfold_ttest(data, n1, n2, k, r)"},{"path":"https://hendersontrent.github.io/correctR/reference/repkfold_ttest.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute correlated t-statistic and p-value for repeated k-fold cross-validated results — repkfold_ttest","text":"data data.frame values model model B repeated k-fold cross-validation. 
Four named columns expected: \"model\", \"values\", \"k\", \"k\" n1 integer denoting train set size n2 integer denoting test set size k integer denoting number folds used k-fold r integer denoting number repeats per fold","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/repkfold_ttest.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute correlated t-statistic and p-value for repeated k-fold cross-validated results — repkfold_ttest","text":"object class data.frame","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/repkfold_ttest.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Compute correlated t-statistic and p-value for repeated k-fold cross-validated results — repkfold_ttest","text":"Nadeau, C., Bengio, Y. Inference Generalization Error. Machine Learning 52, (2003). Bouckaert, R. R., Frank, E. Evaluating Replicability Significance Tests Comparing Learning Algorithms. Advances Knowledge Discovery Data Mining. PAKDD 2004. 
Lecture Notes Computer Science, 3056, (2004).","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/repkfold_ttest.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Compute correlated t-statistic and p-value for repeated k-fold cross-validated results — repkfold_ttest","text":"Trent Henderson","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/resampled_ttest.html","id":null,"dir":"Reference","previous_headings":"","what":"Compute correlated t-statistic and p-value for resampled data — resampled_ttest","title":"Compute correlated t-statistic and p-value for resampled data — resampled_ttest","text":"Compute correlated t-statistic p-value resampled data","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/resampled_ttest.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compute correlated t-statistic and p-value for resampled data — resampled_ttest","text":"","code":"resampled_ttest(x, y, n, n1, n2)"},{"path":"https://hendersontrent.github.io/correctR/reference/resampled_ttest.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute correlated t-statistic and p-value for resampled data — resampled_ttest","text":"x numeric vector values model y numeric vector values model B n integer denoting number repeat samples. 
Defaults length(x) n1 integer denoting train set size n2 integer denoting test set size","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/resampled_ttest.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute correlated t-statistic and p-value for resampled data — resampled_ttest","text":"object class data.frame","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/resampled_ttest.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Compute correlated t-statistic and p-value for resampled data — resampled_ttest","text":"Nadeau, C., Bengio, Y. Inference Generalization Error. Machine Learning 52, (2003). Bouckaert, R. R., Frank, E. Evaluating Replicability Significance Tests Comparing Learning Algorithms. Advances Knowledge Discovery Data Mining. PAKDD 2004. Lecture Notes Computer Science, 3056, (2004).","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/resampled_ttest.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Compute correlated t-statistic and p-value for resampled data — resampled_ttest","text":"Trent Henderson","code":""}] +[{"path":"https://hendersontrent.github.io/correctR/articles/correctR.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Introduction to correctR","text":"correctR lightweight package implements small number corrected test statistics cases samples two machine learning model metrics (e.g., classification accuracy) independent (therefore correlated), case resampling \\(k\\)-fold cross-validation. 
demonstrate basic functionality using trivial examples following corrected tests currently implemented correctR: Random subsampling \\(k\\)-fold cross-validation Repeated \\(k\\)-fold cross-validation corrections originally proposed Nadeau Bengio (2003)1 additional representations Bouckaert Frank (2004)2.","code":""},{"path":"https://hendersontrent.github.io/correctR/articles/correctR.html","id":"random-subsampling-correction","dir":"Articles","previous_headings":"Introduction","what":"Random subsampling correction","title":"Introduction to correctR","text":"random subsampling, standard \\(t\\)-test inflates Type error used conjunction random subsampling due underestimation variance, found Dietterich (1998)3. Nadeau Bengio (2003) proposed solution (implement resampled_ttest correctR) form : \\[ t = \\frac{\\frac{1}{n} \\sum_{j=1}^{n}x_{j}}{\\sqrt{(\\frac{1}{n} + \\frac{n_{2}}{n_{1}})\\sigma^{2}}} \\] \\(n\\) number resamples (NOTE: \\(n\\) sample size), \\(n_{1}\\) number samples training data, \\(n_{2}\\) number samples test data. \\(\\sigma^{2}\\) variance estimate used standard paired \\(t\\)-test (simply \\(\\frac{\\sigma}{\\sqrt{n}}\\) denominator \\(n\\) sample size case).","code":""},{"path":"https://hendersontrent.github.io/correctR/articles/correctR.html","id":"k-fold-cross-validation-correction","dir":"Articles","previous_headings":"Introduction","what":"k-fold cross-validation correction","title":"Introduction to correctR","text":"alternate formulation random subsampling correction, devised terms unbiased estimator \\(\\rho\\), discussed Corani et al. (2016)4 implement kfold_tttest correctR: \\[ t = \\frac{\\frac{1}{n} \\sum_{j=1}^{n}x_{j}}{\\sqrt{(\\frac{1}{n} + \\frac{\\rho}{1-\\rho})\\sigma^{2}}} \\] \\(n\\) number resamples \\(\\rho = \\frac{1}{k}\\) \\(k\\) number folds \\(k\\)-fold cross-validation procedure. 
formulation stems fact Nadeau Bengio (2003) proved unbiased estimator, can approximated \\(\\rho = \\frac{1}{k}\\).","code":""},{"path":"https://hendersontrent.github.io/correctR/articles/correctR.html","id":"repeated-k-fold-cross-validation-correction","dir":"Articles","previous_headings":"Introduction","what":"Repeated k-fold cross-validation correction","title":"Introduction to correctR","text":"Repeated \\(k\\)-fold cross-validation complex previous case(s) now \\(r\\) repeats every fold \\(k\\). Bouckaert Frank (2004) present nice representation corrected test case implement repkfold_ttest correctR: \\[ t = \\frac{\\frac{1}{k \\cdot r} \\sum_{=1}^{k} \\sum_{j=1}^{r} x_{ij}}{\\sqrt{(\\frac{1}{k \\cdot r} + \\frac{n_{2}}{n_{1}})\\sigma^{2}}} \\]","code":""},{"path":"https://hendersontrent.github.io/correctR/articles/correctR.html","id":"setup","dir":"Articles","previous_headings":"","what":"Setup","title":"Introduction to correctR","text":"real world, proper results obtained fitting two models according one procedures outlined . simplicity , just going simulate three datasets can get package functionality cleaner easier. going assume classification context generate classification accuracy values. values purposefully egregious—going (case random subsampling) just fix train set sample size (n1) 80 test set sample size (n2) 20, assume (using data) \\(k\\)-fold cross-validation correction numbers obtained method. , values important , corrections going apply next crucial. case repeated \\(k\\)-fold cross-validation, take note column names. data.frame pass repkfold_ttest can four columns specified , must contain least four exact corresponding names. function explicitly searches . 
: \"model\" — contains label two models compare \"values\" — numerical values performance metric (.e., classification accuracy) \"k\" — fold values correspond \"r\" — repeat fold values correspond ","code":"set.seed(123) # For reproducibility # Data for random subsampling and k-fold cross-validation corrections x <- stats::rnorm(30, mean = 0.6, sd = 0.1) y <- stats::rnorm(30, mean = 0.4, sd = 0.1) # Data for repeated k-fold cross-validation correction tmp <- data.frame(model = rep(c(1, 2), each = 60), values = c(stats::rnorm(60, mean = 0.6, sd = 0.1), stats::rnorm(60, mean = 0.4, sd = 0.1)), k = rep(c(1, 1, 2, 2), times = 15), r = rep(c(1, 2), times = 30))"},{"path":"https://hendersontrent.github.io/correctR/articles/correctR.html","id":"package-functionality","dir":"Articles","previous_headings":"","what":"Package functionality","title":"Introduction to correctR","text":"can fit corrections one-line functions: functions return data.frame two named columns: \"statistic\" (\\(t\\)-statistic) \"p.value\" (associated \\(p\\)-value), meaning can easily integrated complex machine pipelines. example resampled.t.test:","code":"rss <- resampled_ttest(x = x, y = y, n = 30, n1 = 80, n2 = 20) # Random subsampling kcv <- kfold_ttest(x = x, y = y, n = 100, k = 30) # k-fold cross-validation rkcv <- repkfold_ttest(data = tmp, n1 = 80, n2 = 20, k = 2, r = 2) # Repeated k-fold cross-validation print(rss) ## statistic p.value ## 1 2.407318 0.01132991"},{"path":"https://hendersontrent.github.io/correctR/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Trent Henderson. Maintainer, author.","code":""},{"path":"https://hendersontrent.github.io/correctR/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Henderson T (2023). correctR: Corrected Test Statistics Comparing Machine Learning Models Correlated Samples. 
R package version 0.1.3, https://hendersontrent.github.io/correctR/.","code":"@Manual{, title = {correctR: Corrected Test Statistics for Comparing Machine Learning Models on Correlated Samples}, author = {Trent Henderson}, year = {2023}, note = {R package version 0.1.3}, url = {https://hendersontrent.github.io/correctR/}, }"},{"path":"https://hendersontrent.github.io/correctR/index.html","id":"correctr-","dir":"","previous_headings":"","what":"Corrected Test Statistics for Comparing Machine Learning Models on Correlated Samples","title":"Corrected Test Statistics for Comparing Machine Learning Models on Correlated Samples","text":"Corrected test statistics comparing machine learning models correlated samples","code":""},{"path":"https://hendersontrent.github.io/correctR/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Corrected Test Statistics for Comparing Machine Learning Models on Correlated Samples","text":"can install stable version correctR CRAN: can install development version correctR GitHub:","code":"install.packages(\"correctR\") devtools::install_github(\"hendersontrent/theft\")"},{"path":"https://hendersontrent.github.io/correctR/index.html","id":"general-purpose","dir":"","previous_headings":"","what":"General purpose","title":"Corrected Test Statistics for Comparing Machine Learning Models on Correlated Samples","text":"Often machine learning, want compare performance different models determine one statistically outperforms another. However, methods used (e.g., data resampling, k-fold cross-validation) obtain performance metrics (e.g., classification accuracy) violate assumptions traditional statistical tests t-test. purpose methods either aid generalisability findings (.e., quantification error produce multiple values model instead just one) optimise model hyperparameters. 
makes invaluable, unusable traditional tests, Dietterich (1998) found standard t-test underestimates variance, therefore driving high Type error. correctR lightweight package implements small number corrected test statistics cases samples independent (therefore correlated), case resampling, k-fold cross-validation, repeated k-fold cross-validation. corrections originally proposed Nadeau Bengio (2003). Currently, cases two models compared supported.","code":""},{"path":"https://hendersontrent.github.io/correctR/index.html","id":"python-version","dir":"","previous_headings":"","what":"Python version","title":"Corrected Test Statistics for Comparing Machine Learning Models on Correlated Samples","text":"Python version correctR called correctipy available GitHub repository.","code":""},{"path":"https://hendersontrent.github.io/correctR/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2022 Trent Henderson Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. 
EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/correctR.html","id":null,"dir":"Reference","previous_headings":"","what":"Corrections For Correlated Test Statistics — correctR","title":"Corrections For Correlated Test Statistics — correctR","text":"Corrections Correlated Test Statistics","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/kfold_ttest.html","id":null,"dir":"Reference","previous_headings":"","what":"Compute correlated t-statistic and p-value for k-fold cross-validated results — kfold_ttest","title":"Compute correlated t-statistic and p-value for k-fold cross-validated results — kfold_ttest","text":"Compute correlated t-statistic p-value k-fold cross-validated results","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/kfold_ttest.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compute correlated t-statistic and p-value for k-fold cross-validated results — kfold_ttest","text":"","code":"kfold_ttest(x, y, n, k)"},{"path":"https://hendersontrent.github.io/correctR/reference/kfold_ttest.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute correlated t-statistic and p-value for k-fold cross-validated results — kfold_ttest","text":"x numeric vector values model y numeric vector values model B n integer denoting total sample size k integer denoting number folds used k-fold","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/kfold_ttest.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute correlated t-statistic and p-value for k-fold cross-validated results — kfold_ttest","text":"object class 
data.frame","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/kfold_ttest.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Compute correlated t-statistic and p-value for k-fold cross-validated results — kfold_ttest","text":"Nadeau, C., Bengio, Y. Inference Generalization Error. Machine Learning 52, (2003). Corani, G., Benavoli, ., Demsar, J., Mangili, F., Zaffalon, M. Statistical comparison classifiers Bayesian hierarchical modelling. Machine Learning, 106, (2017).","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/kfold_ttest.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Compute correlated t-statistic and p-value for k-fold cross-validated results — kfold_ttest","text":"Trent Henderson","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/repkfold_ttest.html","id":null,"dir":"Reference","previous_headings":"","what":"Compute correlated t-statistic and p-value for repeated k-fold cross-validated results — repkfold_ttest","title":"Compute correlated t-statistic and p-value for repeated k-fold cross-validated results — repkfold_ttest","text":"Compute correlated t-statistic p-value repeated k-fold cross-validated results","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/repkfold_ttest.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compute correlated t-statistic and p-value for repeated k-fold cross-validated results — repkfold_ttest","text":"","code":"repkfold_ttest(data, n1, n2, k, r)"},{"path":"https://hendersontrent.github.io/correctR/reference/repkfold_ttest.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute correlated t-statistic and p-value for repeated k-fold cross-validated results — repkfold_ttest","text":"data data.frame values model model B repeated k-fold cross-validation. 
Four named columns expected: \"model\", \"values\", \"k\", \"k\" n1 integer denoting train set size n2 integer denoting test set size k integer denoting number folds used k-fold r integer denoting number repeats per fold","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/repkfold_ttest.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute correlated t-statistic and p-value for repeated k-fold cross-validated results — repkfold_ttest","text":"object class data.frame","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/repkfold_ttest.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Compute correlated t-statistic and p-value for repeated k-fold cross-validated results — repkfold_ttest","text":"Nadeau, C., Bengio, Y. Inference Generalization Error. Machine Learning 52, (2003). Bouckaert, R. R., Frank, E. Evaluating Replicability Significance Tests Comparing Learning Algorithms. Advances Knowledge Discovery Data Mining. PAKDD 2004. 
Lecture Notes Computer Science, 3056, (2004).","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/repkfold_ttest.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Compute correlated t-statistic and p-value for repeated k-fold cross-validated results — repkfold_ttest","text":"Trent Henderson","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/resampled_ttest.html","id":null,"dir":"Reference","previous_headings":"","what":"Compute correlated t-statistic and p-value for resampled data — resampled_ttest","title":"Compute correlated t-statistic and p-value for resampled data — resampled_ttest","text":"Compute correlated t-statistic p-value resampled data","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/resampled_ttest.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compute correlated t-statistic and p-value for resampled data — resampled_ttest","text":"","code":"resampled_ttest(x, y, n, n1, n2)"},{"path":"https://hendersontrent.github.io/correctR/reference/resampled_ttest.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute correlated t-statistic and p-value for resampled data — resampled_ttest","text":"x numeric vector values model y numeric vector values model B n integer denoting number repeat samples. 
Defaults length(x) n1 integer denoting train set size n2 integer denoting test set size","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/resampled_ttest.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute correlated t-statistic and p-value for resampled data — resampled_ttest","text":"object class data.frame","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/resampled_ttest.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Compute correlated t-statistic and p-value for resampled data — resampled_ttest","text":"Nadeau, C., Bengio, Y. Inference Generalization Error. Machine Learning 52, (2003). Bouckaert, R. R., Frank, E. Evaluating Replicability Significance Tests Comparing Learning Algorithms. Advances Knowledge Discovery Data Mining. PAKDD 2004. Lecture Notes Computer Science, 3056, (2004).","code":""},{"path":"https://hendersontrent.github.io/correctR/reference/resampled_ttest.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Compute correlated t-statistic and p-value for resampled data — resampled_ttest","text":"Trent Henderson","code":""}] diff --git a/docs/sitemap.xml b/docs/sitemap.xml index 0250767..573b4e4 100644 --- a/docs/sitemap.xml +++ b/docs/sitemap.xml @@ -3,12 +3,6 @@ https://hendersontrent.github.io/correctR/404.html - - https://hendersontrent.github.io/correctR/LICENSE-text.html - - - https://hendersontrent.github.io/correctR/LICENSE.html - https://hendersontrent.github.io/correctR/articles/correctR.html @@ -21,6 +15,12 @@ https://hendersontrent.github.io/correctR/index.html + + https://hendersontrent.github.io/correctR/LICENSE-text.html + + + https://hendersontrent.github.io/correctR/LICENSE.html + https://hendersontrent.github.io/correctR/reference/correctR.html diff --git a/man/correctR.Rd b/man/correctR.Rd index 4da11e1..ab59f3c 100644 --- a/man/correctR.Rd +++ b/man/correctR.Rd @@ 
-3,6 +3,7 @@ \docType{package} \name{correctR} \alias{correctR} +\alias{correctR-package} \title{Corrections For Correlated Test Statistics} \description{ Corrections For Correlated Test Statistics