Add helper function to add random missingness #298

Closed
njtierney opened this issue Apr 5, 2022 · 1 comment · Fixed by #309
@njtierney

It would be good to have a helper such as `add_n_na()` (defined in the second snippet below), rather than needing to do something like:

x <- 1:10
x
#>  [1]  1  2  3  4  5  6  7  8  9 10
x[sample(x = length(x), size = 5)] <- NA
x
#>  [1] NA NA  3 NA NA  6  7  8  9 NA

add_n_na <- function(x, n_na){
  x[sample(x = vctrs::vec_size(x), size = n_na)] <- NA
  x
}

x <- 1:10
x
#>  [1]  1  2  3  4  5  6  7  8  9 10
add_n_na(x, 3)
#>  [1]  1  2  3  4 NA  6  7 NA NA 10

Created on 2022-04-05 by the reprex package (v2.0.1)
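
As a minimal sketch of how such a helper might guard its input, the variant below checks that the requested number of missing values fits within the vector before sampling positions. The name `add_n_na_checked` and the error message are hypothetical and not part of naniar; base `length()` and `sample.int()` are used here in place of `vctrs::vec_size()` so the example has no dependencies.

```r
# Hypothetical variant of add_n_na() with an explicit input check.
add_n_na_checked <- function(x, n_na) {
  n <- length(x)
  # Without this check, sample.int() fails with a less specific error
  # when n_na is larger than the number of elements.
  if (n_na < 0 || n_na > n) {
    stop("`n_na` must be between 0 and ", n, ", not ", n_na, ".", call. = FALSE)
  }
  # Draw n_na distinct positions and overwrite them with NA.
  x[sample.int(n, size = n_na)] <- NA
  x
}

set.seed(2022)
y <- add_n_na_checked(1:10, 3)
sum(is.na(y))
#> [1] 3
```

Because positions are drawn without replacement, the result has exactly `n_na` missing values whenever the input had none to begin with.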

Session info
sessioninfo::session_info()
#> ─ Session info  🇻🇺  ⏺️  🕰️   ───────────────────────────────────────────────────
#>  hash: flag: Vanuatu, record button, mantelpiece clock
#> 
#>  setting  value
#>  version  R version 4.1.3 (2022-03-10)
#>  os       macOS Big Sur 11.2.2
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_AU.UTF-8
#>  ctype    en_AU.UTF-8
#>  tz       Australia/Melbourne
#>  date     2022-04-05
#>  pandoc   2.17.1.1 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  backports     1.4.1   2021-12-13 [1] CRAN (R 4.1.1)
#>  cli           3.2.0   2022-02-14 [1] CRAN (R 4.1.1)
#>  crayon        1.5.1   2022-03-26 [1] CRAN (R 4.1.3)
#>  digest        0.6.29  2021-12-01 [1] CRAN (R 4.1.1)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate      0.15    2022-02-18 [1] CRAN (R 4.1.1)
#>  fansi         1.0.3   2022-03-24 [1] CRAN (R 4.1.1)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.1.0)
#>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.1.1)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.1.1)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.1.1)
#>  knitr         1.37    2021-12-16 [1] CRAN (R 4.1.1)
#>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.1.1)
#>  magrittr      2.0.2   2022-01-26 [1] CRAN (R 4.1.1)
#>  pillar        1.7.0   2022-02-01 [1] CRAN (R 4.1.1)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.1.0)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.1.0)
#>  R.cache       0.15.0  2021-04-30 [1] CRAN (R 4.1.0)
#>  R.methodsS3   1.8.1   2020-08-26 [1] CRAN (R 4.1.0)
#>  R.oo          1.24.0  2020-08-26 [1] CRAN (R 4.1.0)
#>  R.utils       2.11.0  2021-09-26 [1] CRAN (R 4.1.1)
#>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.1.1)
#>  rlang         1.0.2   2022-03-04 [1] CRAN (R 4.1.1)
#>  rmarkdown     2.11    2021-09-14 [1] CRAN (R 4.1.1)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.1.0)
#>  sessioninfo   1.2.1   2021-11-02 [1] CRAN (R 4.1.1)
#>  stringi       1.7.6   2021-11-29 [1] CRAN (R 4.1.1)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.1.1)
#>  styler        1.6.2   2021-09-23 [1] CRAN (R 4.1.1)
#>  tibble        3.1.6   2021-11-07 [1] CRAN (R 4.1.1)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.1.0)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.1.1)
#>  xfun          0.30    2022-03-02 [1] CRAN (R 4.1.1)
#>  yaml          2.3.5   2022-02-21 [1] CRAN (R 4.1.1)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@njtierney njtierney added this to the V0.7.0 milestone Oct 14, 2022
@njtierney

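# Set a proportion `prop` of the values in `x` to NA, at random positions.
set_prop_miss <- function(x, prop = 0.1) {
  # Shuffle a logical mask whose first `prop * length(x)` entries are TRUE
  x[sample(seq_along(x) <= prop * length(x))] <- NA
  x
}

# Set `n` values in `x` to NA, at random positions.
set_n_miss <- function(x, n = 1) {
  # Shuffle a logical mask whose first `n` entries are TRUE
  x[sample(seq_along(x) <= n)] <- NA
  x
}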

library(tidyverse)
library(naniar)

df <- tibble(
  x = rnorm(100),
  y = rpois(100, lambda = 5)
)

df
#> # A tibble: 100 × 2
#>         x     y
#>     <dbl> <int>
#>  1  1.53      4
#>  2  0.281     7
#>  3  0.609    10
#>  4  2.18      7
#>  5 -0.256     9
#>  6 -0.565     5
#>  7  0.827     4
#>  8 -0.878     4
#>  9 -1.14      8
#> 10  1.17      8
#> # … with 90 more rows

set_prop_miss(df$x, 0.1) %>% prop_miss()
#> [1] 0.1
set_prop_miss(df$x, 0.5) %>% prop_miss()
#> [1] 0.5
set_prop_miss(df$x, 0.75) %>% prop_miss()
#> [1] 0.75

set_n_miss(df$x, 2) %>% n_miss()
#> [1] 2
set_n_miss(df$x, 10) %>% n_miss()
#> [1] 10
set_n_miss(df$x, 50) %>% n_miss()
#> [1] 50

Created on 2023-01-31 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23)
#>  os       macOS Monterey 12.3.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Australia/Hobart
#>  date     2023-01-31
#>  pandoc   2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package       * version    date (UTC) lib source
#>  assertthat      0.2.1      2019-03-21 [1] CRAN (R 4.2.0)
#>  backports       1.4.1      2021-12-13 [1] CRAN (R 4.2.0)
#>  broom           1.0.2      2022-12-15 [1] CRAN (R 4.2.0)
#>  cellranger      1.1.0      2016-07-27 [1] CRAN (R 4.2.0)
#>  cli             3.6.0      2023-01-09 [1] CRAN (R 4.2.0)
#>  colorspace      2.1-0      2023-01-23 [1] CRAN (R 4.2.0)
#>  crayon          1.5.2      2022-09-29 [1] CRAN (R 4.2.0)
#>  DBI             1.1.3      2022-06-18 [1] CRAN (R 4.2.0)
#>  dbplyr          2.3.0      2023-01-16 [1] CRAN (R 4.2.0)
#>  digest          0.6.31     2022-12-11 [1] CRAN (R 4.2.0)
#>  dplyr         * 1.1.0      2023-01-29 [1] CRAN (R 4.2.1)
#>  ellipsis        0.3.2      2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate        0.20       2023-01-17 [1] CRAN (R 4.2.0)
#>  fansi           1.0.4      2023-01-22 [1] CRAN (R 4.2.0)
#>  fastmap         1.1.0      2021-01-25 [1] CRAN (R 4.2.0)
#>  forcats       * 0.5.2      2022-08-19 [1] CRAN (R 4.2.0)
#>  fs              1.6.0      2023-01-23 [1] CRAN (R 4.2.0)
#>  gargle          1.2.1      2022-09-08 [1] CRAN (R 4.2.0)
#>  generics        0.1.3      2022-07-05 [1] CRAN (R 4.2.0)
#>  ggplot2       * 3.4.0      2022-11-04 [1] CRAN (R 4.2.0)
#>  glue            1.6.2      2022-02-24 [1] CRAN (R 4.2.0)
#>  googledrive     2.0.0      2021-07-08 [1] CRAN (R 4.2.0)
#>  googlesheets4   1.0.1      2022-08-13 [1] CRAN (R 4.2.0)
#>  gtable          0.3.1      2022-09-01 [1] CRAN (R 4.2.0)
#>  haven           2.5.1      2022-08-22 [1] CRAN (R 4.2.0)
#>  hms             1.1.2      2022-08-19 [1] CRAN (R 4.2.0)
#>  htmltools       0.5.4      2022-12-07 [1] CRAN (R 4.2.0)
#>  httr            1.4.4      2022-08-17 [1] CRAN (R 4.2.0)
#>  jsonlite        1.8.4      2022-12-06 [1] CRAN (R 4.2.0)
#>  knitr           1.41.9     2023-01-20 [1] https://yihui.r-universe.dev (R 4.2.2)
#>  lifecycle       1.0.3      2022-10-07 [1] CRAN (R 4.2.0)
#>  lubridate       1.9.1      2023-01-24 [1] CRAN (R 4.2.0)
#>  magrittr        2.0.3      2022-03-30 [1] CRAN (R 4.2.0)
#>  modelr          0.1.10     2022-11-11 [1] CRAN (R 4.2.0)
#>  munsell         0.5.0      2018-06-12 [1] CRAN (R 4.2.0)
#>  naniar        * 0.6.1.9001 2023-01-31 [1] local
#>  pillar          1.8.1      2022-08-19 [1] CRAN (R 4.2.0)
#>  pkgconfig       2.0.3      2019-09-22 [1] CRAN (R 4.2.0)
#>  purrr         * 1.0.1      2023-01-10 [1] CRAN (R 4.2.0)
#>  R.cache         0.16.0     2022-07-21 [1] CRAN (R 4.2.0)
#>  R.methodsS3     1.8.2      2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo            1.25.0     2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils         2.12.2     2022-11-11 [1] CRAN (R 4.2.0)
#>  R6              2.5.1      2021-08-19 [1] CRAN (R 4.2.0)
#>  readr         * 2.1.3      2022-10-01 [1] CRAN (R 4.2.0)
#>  readxl          1.4.1      2022-08-17 [1] CRAN (R 4.2.0)
#>  reprex          2.0.2      2022-08-17 [1] CRAN (R 4.2.0)
#>  rlang           1.0.6      2022-09-24 [1] CRAN (R 4.2.0)
#>  rmarkdown       2.20       2023-01-19 [1] CRAN (R 4.2.0)
#>  rstudioapi      0.14       2022-08-22 [1] CRAN (R 4.2.0)
#>  rvest           1.0.3      2022-08-19 [1] CRAN (R 4.2.0)
#>  scales          1.2.1      2022-08-20 [1] CRAN (R 4.2.0)
#>  sessioninfo     1.2.2      2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi         1.7.12     2023-01-11 [1] CRAN (R 4.2.0)
#>  stringr       * 1.5.0      2022-12-02 [1] CRAN (R 4.2.0)
#>  styler          1.9.0      2023-01-15 [1] CRAN (R 4.2.0)
#>  tibble        * 3.1.8      2022-07-22 [1] CRAN (R 4.2.0)
#>  tidyr         * 1.3.0      2023-01-24 [1] CRAN (R 4.2.0)
#>  tidyselect      1.2.0      2022-10-10 [1] CRAN (R 4.2.0)
#>  tidyverse     * 1.3.2      2022-07-18 [1] CRAN (R 4.2.0)
#>  timechange      0.2.0      2023-01-11 [1] CRAN (R 4.2.0)
#>  tzdb            0.3.0      2022-03-28 [1] CRAN (R 4.2.0)
#>  utf8            1.2.2      2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs           0.5.2      2023-01-23 [1] CRAN (R 4.2.0)
#>  visdat          0.6.0.9000 2022-12-13 [1] local
#>  withr           2.5.0      2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun            0.36       2022-12-21 [1] CRAN (R 4.2.0)
#>  xml2            1.3.3      2021-11-30 [1] CRAN (R 4.2.0)
#>  yaml            2.3.7      2023-01-23 [1] CRAN (R 4.2.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Note that `add_n_miss()` and `add_prop_miss()` already exist: they add helper columns to a dataset giving the number and proportion of missing values. The new functions are therefore named `set_n_miss()` and `set_prop_miss()`.
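
The two `set_*` helpers above both work by shuffling a logical mask rather than sampling indices. The sketch below walks through that step by step on a small vector; the variable names (`mask`, `shuffled`, `mask_frac`) are illustrative only. It also shows one consequence of the comparison `seq_along(x) <= prop * length(x)`: when `prop * length(x)` is not a whole number, the count of missing values is rounded down.

```r
x <- 1:10

# Step 1: a mask with TRUE in the first 3 positions, FALSE elsewhere.
mask <- seq_along(x) <= 3

# Step 2: shuffle the mask so the TRUE values land at random positions.
set.seed(298)
shuffled <- sample(mask)

# Step 3: use the shuffled mask to set those positions to NA.
x[shuffled] <- NA
sum(is.na(x))
#> [1] 3

# With a fractional target (0.25 * 10 = 2.5), only positions 1 and 2
# satisfy the comparison, so two values would be set to NA.
mask_frac <- seq_along(1:10) <= 0.25 * 10
sum(mask_frac)
#> [1] 2
```

Shuffling preserves the number of `TRUE` entries, which is why the reprex above reports exactly the requested counts and proportions.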

njtierney added a commit that referenced this issue Jan 31, 2023
- resolves #298
- added imports `vctrs` and `cli`, both of which are free dependencies because the tidyverse packages already in use depend on them.