Add qs as a format option #1121

Closed
2 tasks done
kendonB opened this issue Dec 20, 2019 · 3 comments

Contributor

kendonB commented Dec 20, 2019

Prework

Proposal

storr may or may not adopt qs as a backend for all files, but it may be worth supporting it directly in drake as an option, format = "qs", if that is easy enough.

richfitz/storr#104

Member

wlandau commented Dec 20, 2019

Implementation would be easy (perhaps too easy). But to keep the number of supported formats from getting out of hand, I would first like to understand more about how qs behaves in practice. The benchmarks in the README look promising, but they cover only one kind of data and a very narrow size range. If we can cast a wider net and clearly describe the situations in which qs excels, we can support qs and offer a clear recommendation in the docs.

My own benchmarks so far are not that impressive. Perhaps we need larger and more complicated data for qs to really shine. That's the sort of thing I would like to know.

library(microbenchmark)
library(qs)
#> qs v0.20.1: better serialization of S4 objects, see 'ChangeLog'
x <- 1
microbenchmark(
  wb = writeBin(x, tempfile()),
  rf = saveRDS(x, tempfile(), compress = FALSE),
  qs = qsave(x, tempfile())
)
#> Unit: microseconds
#>  expr    min      lq     mean median      uq     max neval
#>    wb 28.759 30.8775 32.60451 32.296 33.7700  47.332   100
#>    rf 29.086 30.6370 33.01477 31.796 32.6530 116.532   100
#>    qs 45.168 47.2715 53.92454 48.364 50.1735 541.282   100
x <- runif(1e8)
microbenchmark(
  wb = writeBin(x, tempfile()),
  rf = saveRDS(x, tempfile(), compress = FALSE),
  qs = qsave(x, tempfile()),
  times = 1
)
#> Unit: milliseconds
#>  expr       min        lq      mean    median        uq       max neval
#>    wb  623.1701  623.1701  623.1701  623.1701  623.1701  623.1701     1
#>    rf  850.3506  850.3506  850.3506  850.3506  850.3506  850.3506     1
#>    qs 1827.9655 1827.9655 1827.9655 1827.9655 1827.9655 1827.9655     1
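
One way to cast the wider net described above is to time each writer on several kinds of data and record the file size alongside the elapsed time. The following is only a sketch under stated assumptions: the helper name `compare_writers()` and the sample inputs are made up for illustration, the data sizes are kept small so it runs quickly, and the qs writer is included only if the qs package is installed.

```r
# Hypothetical helper (not part of drake, storr, or qs): time each writer
# on one object and record the size of the file it produces.
compare_writers <- function(x, writers) {
  rows <- lapply(names(writers), function(name) {
    path <- tempfile()
    on.exit(unlink(path), add = TRUE)  # clean up the temp file afterwards
    elapsed <- system.time(writers[[name]](x, path))[["elapsed"]]
    data.frame(
      writer = name,
      seconds = elapsed,
      megabytes = file.size(path) / 1e6,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Writers to compare. Each takes an object and a file path.
writers <- list(
  rds_raw = function(x, path) saveRDS(x, path, compress = FALSE),
  rds_gzip = function(x, path) saveRDS(x, path, compress = TRUE)
)
if (requireNamespace("qs", quietly = TRUE)) {
  # Uses the default settings of qsave().
  writers$qs <- function(x, path) qs::qsave(x, path)
}

# Illustrative inputs only: incompressible doubles, highly repetitive
# integers, and a small mixed-type data frame.
set.seed(1)
inputs <- list(
  random_doubles = runif(1e5),
  repeated_ints = rep(1:10, 1e4),
  data_frame = data.frame(
    id = seq_len(1e4),
    group = sample(letters, 1e4, replace = TRUE)
  )
)

results <- lapply(inputs, compare_writers, writers = writers)
print(results)
```

Reporting time and file size together matters because a writer that looks slow against uncompressed output may still win once compression is taken into account. Larger and more complex objects would be needed before drawing any conclusion.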

Contributor Author

kendonB commented Dec 21, 2019

I will note that part of the benefit of qs is its fast compression, so I don't think saveRDS(x, tempfile(), compress = FALSE) is the right comparison. In my own use in the wild, qs has been quite impressive, for example when saving spatial data.

Perhaps we could step back and consider an option that leverages rio before going into a format rabbit hole.

Member

wlandau commented Dec 21, 2019

Yeah, compression matters. With saveRDS(), you get either a long runtime (compress = TRUE) or a large file (compress = FALSE). qsave() seems to avoid both extremes, so perhaps a qs backend is easy to justify after all.

library(qs)
#> qs v0.20.1: better serialization of S4 objects, see 'ChangeLog'
library(pryr)
#> Registered S3 method overwritten by 'pryr':
#>   method      from
#>   print.bytes Rcpp

x <- runif(1e7)
object_size(x)
#> 80 MB

rf <- tempfile()
rt <- tempfile()
ql <- tempfile()
qz <- tempfile()

system.time(saveRDS(x, rf, compress = FALSE))
#>    user  system elapsed 
#>   0.101   0.032   0.132

system.time(saveRDS(x, rt, compress = TRUE))
#>    user  system elapsed 
#>  10.290   0.008  10.339

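# Caveat: qsave() appears to ignore `algorithm` unless preset = "custom",
# so this call and the "zstd" call below may be using the same default
# preset (note the identical file sizes for ql and qz).
system.time(qsave(x, ql, algorithm = "lz4"))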
#>    user  system elapsed 
#>   0.187   0.052   0.239

system.time(qsave(x, qz, algorithm = "zstd"))
#>    user  system elapsed 
#>   0.193   0.040   0.233

file.size(rf) / 1e6
#> [1] 80.00003

file.size(rt) / 1e6
#> [1] 53.25956

file.size(ql) / 1e6
#> [1] 41.80436

file.size(qz) / 1e6
#> [1] 41.80436

Created on 2019-12-21 by the reprex package (v0.3.0)
