Add qs as a format option #1121

kendonB · 2019-12-20T20:46:46Z

Prework

Read and abide by drake's code of conduct.
Search for duplicates among the existing issues, both open and closed.

Proposal

storr may or may not adopt qs as a backend for all files, but it may be worth supporting it directly in drake as a format = "qs" option, if that is easy enough.

richfitz/storr#104
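For illustration, here is a minimal sketch of how the proposed option might look from the user's side. It assumes a drake version in which target() accepts format = "qs" (the request in this issue) and that both drake and qs are installed; the target name big_vector and the temporary cache are made up for the example.

```r
# Sketch only: assumes drake accepts format = "qs" and that the
# drake and qs packages are installed; the block is skipped otherwise.
if (requireNamespace("drake", quietly = TRUE) &&
    requireNamespace("qs", quietly = TRUE)) {
  library(drake)

  # big_vector is a hypothetical target stored with the qs format.
  plan <- drake_plan(
    big_vector = target(runif(1e6), format = "qs")
  )

  # Use a throwaway cache so nothing is written to the working directory.
  cache <- new_cache(tempfile())
  make(plan, cache = cache)

  # Read the target back from the cache.
  x <- readd(big_vector, cache = cache)
  length(x)
}
```

The per-target format argument mirrors how drake already handles other specialized formats, so users would opt in one target at a time.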


wlandau · 2019-12-20T23:27:31Z

Implementation would be easy (perhaps too easy). But in the interest of controlling the number of formats we add so things do not get out of hand, I would first like to understand more about the behavior of qs in practice. The benchmarks in the README do look promising, but they are only on one kind of data and on a very narrow size range. If we can cast a wider net and clearly describe the situations in which qs excels the most, we can support qs and offer a clear recommendation in the docs.

My own benchmarks so far are not that impressive. Perhaps we need larger and more complicated data for qs to really shine. That's the sort of thing I would like to know.

library(microbenchmark)
library(qs)
#> qs v0.20.1: better serialization of S4 objects, see 'ChangeLog'
x <- 1
microbenchmark(
  wb = writeBin(x, tempfile()),
  rf = saveRDS(x, tempfile(), compress = FALSE),
  qs = qsave(x, tempfile())
)
#> Unit: microseconds
#>  expr    min      lq     mean median      uq     max neval
#>    wb 28.759 30.8775 32.60451 32.296 33.7700  47.332   100
#>    rf 29.086 30.6370 33.01477 31.796 32.6530 116.532   100
#>    qs 45.168 47.2715 53.92454 48.364 50.1735 541.282   100
x <- runif(1e8)
microbenchmark(
  wb = writeBin(x, tempfile()),
  rf = saveRDS(x, tempfile(), compress = FALSE),
  qs = qsave(x, tempfile()),
  times = 1
)
#> Unit: milliseconds
#>  expr       min        lq      mean    median        uq       max neval
#>    wb  623.1701  623.1701  623.1701  623.1701  623.1701  623.1701     1
#>    rf  850.3506  850.3506  850.3506  850.3506  850.3506  850.3506     1
#>    qs 1827.9655 1827.9655 1827.9655 1827.9655 1827.9655 1827.9655     1
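To cast the wider net mentioned above, one option is a small harness that runs every saver on several kinds of objects and records both elapsed time and file size. The sketch below is only one way to do it: bench_save() is a hypothetical helper, the object sizes are kept small so it runs quickly, and qs is included only if the package is installed.

```r
# Hypothetical helper: time each saver on one object and record the file size.
bench_save <- function(x, savers) {
  rows <- lapply(names(savers), function(name) {
    path <- tempfile()
    on.exit(unlink(path), add = TRUE)  # clean up the temporary file
    elapsed <- system.time(savers[[name]](x, path))[["elapsed"]]
    data.frame(
      saver = name,
      elapsed_s = elapsed,
      size_mb = file.size(path) / 1e6,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Savers to compare; each takes the object and a file path.
savers <- list(
  rds_raw = function(x, path) saveRDS(x, path, compress = FALSE),
  rds_gz  = function(x, path) saveRDS(x, path, compress = TRUE)
)
# Add qs only when the package is available.
if (requireNamespace("qs", quietly = TRUE)) {
  savers$qs <- function(x, path) qs::qsave(x, path)
}

# A few differently shaped objects (illustrative, not a real workload).
objects <- list(
  numeric = runif(1e5),
  character = sample(c("a", "b", "c"), 1e5, replace = TRUE),
  data_frame = data.frame(id = seq_len(1e4), value = rnorm(1e4))
)

# One row per object/saver combination.
results <- do.call(rbind, lapply(names(objects), function(kind) {
  data.frame(
    object = kind,
    bench_save(objects[[kind]], savers),
    stringsAsFactors = FALSE
  )
}))
print(results)
```

Recording file size next to elapsed time matters here because a saver that looks slow on time alone may still be the better choice once compression is taken into account.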

kendonB · 2019-12-21T19:02:15Z

I will note that part of the benefit of qs is its fast compression, so I don't think saveRDS(x, tempfile(), compress = FALSE) is the right comparison. My use of qs in the wild has been quite impressive, for example when saving spatial formats.

Perhaps we could step back and consider an option that leverages rio before going into a format rabbit hole.

wlandau · 2019-12-21T21:46:16Z

Yeah, compression matters. In saveRDS(), you either have massive runtime or a massive file. qsave() seems to avoid both extremes. Perhaps a qs backend is easy to justify after all.

library(qs)
#> qs v0.20.1: better serialization of S4 objects, see 'ChangeLog'
library(pryr)
#> Registered S3 method overwritten by 'pryr':
#>   method      from
#>   print.bytes Rcpp

x <- runif(1e7)
object_size(x)
#> 80 MB

rf <- tempfile()
rt <- tempfile()
ql <- tempfile()
qz <- tempfile()

system.time(saveRDS(x, rf, compress = FALSE))
#>    user  system elapsed 
#>   0.101   0.032   0.132

system.time(saveRDS(x, rt, compress = TRUE))
#>    user  system elapsed 
#>  10.290   0.008  10.339

system.time(qsave(x, ql, algorithm = "lz4"))
#>    user  system elapsed 
#>   0.187   0.052   0.239

system.time(qsave(x, qz, algorithm = "zstd"))
#>    user  system elapsed 
#>   0.193   0.040   0.233

file.size(rf) / 1e6
#> [1] 80.00003

file.size(rt) / 1e6
#> [1] 53.25956

file.size(ql) / 1e6
#> [1] 41.80436

file.size(qz) / 1e6
#> [1] 41.80436

Created on 2019-12-21 by the reprex package (v0.3.0)
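The trade-off is easier to see as ratios against the uncompressed saveRDS() baseline. The sketch below just redoes the arithmetic on the figures reported above; the vector and column names are made up for the example. Note that the lz4 and zstd files have identical sizes, which may mean both calls ran with the same settings: as far as I know, qs only honors the algorithm argument when preset = "custom", though that is worth verifying for v0.20.1.

```r
# File sizes (MB) and elapsed times (s) copied from the reprex above.
sizes_mb <- c(
  rds_raw = 80.00003, rds_gz = 53.25956,
  qs_lz4 = 41.80436, qs_zstd = 41.80436
)
elapsed_s <- c(
  rds_raw = 0.132, rds_gz = 10.339,
  qs_lz4 = 0.239, qs_zstd = 0.233
)

# Express everything relative to uncompressed saveRDS().
summary_table <- data.frame(
  size_ratio = round(sizes_mb / sizes_mb[["rds_raw"]], 3),
  slowdown = round(elapsed_s / elapsed_s[["rds_raw"]], 1)
)
print(summary_table)
```

A size_ratio below 1 means a smaller file than the baseline, and slowdown is how many times longer the save took.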

kendonB added the type: new feature label Dec 20, 2019

kendonB assigned wlandau Dec 20, 2019

wlandau added the topic: performance label Dec 20, 2019

wlandau closed this as completed in 678daaa Dec 22, 2019

wlandau mentioned this issue Dec 28, 2019

Global format for all targets #1124

Closed

2 tasks