Speed up vis_drake_graph() on large completed workflows #1098

Closed
3 tasks done
wlandau opened this issue Dec 8, 2019 · 2 comments

wlandau commented Dec 8, 2019

Prework

  • Read and abide by drake's code of conduct.
  • Search for duplicates among the existing issues, both open and closed.
  • Advanced users: verify that the bottleneck still persists in the current development version (i.e. remotes::install_github("ropensci/drake")) and mention the SHA-1 hash of the Git commit you install.

Description

build_times(), outdated(), and progress() are slowing down vis_drake_graph() for completed workflows, and I think I know how to make them super fast. Instead of individual interactions with the cache to get the metadata, we can use vectorized operations like config$cache$list(namespace = "meta") and config$cache$mget(..., namespace = "meta") to get all the old metadata at once.
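As a rough illustration of the difference, here is a minimal sketch using an in-memory storr cache as a stand-in for config$cache. The target names and metadata values are made up for the example, and the real metadata entries in drake are much richer than this.

```r
library(storr)

# Assumption: a toy in-memory storr cache standing in for config$cache.
cache <- storr_environment()
targets <- paste0("z_", seq_len(1000))
for (target in targets) {
  # Hypothetical metadata payload, just enough to have something to read back.
  cache$set(target, list(seconds = 0.01), namespace = "meta")
}

# Slow pattern: one cache interaction per target.
meta_slow <- lapply(targets, function(target) {
  cache$get(target, namespace = "meta")
})

# Vectorized pattern: list the keys once, then fetch everything in one call.
keys <- cache$list(namespace = "meta")
meta_fast <- cache$mget(keys, namespace = "meta")
names(meta_fast) <- keys
```

Both approaches return the same metadata; the vectorized version just avoids the per-target overhead of going through the cache interface a thousand times.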

Reproducible example

library(drake)
library(fs)
library(profile)
library(withr)

profile <- function() {
  local_dir(dir_create(tempfile()))
  rprof <- "prof.rprof"
  pprof <- "prof.pprof"
  plan <- drake_plan(
    z = target(w, transform = map(w = !!seq_len(1e3)))
  )
  make(plan)
  config <- drake_config(plan)
  Rprof(filename = rprof)
  tmp <- vis_drake_graph(config)
  Rprof(NULL)
  data <- read_rprof(rprof)
  write_pprof(data, pprof)
  vis_pprof(pprof)
}

vis_pprof <- function(path, host = "localhost", port = NULL) {
  server <- sprintf("%s:%s", host, port %||% random_port())
  message("local pprof server: http://", server)
  args <- c("-http", server, path)
  if (on_windows()) {
    shell(paste(c("pprof", args), collapse = " "))
  } else {
    system2(jointprof::find_pprof(), args)
  }
}

random_port <- function(from = 49152L, to = 65535L) {
  sample(seq.int(from = from, to = to, by = 1L), size = 1L)
}

on_windows <- function() {
  tolower(Sys.info()["sysname"]) == "windows"
}

`%||%` <- function(x, y) {
  if (is.null(x) || length(x) <= 0) {
    y
  } else {
    x
  }
}

profile()

Benchmarks

[Image: Capture]

@wlandau wlandau self-assigned this Dec 8, 2019

wlandau commented Dec 8, 2019

The graph above was from Windows 10. The results for Linux on the same hardware are similar. See below.

[Image: Screenshot_20191207_230548]

@wlandau wlandau closed this as completed in d3063d7 Dec 8, 2019

wlandau commented Dec 8, 2019

Here is the flame graph after d3063d7.

[Image: Screenshot_20191208_090908]

Turns out mget() is not really appropriate for getting old metadata in outdated() since we do not know how much metadata we need to read until we have already advanced through the graph. I did implement a speedup in outdated(), but it was minor.
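To picture why an up-front mget() does not fit here, consider a toy traversal. This is not drake's actual implementation: the graph, the hash values, and the read_meta() helper are all hypothetical, and the sketch assumes that a target is outdated whenever anything upstream of it is outdated.

```r
# Hypothetical toy graph: each target lists its upstream dependencies,
# and the names are already in topological order.
deps <- list(a = character(0), b = "a", c = "b", d = "c")

# Hypothetical stored and current hashes; `b` has changed since the last build.
old_meta <- list(a = "h1", b = "h2", c = "h3", d = "h4")
new_meta <- list(a = "h1", b = "CHANGED", c = "h3", d = "h4")

reads <- 0L
read_meta <- function(target) {
  reads <<- reads + 1L # count individual metadata reads
  old_meta[[target]]
}

outdated <- character(0)
for (target in names(deps)) {
  if (any(deps[[target]] %in% outdated)) {
    # Invalidated by an upstream target: no metadata read needed.
    outdated <- c(outdated, target)
  } else if (!identical(read_meta(target), new_meta[[target]])) {
    # Stored metadata no longer matches, so the target itself is outdated.
    outdated <- c(outdated, target)
  }
}

outdated # "b" "c" "d"
reads    # 2
```

Only two of the four targets needed a metadata read, and which ones they are only becomes clear during the walk. Fetching everything with mget() beforehand would read metadata for c and d that is never used.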
