
Huge overhead in adjacent_vertices in a moderately sized project #435

Closed
kendonB opened this issue Jun 29, 2018 · 12 comments

kendonB (Contributor) commented Jun 29, 2018

The plan I'm running is quite simple: I have a 20,000-element list of smallish spatial data objects, and I'm simply assigning each of the 20,000 elements to its own target.

There's a bottleneck in:
run_loop ->
build_check_store ->
prune_envir ->
downstream_deps <- nonfile_target_dependencies(targets = downstream, ....) ->
dependencies ->
adjacent_vertices ->
res <- lapply(res, function(x) create_vs(graph, x + 1))

In this example, the vast majority of the computation time seems to be taken up by this lapply() in adjacent_vertices(). I wonder if it's possible to move this work to an earlier stage so it doesn't depend on that inefficient R loop?
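For anyone who wants to reproduce the profile, a minimal sketch using only base R's Rprof(); here `plan` is a placeholder for a real drake plan with many targets:

suppressPackageStartupMessages(library(drake))
# `plan` stands in for a large drake plan, e.g. ~20000 targets.
Rprof("make.prof")
make(plan)
Rprof(NULL)
# adjacent_vertices() and create_vs() should dominate the top entries.
head(summaryRprof("make.prof")$by.total)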

kendonB (Contributor, Author) commented Jun 29, 2018

Ironically, if I change pruning_strategy from speed to memory, my plan runs much faster (maybe 4-5x).
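For reference, the switch is just the following, with plan as a placeholder ("speed" was the default at the time):

make(plan, pruning_strategy = "memory")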

wlandau (Member) commented Jun 29, 2018

Since adjacent_vertices() is part of igraph, the solution will not be easy. This brings up and clarifies some thoughts I have had in the back of my mind about streamlining drake's internals. I will work on #440 first.

wlandau (Member) commented Jul 4, 2018

Looking at the traceback, I am less sure that #440 will solve this. Environments cannot be subsetted with vectors of indices, so we would still end up calling the equivalent of adjacent_vertices(), either vectorized or via an lapply()-like function, over the large downstream argument to prune_envir(). Any parallelism we add would not be available inside build_check_store() because it is already being applied in parallel.

A different tack: rethink the pruning strategies.

  • speed: just load the current target's dependencies and do not worry about anything downstream. Candidate for the new default.
  • lookahead: Look at all the downstream targets and unload anything they won't need. This is the current default, and it is currently named "speed".
  • memory: for each target, load the dependencies and unload everything else.

I am not actually sure we need lookahead pruning, and it certainly complicates the code, but we should probably keep it for now in case some projects need it. A rough sketch of the three behaviors is below.
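To make the proposal concrete, here is a sketch of what each strategy would keep in memory before building a target. This is illustrative only, not drake's real internals; deps_of() and downstream_of() are hypothetical helpers:

# Hypothetical sketch: prune `envir` before building `target`.
prune_envir_sketch <- function(target, envir, graph, strategy) {
  deps <- deps_of(target, graph)  # hypothetical: direct dependencies
  keep <- switch(strategy,
    speed = ls(envir),  # load deps as needed, never unload anything
    lookahead = union(deps, deps_of(downstream_of(target, graph), graph)),
    memory = deps  # only the current target's direct dependencies
  )
  rm(list = setdiff(ls(envir), keep), envir = envir)
  # ...then load whichever of `deps` are not already in `envir`.
}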

kendonB (Contributor, Author) commented Jul 4, 2018

The loop that's the bottleneck also depends on igraph_opt("return.vs.es") being TRUE. Are you sure you actually need this option?

wlandau (Member) commented Jul 4, 2018

Wow, things really speed up if igraph_opt("return.vs.es") is FALSE!

suppressPackageStartupMessages(library(igraph))
n <- 20000
g <- sample_k_regular(n, k = n / 4, directed = TRUE)
v <- seq_len(n / 2)
igraph_options(return.vs.es = TRUE)
bench::mark(adjacent_vertices(g, v = v))
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 1 x 10
#>   expression        min  mean median   max `itr/sec` mem_alloc  n_gc n_itr
#>   <chr>          <bch:> <bch> <bch:> <bch>     <dbl> <bch:byt> <dbl> <int>
#> 1 adjacent_vert…   1.9s  1.9s   1.9s  1.9s     0.526    3.73GB    18     1
#> # ... with 1 more variable: total_time <bch:tm>
igraph_options(return.vs.es = FALSE)
bench::mark(adjacent_vertices(g, v = v))
#> # A tibble: 1 x 10
#>   expression        min  mean median   max `itr/sec` mem_alloc  n_gc n_itr
#>   <chr>          <bch:> <bch> <bch:> <bch>     <dbl> <bch:byt> <dbl> <int>
#> 1 adjacent_vert…  191ms 192ms  192ms 192ms      5.21     382MB     0     3
#> # ... with 1 more variable: total_time <bch:tm>

@gaborcsardi, will the return.vs.es option stick around for the long term, or will it eventually be deprecated? From the igraph_options help file:

return.vs.es

Whether functions that return a set or sequence of vertices/edges should return formal vertex/edge sequence objects. This option was introduced in igraph version 1.0.0 and defaults to TRUE. If your package requires the old behavior, you can set it to FALSE in the .onLoad function of your package, without affecting other packages.
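Per that help text, the package-level fix would be a one-liner in drake's .onLoad() (a sketch):

.onLoad <- function(libname, pkgname) {
  # Return plain integer indices instead of formal vertex/edge sequences.
  igraph::igraph_options(return.vs.es = FALSE)
}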

gaborcsardi commented:

It will not be removed.

wlandau (Member) commented Jul 4, 2018

Fantastic! @kendonB, would you try 5cfd7e9?

wlandau (Member) commented Jul 6, 2018

pruning_strategy now has three choices, all documented in ?config and ?make (usage examples after the list). In case igraph_options(return.vs.es = FALSE) is not enough, make(pruning_strategy = "speed") avoids dependency lookups as much as possible. Even without #440, we have covered all our bases.

  • "lookahead" (default): keep loaded targets in memory until they are no longer needed as dependencies in downstream build steps. Then, unload them from the environment. This step avoids keeping unneeded data in memory and minimizes expensive reads from the cache. However, it requires looking ahead in the dependency graph, which could add overhead for every target of projects with lots of targets.
  • "speed": Once a target is loaded in memory, just keep it there. Maximizes speed, but hogs memory.
  • "memory": For each target, unload everything from memory except the target's direct dependencies. Conserves memory, but sacrifices speed because each new target needs to reload any previously unloaded targets from the cache.

idavydov commented:

Hi, not sure if that's the same issue or not.

I have a large plan (~5k targets). With the default make(), it takes a couple of minutes before the first job is submitted.

However, if I use:

make(myplan,
     parallelism = "clustermq", jobs = 25,
     verbose = 4,
     lazy_load = TRUE, pruning_strategy = "memory", garbage_collection = TRUE,
     caching = "worker")

it takes hours before clustermq starts spawning the workers.

The log looks like this:

cache /...
analyze environment
analyze 99 imports: a, b, c, d...
construct graph
import a
import b
import c
(and after all the imports it just stops)

The bottleneck seems to be somewhere in:

"mcfork" "FUN" "lapply" "parallel::mclapply" "safe_mclapply" "lightly_parallelize_atomic" "lightly_parallelize" "eval" "eval" "%>%" "bdg_analyze_imports" "force" "memo_expr" "build_drake_graph" "drake_config" "make"

It seems that pruning_strategy = 'speed' does not help much.

Is there anything that can be done?

idavydov commented Oct 18, 2018

It seems that decreasing the number of jobs improves things dramatically.
Could it be that drake forks as many processes as jobs and overloads the host? Is there a way around this?

wlandau (Member) commented Oct 18, 2018

Wow, that really is severe! What about fewer jobs for local processing? When you write jobs = c(imports = 4, targets = 25), 4 jobs will also be used for preprocessing tasks such as graph construction (so maybe "local" and "remote" might have been better names than "imports" and "targets"). jobs = 25 is the same as jobs = c(imports = 25, targets = 25), so I think this could indeed be overwhelming the host.
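Concretely, with your plan from above:

make(myplan, parallelism = "clustermq",
     jobs = c(imports = 4, targets = 25))  # 4 local jobs for preprocessing,
                                           # 25 workers for building targets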

I am not sure where else to look for the bottleneck. The code in the traceback you mentioned actually runs before any of the imports are processed.

In any case, I am very interested in decreasing overhead for large numbers of targets.

idavydov commented Oct 18, 2018

Wow, decreasing the number of jobs for imports helped a lot indeed.
Thanks!
