Huge overhead in adjacent_vertices in a moderately sized project
#435
Ironically, if I change …
Looking at the traceback, I am less sure that #440 will solve this. Environments cannot be subsetted with vectors of indices, which means we would end up calling the equivalent of …

A different tack: rethink the pruning strategies.
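To make that constraint concrete, here is a small R illustration (the environment and object names are invented for the example): an environment has no vectorised `[` method, so pulling several bindings out of one means either looping or calling `mget()`, which effectively loops over the names.

```r
e <- new.env()
e$a <- 1
e$b <- 2

# Environments cannot be subsetted with a vector of indices or names:
# e[c("a", "b")]   # Error: object of type 'environment' is not subsettable

# The closest equivalent retrieves the bindings one name at a time:
mget(c("a", "b"), envir = e)
#> $a
#> [1] 1
#> $b
#> [1] 2
```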
I am not actually sure we need …
The loop that's the bottleneck also depends on …
Wow, things really speed up if `return.vs.es` is set to `FALSE`:

```r
suppressPackageStartupMessages(library(igraph))
n <- 20000
g <- sample_k_regular(n, k = n / 4, directed = TRUE)
v <- seq_len(n / 2)

igraph_options(return.vs.es = TRUE)
bench::mark(adjacent_vertices(g, v = v))
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 1 x 10
#>   expression       min  mean median   max `itr/sec` mem_alloc  n_gc n_itr
#>   <chr>         <bch:> <bch> <bch:> <bch>     <dbl> <bch:byt> <dbl> <int>
#> 1 adjacent_vert…  1.9s  1.9s   1.9s  1.9s     0.526    3.73GB    18     1
#> # ... with 1 more variable: total_time <bch:tm>

igraph_options(return.vs.es = FALSE)
bench::mark(adjacent_vertices(g, v = v))
#> # A tibble: 1 x 10
#>   expression       min  mean median   max `itr/sec` mem_alloc  n_gc n_itr
#>   <chr>         <bch:> <bch> <bch:> <bch>     <dbl> <bch:byt> <dbl> <int>
#> 1 adjacent_vert… 191ms 192ms  192ms 192ms      5.21     382MB     0     3
#> # ... with 1 more variable: total_time <bch:tm>
```

@gaborcsardi, will the …
It will not be removed.
Hi, not sure if that's the same issue or not. I have a large plan (~5k targets). When I use the default settings, … However, if I use:

```r
make(myplan,
     parallelism = 'clustermq', jobs = 25,
     verbose = 4,
     lazy_load = TRUE, pruning_strategy = 'memory', garbage_collection = TRUE,
     caching = 'worker')
```

It takes hours before clustermq starts spawning the workers. The log looks like this:

…
The bottleneck seems to be somewhere in: …

It seems that … Is there anything that can be done?
It seems that decreasing the number of jobs improves things dramatically.
Wow, that really is severe! What about fewer jobs for local processing? When you write …

Not sure I know where else to find the bottleneck. The traceback you mentioned should actually be called before any of the imports are processed. In any case, I am very interested in decreasing overhead for large numbers of targets.
Wow, decreasing the number of jobs for imports helped a lot indeed.
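As a concrete sketch of that workaround, this reuses the `make()` call from earlier in the thread with a much smaller `jobs` value; the exact number, and whether your drake version lets you set jobs separately for imports and targets, depends on your setup.

```r
library(drake)

# Same configuration as above, but with far fewer parallel jobs, which is the
# change reported to help here. `myplan` is the same ~5k-target plan as above.
make(myplan,
     parallelism = 'clustermq', jobs = 4,   # was 25
     verbose = 4,
     lazy_load = TRUE, pruning_strategy = 'memory', garbage_collection = TRUE,
     caching = 'worker')
```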
The plan I'm running is quite simple: I have a 20000-element list of smallish spatial data objects, and I'm simply assigning each of the 20000 elements to a target.
There's a bottleneck in:

```
run_loop
-> build_check_store
-> prune_envir
-> downstream_deps <- nonfile_target_dependencies(targets = downstream, ....)
-> dependencies
-> adjacent_vertices
-> res <- lapply(res, function(x) create_vs(graph, x + 1))
```

In this example, the vast majority of the computation time seems to be taken up by this `lapply` in `adjacent_vertices`. I wonder if it's possible to move this work up to an earlier stage, and to avoid the inefficient R loop that it ultimately depends on?
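Building on the `return.vs.es` benchmark earlier in the thread, one possible direction is to fetch neighbor indices with vertex-sequence construction disabled, which skips the `lapply()` over `create_vs()` entirely. This is only a minimal sketch, not drake's actual fix: `fast_adjacent()` is a made-up helper name, and the indexing of the raw result should be double-checked before relying on it (the `x + 1` in the traceback suggests it may be zero-based).

```r
library(igraph)

# Sketch: return plain indices from adjacent_vertices() instead of igraph
# vertex sequences, restoring the caller's option on exit.
fast_adjacent <- function(graph, v, mode = "out") {
  old <- igraph_options(return.vs.es = FALSE)  # plain indices, no vs objects
  on.exit(igraph_options(old), add = TRUE)     # restore the previous setting
  adjacent_vertices(graph, v = v, mode = mode)
}
```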