High memory usage in large pipelines #1352

wlandau · 2024-10-22T00:24:23Z

wlandau
Oct 22, 2024
Maintainer

c.f. #1347 and #1329. I tried the following pipeline on a RHEL9 node:

library(autometric)
library(crew)
library(targets)

controller <- crew_controller_local(
  workers = 1L,
  garbage_collection = TRUE,
  options_metrics = crew_options_metrics(
    path = "logs",
    seconds_interval = 1
  )
)

if (tar_active()) {
  controller$start()
  log_start(
    path = "logs/main.txt",
    seconds = 1,
    pids = controller$pids()
  )
}

tar_option_set(
  memory = "transient",
  garbage_collection = TRUE,
  controller = controller
)

write_file <- function(x) {
  fs::dir_create("files")
  path <- file.path("files", paste0(x, ".rds"))
  saveRDS(x, path)
  path
}

list(
  tar_target(x, seq_len(2e4)),
  tar_target(y, write_file(x), pattern = map(x), format = "file"),
  tar_target(z, readRDS(y), pattern = map(y))
)

Then I read and visualized the autometric logs:

library(autometric)
log <- log_read("logs", units_memory = "megabytes")
names <- unique(log$name)
log_plot(log, name = names[1], metric = "resident")
log_plot(log, name = names[2], metric = "resident")
log_plot(log, name = names[3], metric = "resident")

The crew worker and mirai dispatcher are efficient with memory, consuming no more than a few megabytes. But the memory consumption of the local targets process kept increasing without an ostensible bound. 3 GB isn't necessarily alarming, but I will need to look into what is responsible for most of this memory.

I wonder if this could explain #1347 or #1329, and I wonder what would happen without crew.

wlandau · 2024-10-22T12:46:30Z

wlandau
Oct 22, 2024
Maintainer Author

I tried a similar pipeline without crew:

library(autometric)
library(targets)

if (tar_active()) {
  log_start(
    path = "logs/main.txt",
    seconds = 1
  )
}

tar_option_set(
  memory = "transient",
  garbage_collection = TRUE
)

write_file <- function(x) {
  fs::dir_create("files")
  path <- file.path("files", paste0(x, ".rds"))
  saveRDS(x, path)
  path
}

list(
  tar_target(x, seq_len(2e4)),
  tar_target(y, write_file(x), pattern = map(x), format = "file"),
  tar_target(z, readRDS(y), pattern = map(y))
)

The pipeline took a lot longer to run (~7 hr), but memory usage looked more reasonable:

There is a mild surge at the beginning, a mild surge at around 10000s (presumably when all the dynamic branches of z are defined) and then another mild surge at the end. A max of 800 MB is pretty good.

Takeaways:

Something about crew + targets guzzles memory.
Something about targets alone is slow for this type of pipeline, and the slowness does not appear to have anything to do with crew or (1).

So we actually have 2 different unrelated performance problems.

0 replies

wlandau · 2024-10-22T14:52:24Z

wlandau
Oct 22, 2024
Maintainer Author

For (2), the slowness just comes from garbage collection 😆 . I should have known.

library(targets)

tar_option_set(
  memory = "transient",
  garbage_collection = TRUE
)

write_file <- function(x) {
  fs::dir_create("files")
  path <- file.path("files", paste0(x, ".rds"))
  saveRDS(x, path)
  path
}

list(
  tar_target(x, seq_len(1000)),
  tar_target(y, write_file(x), pattern = map(x), format = "file"),
  tar_target(z, readRDS(y), pattern = map(y))
)

library(proffer)
library(targets)
tar_destroy()
pprof(tar_make(callr_function = NULL, reporter = "summary"))

0 replies

wlandau · 2024-10-22T20:11:26Z

wlandau
Oct 22, 2024
Maintainer Author

As best I can tell for now, most of the memory is consumed by the internal data structures targets needs for bookkeeping. targets has an internal object oriented programming system which uses environments with S3 classes. With 32k targets, there are 32k+ nested environments, and those happen to take up a lot of memory in aggregate. Unless I am missing something in scaled-up examples, improving memory efficiency here would be a huge undertaking and may involve converting many of the internal data structures into compact C structs. Converting this thread to a discussion.

8 replies

wlandau Oct 25, 2024
Maintainer Author

With garbage collection, the pipeline ends up not parallelising much

I think this is because gc() creates intermittent delays which allow quick targets to run before new ones have a chance to dispatch. In this scenario, crew judges that it doesn't need very many workers because the main process can't keep up with how fast the actual tasks are running.

Refactoring my pipeline to reduce the number of branches seems like the most practical solution for now.

Yes, that's exactly why tarchetypes functions like tar_rep(), tar_map_rep(), and most target factories for grouped data frames use batching. Even if/when targets reduces its memory footprint for large pipelines, batching will still be a good strategy that reduces overhead.

wlandau Oct 31, 2024
Maintainer Author

with 24K input files, and 5, 6 operation it pushes the total number of targets to 100K+

With 100K+ targets, you're probably pushing the limits of what the package can currently handle. I hope batching works in your use case.

kyle-messier Nov 7, 2024

with 24K input files, and 5, 6 operation it pushes the total number of targets to 100K+

With 100K+ targets, you're probably pushing the limits of what the package can currently handle. I hope batching works in your use case.

Thanks for this discussion, which is helpful as we are using targets to create large pipelines. Is your 100K targets based assessment for the total number of targets or for targets that you'd expect to dispatched at the same time? We are using branching to break up large raster and other spatial data, so I have a handful of targets in the 10K range with the total pipeline over 100K. Besides the long spin-up time if I don't specify tar_make(shortcut = T) or tar_make(names = x) I haven't noticed major issues with crew handling the performance and memory.

stemangiola Nov 8, 2024

100K is the total number of targets. Could you expand this "branching to break up large raster and other spatial data, so I have a handful of targets in the 10K range with the total pipeline over 100K.". I am not sure I follow.

kyle-messier Nov 8, 2024

@stemangiola Processing raster calculations based on splitting by a polygon and/or grid. The polygon split is a mapping variable for dynamic branching.

wlandau · 2024-10-25T14:11:20Z

wlandau
Oct 25, 2024
Maintainer Author

To the broader discussion: here is a rough sketch of how targets generally represents an object produced by tar_target(). I intentionally omitted most of the elements, including the more complicated ones like expressions.

current <- list2env(
  list(
    command = list2env(
      list(
        packages = c("dplyr", "tibble", "ggplot2"),
        string = "expression(1)",
        seed = -1813454154L,
        hash = "bb375a30fa348382"
      )
    ),
    settings = list2env(
      list(
        name = "x", description = character(0), format = "rds", 
        repository = "local", pattern = NULL, dimensions = character(0), 
        iteration = "vector", error = "stop", memory = "persistent", 
        garbage_collection = FALSE, deployment = "worker", priority = 0, 
        storage = "main", retrieval = "main"
      )
    ),
    cue = list2env(
      list(
        mode = "thorough", command = TRUE, depend = TRUE, format = TRUE, 
        repository = TRUE, iteration = TRUE, file = TRUE, seed = TRUE
      )
    )
  )
)

The size of this object is 5432 bytes.

library(lobstr)
as.numeric(obj_size(current))
#> [1] 5432

Here is a flattened list representation of all the elements, with comments to show how many bytes it would take to store each object in C.

flat <- list(
  # 18 regular characters +
  #   3 '\0' characters +
  #   3 * sizeof(char*) +
  #   sizeof(char**) =
  # 18 + 3 + 3 * 8 + 8 =
  # 53 bytes
  packages = c("dplyr", "tibble", "ggplot2"),
  # 13 regular characters +
  #   1 '\0' character +
  #   sizeof(char*) =
  # 13 + 1 + 8 =
  # 22 bytes
  string = "expression(1)",
  # 4 bytes
  seed = -1813454154L,
  # 16 + 1 + 8 =
  # 25 bytes
  hash = "bb375a30fa348382",
  # 1 + 1 + 8 =
  # 10 bytes
  name = "x",
  # 8 bytes
  description = character(0),
  # 12 bytes
  format = "rds",
  # 14 bytes
  repository = "local",
  # 8 bytes
  pattern = NULL,
  # 8 bytes
  dimensions = character(0),
  # 15 bytes
  iteration = "vector",
  # 13 bytes
  error = "stop",
  # 19 bytes
  memory = "persistent",
  # 1 byte
  garbage_collection = FALSE,
  # 15 bytes
  deployment = "worker",
  # 8 bytes
  priority = 0,
  # 13 bytes
  storage = "main",
  # 13 bytes
  retrieval = "main",
  # 17 bytes
  mode = "thorough",
  # 1 byte
  command = TRUE,
  # 1 byte
  depend = TRUE,
  # 1 byte
  format = TRUE, 
  # 1 byte
  repository = TRUE,
  # 1 byte
  iteration = TRUE,
  # 1 byte
  file = TRUE,
  # 1 byte
  seed = TRUE
)

The R representations of the individual elements of flat sum to 2296 bytes, about 42.3% of the size of current.

sum(vapply(flat, obj_size, FUN.VALUE = numeric(1L)))
#> [1] 2296

Summing all the alleged C storage sizes in the comments, we get 285 bytes. If represented in a C struct, the total size would actually be larger because of alignment and padding: for now, I'll guess around 100 extra bytes for padding, totaling 385 bytes (about 7% of ‘current’).

Interestingly, obj_size(qs::qserialize(current)) is 456 bytes, about 8.4% of the size of the original object. So we could potentially save more storage space by just serializing target definition objects when they are not actively in use. This may or may not come with a performance penalty, but initial benchmarks look promising, with serialization times in the low microseconds.

microbenchmark::microbenchmark(qs::qserialize(current))
#> Unit: microseconds
#>                     expr    min     lq     mean median     uq     max neval
#>  qs::qserialize(current) 54.407 57.728 69.40685 63.509 70.151 243.745   100

Using the full pipeline below:

library(targets)
library(tibble)
list(
  tar_target(data, tibble(x = seq_len(1e4))),
  tar_target(slice, data, pattern = map(data))
)

and taking the target definition object for the pattern target data with its branches defined mid-pipeline, the size of the object is 1.77 MB:

as.numeric(obj_size(target))
#> [1] 1771336

qserialize() with the default settings reduces its size to 199.06 KB, ~11.2% of the size of the original object.

as.numeric(obj_size(qs::qserialize(target)))
#> 199064

Serialization times are in the low milliseconds:

> microbenchmark::microbenchmark(qs::qserialize(target))
Unit: milliseconds
                   expr      min       lq   mean
 qs::qserialize(target) 2.872583 2.900934 2.9433
  median       uq      max neval
 2.91633 2.961143 3.188201   100

Of course most targets will be branches, not patterns. A single branch is about 10.56 KB and serializes to 512 B (5% of the original size) and serialization times are about 50-55 microseconds.

Unfortunately qs is phasing out and qs2 does not yet have an equivalent of qserialize(), but it's very new, so it may yet happen.

1 reply

wlandau Oct 25, 2024
Maintainer Author

Just posted qsbase/qs2#4

wlandau · 2024-10-31T22:02:31Z

wlandau
Oct 31, 2024
Maintainer Author

On a closer look, .Internal(inspect(tar_target(x, 1)) reveals a lot of unevaluated promise objects. That's probably why qs::qdeserialize(qs::qserialize(tar_target(x, 1))) reduces memory so much. I think I can get memory usage way down by evaluating those promises. I just need to be careful not to increase computation time in the process.

3 replies

wlandau Nov 1, 2024
Maintainer Author

Here is an analysis of 4 different ways to create the kind of classed environment object that targets uses. There is a clear winner in terms of memory and speed. targets currently uses cue_new1(), and it should use cue_new2().

enclose <- function(x, class) {
  class(x) <- c(class, class(x))
  x
}

cue_new1 <- function(
  mode = NULL,
  command = NULL,
  depend = NULL,
  format = NULL,
  repository = NULL,
  iteration = NULL,
  file = NULL,
  seed = NULL
) {
  force(mode)
  force(command)
  force(depend)
  force(format)
  force(repository)
  force(iteration)
  force(file)
  force(seed)
  enclass(environment(), "tar_cue")
}

cue_new2 <- function(
  mode = NULL,
  command = NULL,
  depend = NULL,
  format = NULL,
  repository = NULL,
  iteration = NULL,
  file = NULL,
  seed = NULL
) {
  x <- list(
    mode = mode,
    command = command,
    depend = depend,
    format = format,
    repository = repository,
    iteration = iteration,
    file = file,
    seed = seed
  )
  enclass(list2env(x, parent = emptyenv()), "tar_cue")
}

cue_new3 <- function(
  mode = NULL,
  command = NULL,
  depend = NULL,
  format = NULL,
  repository = NULL,
  iteration = NULL,
  file = NULL,
  seed = NULL
) {
  out <- new.env(parent = emptyenv(), hash = FALSE)
  out$mode <- mode
  out$command <- command
  out$depend <- depend
  out$format <- format
  out$repository <- repository
  out$iteration <- iteration
  out$file <- file
  out$seed <- seed
  enclass(out, "tar_cue")
}

cue_new4 <- function(
  mode = NULL,
  command = NULL,
  depend = NULL,
  format = NULL,
  repository = NULL,
  iteration = NULL,
  file = NULL,
  seed = NULL
) {
  out <- new.env(parent = emptyenv(), hash = FALSE)
  assign(x = "mode", value = mode, envir = out)
  assign(x = "command", value = command, envir = out)
  assign(x = "depend", value = depend, envir = out)
  assign(x = "format", value = format, envir = out)
  assign(x = "repository", value = repository, envir = out)
  assign(x = "iteration", value = iteration, envir = out)
  assign(x = "file", value = file, envir = out)
  assign(x = "seed", value = seed, envir = out)
  enclass(out, "tar_cue")
}

obj_size(cue_new1("thorough", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE))
#> 2.21 kB
obj_size(cue_new2("thorough", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE))
#> 1.76 kB
obj_size(cue_new3("thorough", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE))
#> 1.76 kB
obj_size(cue_new4("thorough", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE))
#> 1.76 kB

microbenchmark::microbenchmark(
  cue1 = cue_new1("thorough", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  cue2 = cue_new2("thorough", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  cue3 = cue_new3("thorough", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  cue4 = cue_new4("thorough", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  times = 1e4,
  control = list(warmup = 100)
)
#> Unit: microseconds
#>  expr   min    lq     mean median    uq      max neval cld
#>  cue1 2.050 2.214 2.558818  2.255 2.378 1652.136 10000  a 
#>  cue2 2.460 2.624 2.927371  2.665 2.788  998.842 10000  a 
#>  cue3 1.599 1.763 2.104821  1.804 1.886 2024.170 10000  a 
#>  cue4 5.207 5.453 6.437340  5.576 5.781 3437.809 10000   b

wlandau Nov 1, 2024
Maintainer Author

As of a0f1326, by taking out promise objects with the cue_new3() approach, the local process of a large pipeline now consumes about 23% less memory.

philiporlando Nov 1, 2024

Thanks for implementing the cue_new3() approach! A 23% reduction is a solid improvement, especially with larger pipelines—I'm optimistic this will make a noticeable difference in my work.

Edit: Anecdotally, my real-world pipeline now appears to be consuming much less memory, whereas it was previously hitting a memory limit.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

High memory usage in large pipelines #1352

{{title}}

Replies: 5 comments 12 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

High memory usage in large pipelines #1352

wlandau Oct 22, 2024 Maintainer

Replies: 5 comments · 12 replies

wlandau Oct 22, 2024 Maintainer Author

wlandau Oct 22, 2024 Maintainer Author

wlandau Oct 22, 2024 Maintainer Author

wlandau Oct 25, 2024 Maintainer Author

wlandau Oct 31, 2024 Maintainer Author

kyle-messier Nov 7, 2024

stemangiola Nov 8, 2024

kyle-messier Nov 8, 2024

wlandau Oct 25, 2024 Maintainer Author

wlandau Oct 25, 2024 Maintainer Author

wlandau Oct 31, 2024 Maintainer Author

wlandau Nov 1, 2024 Maintainer Author

wlandau Nov 1, 2024 Maintainer Author

philiporlando Nov 1, 2024

wlandau
Oct 22, 2024
Maintainer

Replies: 5 comments 12 replies

wlandau
Oct 22, 2024
Maintainer Author

wlandau
Oct 22, 2024
Maintainer Author

wlandau
Oct 22, 2024
Maintainer Author

wlandau Oct 25, 2024
Maintainer Author

wlandau Oct 31, 2024
Maintainer Author

wlandau
Oct 25, 2024
Maintainer Author

wlandau Oct 25, 2024
Maintainer Author

wlandau
Oct 31, 2024
Maintainer Author

wlandau Nov 1, 2024
Maintainer Author

wlandau Nov 1, 2024
Maintainer Author