
limit task pool size #987

Closed
hjoliver opened this issue Jun 19, 2014 · 6 comments · Fixed by #3515
Labels
efficiency For notable efficiency improvements
Comments

@hjoliver
Member

Currently (on the iso8601 branch) the runahead limit is a suite-wide parameter that acts as follows: each task proxy spawns a 'waiting' successor when it enters the 'submitted' state. If a waiting task is beyond the runahead limit, it goes into a special pool that does not participate in dependency matching; tasks under the limit drop back into the main pool. So at least one proxy instance of every defined task exists at any one time, more if some instances are submitted or running at the time. Only those below the runahead limit impact scheduling performance (but all of them affect cylc's memory footprint). Note that tasks do not spawn multiple waiting instances out to the runahead limit.
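
To illustrate, a minimal sketch of that pool split (invented names, integer cycle points for simplicity; not the actual implementation):

```python
# Minimal, invented sketch (integer cycle points; not the real implementation).

class TaskProxy:
    def __init__(self, name, cycle_point):
        self.name = name
        self.cycle_point = cycle_point
        self.state = "waiting"

    def set_submitted(self, next_point):
        # On entering the 'submitted' state, spawn a single 'waiting'
        # successor at the next point (not a chain out to the limit).
        self.state = "submitted"
        return TaskProxy(self.name, next_point)


class TaskPool:
    def __init__(self, runahead_limit):
        self.runahead_limit = runahead_limit
        self.main_pool = []      # takes part in dependency matching
        self.runahead_pool = []  # held back from dependency matching

    def add(self, task, oldest_point):
        # Waiting tasks beyond the runahead limit go into the special pool.
        if task.cycle_point - oldest_point > self.runahead_limit:
            self.runahead_pool.append(task)
        else:
            self.main_pool.append(task)

    def release_runahead(self, oldest_point):
        # Tasks that drop back under the limit rejoin the main pool.
        for task in list(self.runahead_pool):
            if task.cycle_point - oldest_point <= self.runahead_limit:
                self.runahead_pool.remove(task)
                self.main_pool.append(task)
```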

@matthewrmshin has some ideas on how to avoid a suite-wide runahead limit...

[edit: it seems the focus of this issue is really on limiting the task pool size, which is one function of the runahead limit but not the only one - we may still need it as well (see below).]

@hjoliver added this to the later milestone on Jun 19, 2014
@matthewrmshin
Contributor

Moved this from the re-factor proposal:

The runahead limit is used to address two main problems:

We don't want to flood the disk. (Assuming that most tasks generate some data.)

  • Currently, the runahead limit addresses the question of when it is safe to start install tasks (or any tasks that eat up lots of disk space) for a cycle.
  • This should really be a suite design issue.
  • If we have tasks that eat up lots of disk space, they should always be paired with housekeep tasks.
  • A dependency of an install task on a housekeep task that clears data generated by a historical cycle should be sufficient to address the problem.

We don't want to flood the task pool.

  • Consider a suite's task pool. If it becomes inefficient at around 2000 task proxies (more in the future, but a limit will still exist), then a runahead limit based on cycle time or number of cycles does not make sense. (E.g. a 300 task-per-cycle suite can happily run 6 cycles ahead, but a 1500 task-per-cycle suite can only run one cycle ahead.) Instead, we should move to an architecture that lets us set the maximum size of the task pool directly (see the back-of-envelope sketch below).
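
A back-of-envelope version of that arithmetic (the 2000-proxy budget is purely illustrative):

```python
# Illustrative arithmetic only: how many cycles fit under a fixed proxy budget.
POOL_BUDGET = 2000  # point at which the pool is assumed to become inefficient

for tasks_per_cycle in (300, 1500):
    cycles_ahead = POOL_BUDGET // tasks_per_cycle
    print(f"{tasks_per_cycle} tasks/cycle -> about {cycles_ahead} cycle(s) ahead")
# 300 tasks/cycle -> about 6 cycle(s) ahead
# 1500 tasks/cycle -> about 1 cycle(s) ahead
```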

@hjoliver
Member Author

As discussed, limiting the task pool size will improve performance for much larger suites, and in principle it would allow us to drop the runahead limit. But in my opinion we will still need a runahead limit as well as limiting the pool size. Otherwise even small suites - if they contain quick-running data-retrieval tasks that are not constrained by clock triggers, or similar - could fill the task pool. This may not be a real problem unless the run-ahead tasks fill up the disk, but it will make the suite hard to monitor (graph view) and it may cause panic in users (OMG - my get_data task just ran out 100 years ahead!). If this is not considered a problem the user can just set the runahead limit very high.

@hjoliver changed the title from "rethink the runahead limit" to "limit task pool size" on Jun 25, 2014
@hjoliver
Member Author

Issue title changed from "rethink the runahead limit" to "limit task pool size".

As per my previous comment, the runahead limit, although related, is a different concept and will probably still be needed in addition to limiting the pool size.

@hjoliver
Member Author

hjoliver commented Jun 25, 2014

Limiting the pool size would require knowing which waiting tasks will be needed first, as excluding the wrong ones from dependency matching could stall the suite (see also #993). How to do this? I guess we'll need to use dependency information from the suite graph: if a task has reached the 'submitted' state or beyond, its downstream dependants could be needed soon, so they should be created (#993) and added to the task pool. (Related: this could also be used to ditch our long-standing free-for-all dependency matching - each waiting task really only needs to check the outputs of the few upstream tasks that the graph says it depends on - #108 (comment)) [done].

This would greatly reduce the size of the task pool as currently conceived (which contains waiting instances of almost all tasks), but we'll still need an absolute size limit as well, e.g. for suites with massive ensembles in which all members become ready to run at once. Maybe a FIFO queue or similar could manage adding the "needed next" tasks to the dependency-matching pool, to avoid stalling parts of the suite by continually excluding the same tasks from the pool.

Also: retention of succeeded tasks in the pool (and the cleanup algorithm that removes them) could be dispensed with if we just keep track of completed outputs instead [#1902].
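
A rough sketch of what I have in mind (invented names, not a proposed implementation): spawn only the downstream dependants of submitted tasks, feed them through a FIFO queue into a size-capped matching pool, and record completed outputs instead of retaining succeeded proxies:

```python
from collections import deque

# Invented names; a sketch only, not a proposed Cylc implementation.

MAX_POOL_SIZE = 2000  # illustrative absolute cap on the matching pool

class MatchingPool:
    def __init__(self, graph):
        self.graph = graph          # {task_id: [downstream task_ids]}
        self.ready_queue = deque()  # FIFO of spawned-but-not-yet-matched tasks
        self.pool = set()           # tasks that take part in dependency matching
        self.completed_outputs = set()

    def on_submitted(self, task_id):
        # Spawn only the downstream dependants of submitted tasks (#993), so
        # the pool never holds waiting instances of every task in the suite.
        for child in self.graph.get(task_id, []):
            self.ready_queue.append(child)
        self._fill()

    def on_succeeded(self, task_id):
        # Instead of retaining succeeded proxies for later matching, record
        # their completed outputs (#1902) and free the pool slot.
        self.pool.discard(task_id)
        self.completed_outputs.add(task_id)
        self._fill()

    def _fill(self):
        # FIFO release avoids stalling parts of the suite by repeatedly
        # excluding the same tasks from the matching pool.
        while self.ready_queue and len(self.pool) < MAX_POOL_SIZE:
            self.pool.add(self.ready_queue.popleft())
```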

@hjoliver
Member Author

hjoliver commented Jun 22, 2016

[meeting]

  • agreed we should move to spawning the immediate graph descendants of active tasks
  • cylc-8?
  • we'll need to be able to operate on tasks that are not in the pool yet or have already left it (we could have minimal "ghost" objects that can be promoted to real task proxies when needed - see the sketch below)
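
A minimal illustration of the "ghost" idea from the last bullet (hypothetical names only):

```python
# Hypothetical sketch only; names are invented for illustration.

class TaskProxy:
    """Full task proxy, used for dependency matching and job submission."""
    def __init__(self, name, cycle_point, held=False):
        self.name = name
        self.cycle_point = cycle_point
        self.held = held


class GhostTask:
    """Lightweight stand-in for a task not (or no longer) in the pool."""
    def __init__(self, name, cycle_point):
        self.name = name
        self.cycle_point = cycle_point
        self.held = False  # operations (e.g. hold) can still be recorded

    def promote(self):
        # Promote to a real proxy when the task enters the pool, carrying
        # over anything applied while it was still a ghost.
        return TaskProxy(self.name, self.cycle_point, held=self.held)
```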

@benfitzpatrick
Contributor

When we change the spawning behaviour (see the superseded issue #1538) so that a task's successor no longer depends implicitly on its submission, we will need to be able to report possible changes in behaviour in cylc validate.
