
Scale up on memory pressure #6826

Open
crusaderky opened this issue Aug 4, 2022 · 1 comment
Labels: adaptive (All things relating to adaptive scaling), discussion (Discussing a topic with no specific actions yet), memory

Comments

crusaderky (Collaborator) commented Aug 4, 2022

As of today, an adaptive cluster will start new workers if the number of queued tasks on the scheduler vastly exceeds the number of workers that are already online; in other words, if it believes that the expected "good" CPU time per new worker will justify the overhead of starting and stopping it.
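
Roughly, paraphrasing the CPU side of Scheduler.adaptive_target (exact details may differ between versions), the current heuristic amounts to asking for enough workers to drain the known work in about target_duration:

import math

# Paraphrase only: request enough workers for the currently known work
# (scheduler.total_occupancy, in seconds of runtime) to complete within
# roughly target_duration seconds
cpu = math.ceil(scheduler.total_occupancy / target_duration)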

There is, however, a second use case that would benefit from scaling up: when the cluster memory is saturated. In this case it would make sense to fire up extra workers even if the pending CPU load alone does not justify it.

The definition of "cluster memory is saturated" is tricky though.

  • One option would be to simply track the cluster-wide ratio managed / (memory_limit * target). This would include spilled memory, which is advantageous when there's a lot of spill/unspill activity but inefficient when there's long-standing, unperturbed spilled data.
  • A variant would be to use managed_in_memory exclusively, thus ignoring the spilled data. Workloads that thrash the spill file should still benefit, as they won't start spilling until they reach target anyway (a rough sketch combining this with the paused-worker count follows the list).
  • Memory imbalances across workers may need to be considered, e.g. if you set a new threshold distributed.worker.memory.scale_up: 0.55, you may never reach it as a cluster-wide average if some workers are heavily saturated while others aren't; this should be solved by Rebalance during a moving cluster (#4906).
  • Another, more sophisticated option would be to monitor unspill (not spill) events - or, if you prefer, cache misses - over the last few seconds and build some heuristic on top of them (note that the scheduler currently does not receive these events; it will after Do not rebalance spilled keys (#6002)).
  • The number of paused workers could also be taken into account.
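
As a strawman, a memory-driven target based on managed_in_memory plus the paused-worker count could look something like the sketch below. Everything here is illustrative: memory_scale_target and the 0.55 threshold are hypothetical, and the sketch assumes the scheduler-side WorkerState attributes (memory_limit, memory.managed_in_memory, status) behave as they do today.

from distributed.core import Status

def memory_scale_target(scheduler, threshold: float = 0.55) -> int:
    # Hypothetical helper: number of workers suggested by memory pressure alone
    workers = scheduler.workers.values()
    limit = sum(ws.memory_limit for ws in workers)
    if limit == 0:
        return len(workers)

    # Managed memory actually resident in RAM, i.e. excluding spilled data
    # (second bullet above)
    in_memory = sum(ws.memory.managed_in_memory for ws in workers)

    # Paused workers are a strong hint that the cluster is memory-bound
    # (last bullet above)
    paused = sum(ws.status == Status.paused for ws in workers)

    if in_memory > threshold * limit or paused:
        # Same crude "double the cluster" response that the CPU heuristic uses
        return 2 * len(workers)
    return len(workers)

The final target would then be max(cpu_target, memory_scale_target(...)), mirroring how adaptive_target already combines its memory check with the CPU estimate.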

This ticket heavily interacts with #5999.

crusaderky added the discussion and memory labels on Aug 4, 2022
fjetter (Member) commented Aug 4, 2022

FYI, the current algorithm already implements a rough estimate of memory pressure; it does not look exclusively at CPU. See:

# Total memory limit and total stored bytes across all workers
limit = sum(ws.memory_limit for ws in self.workers.values())
used = sum(ws.nbytes for ws in self.workers.values())
memory = 0
# If more than 60% of the cluster memory is in use, ask for twice the
# current number of workers
if used > 0.6 * limit and limit > 0:
    memory = 2 * len(self.workers)
# Final target is the larger of the memory-based and CPU-based estimates
target = max(memory, cpu)

Of course, this logic may need some adjustment; I haven't tested it myself yet.
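
For context, this code path only runs once adaptive scaling is enabled on the cluster; a minimal example with a LocalCluster:

from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=1)
# Adaptive mode periodically asks the scheduler for a target worker count
# (Scheduler.adaptive_target, quoted above) and scales within these bounds
cluster.adapt(minimum=1, maximum=10)
client = Client(cluster)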

fjetter added the adaptive label on Aug 26, 2022