This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Memory issues for framework watcher and poller in DBController #4850

Open
hzy46 opened this issue Aug 27, 2020 · 1 comment

@hzy46
Contributor

hzy46 commented Aug 27, 2020

If there are many waiting jobs (e.g. 30000+ frameworks) and the framework watcher does a re-listing, memory usage quickly rises above 1000 MB.

@yiyione yiyione mentioned this issue Aug 27, 2020
30 tasks
@hzy46
Contributor Author

hzy46 commented Aug 28, 2020

I have investigated this issue. Currently, memory issues exist in both the framework watcher and the poller.

For the framework watcher, a re-listing happens if it is restarted or the watch connection is dropped. If there are too many frameworks, the re-listing runs into memory issues.

For the poller, memory issues happen in any polling round where the database contains too many frameworks with deleted=false and (completed=true or synced=false).

I have tested the memory usage of basic framework objects in Node.js: 10000 simple framework objects can consume 265MB+ of memory. So it is no surprise that 30000 real framework objects can consume 1GB+ of memory.
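A measurement along these lines could be reproduced with a sketch like the one below. It is only an illustration: `makeFramework` and the object shape it returns are invented for this example and are far smaller than a real framework object, so the number it reports will not match the 265MB figure above.

```javascript
// Rough heap measurement for N synthetic framework-like objects.
// makeFramework() is a hypothetical stand-in, NOT the real framework schema.
function makeFramework(i) {
  return {
    kind: 'Framework',
    metadata: {
      name: `framework-${i}`,
      namespace: 'default',
      labels: { index: String(i) },
    },
    spec: { taskRoles: [{ name: 'worker', taskNumber: 1 }] },
    status: { state: 'pending' },
  };
}

function measureHeapMB(count) {
  // global.gc is only defined when node is started with --expose-gc.
  if (global.gc) global.gc();
  const before = process.memoryUsage().heapUsed;

  // Keep every object reachable so the heap growth reflects all of them.
  const frameworks = [];
  for (let i = 0; i < count; i++) frameworks.push(makeFramework(i));

  const after = process.memoryUsage().heapUsed;
  return { count: frameworks.length, heapMB: (after - before) / (1024 * 1024) };
}

console.log(measureHeapMB(10000));
```

Running it with `node --expose-gc` gives a steadier baseline, since a garbage collection is forced before the first reading.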

Apart from re-listing/polling, if there are too many frameworks in the processing queue, the service will be restarted, and then re-listing/polling happens again. So solving the memory problem in re-listing/polling is more critical.

To solve the memory problem in the framework watcher, we can take advantage of chunked listing in the k8s API server. On each re-listing, we query the API server in chunks and synchronize each chunk to write-merger, then start watching from the resource version returned with the chunks. The informer in the k8s Node.js client doesn't support this (see here for its logic), so we have to handle it ourselves.
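The chunked re-listing idea can be sketched as follows. The Kubernetes list API accepts `limit` and `continue` parameters and returns `metadata.continue` and `metadata.resourceVersion` in each response. Everything else here is an assumption for illustration: `listPage` is a hypothetical injected function wrapping whatever client call is used, `handleChunk` stands in for the synchronization to write-merger, and `makeFakeApiServer` is an in-memory substitute for the API server. Handling of an expired continue token is left out.

```javascript
// Re-list in bounded chunks so only one page of frameworks is held at a time.
// listPage: hypothetical function ({ limit, continueToken }) => Promise of a
//   Kubernetes-style list: { metadata: { continue, resourceVersion }, items }.
// handleChunk: hypothetical callback, e.g. forwarding items to write-merger.
async function relistInChunks(listPage, handleChunk, limit = 500) {
  let continueToken;
  let resourceVersion;
  do {
    const list = await listPage({ limit, continueToken });
    await handleChunk(list.items); // process the page, then let it go
    resourceVersion = list.metadata.resourceVersion;
    continueToken = list.metadata.continue; // empty/undefined on the last page
  } while (continueToken);
  // The caller would start its watch from this resource version.
  return resourceVersion;
}

// In-memory stand-in for the API server, for demonstration only.
function makeFakeApiServer(total) {
  const all = Array.from({ length: total }, (_, i) => ({
    metadata: { name: `framework-${i}` },
  }));
  return async ({ limit, continueToken }) => {
    const start = continueToken ? Number(continueToken) : 0;
    const end = Math.min(start + limit, all.length);
    return {
      metadata: {
        resourceVersion: '12345',
        continue: end < all.length ? String(end) : undefined,
      },
      items: all.slice(start, end),
    };
  };
}
```

With this shape, peak memory is bounded by the chunk size rather than by the total number of frameworks, provided `handleChunk` does not retain the items itself.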

To solve the memory problem in the poller, we could use SELECT ... WHERE ... ORDER BY "submissionTime" ASC LIMIT N in every polling round. But we must confirm that every job will eventually be polled.
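One possible way to address the "polled eventually" concern is to carry a cursor between batches, so that the oldest N rows cannot be returned forever. This is a swapped-in technique (keyset pagination), not something the issue prescribes. In the sketch below, `fetchBatch` is a hypothetical function that runs the query, the `name` tie-breaker column is an assumption, and `makeFakeFetch` replaces the database with an in-memory array.

```javascript
// Query shape assumed for fetchBatch (row-value comparison as supported by
// e.g. PostgreSQL; the "name" tie-breaker column is an assumption):
//   SELECT ... WHERE deleted = false AND (completed = true OR synced = false)
//     AND ("submissionTime", name) > ($1, $2)
//   ORDER BY "submissionTime" ASC, name ASC LIMIT $3

// Poll every matching row in batches of at most batchSize.
async function pollAll(fetchBatch, handleRow, batchSize = 1000) {
  let cursor = null; // { submissionTime, name } of the last row handled
  let polled = 0;
  for (;;) {
    const rows = await fetchBatch(cursor, batchSize);
    if (rows.length === 0) break;
    for (const row of rows) await handleRow(row);
    polled += rows.length;
    const last = rows[rows.length - 1];
    cursor = { submissionTime: last.submissionTime, name: last.name };
  }
  return polled;
}

// Order by submissionTime, then by name, mirroring the ORDER BY above.
function compareKeys(a, b) {
  if (a.submissionTime !== b.submissionTime) return a.submissionTime - b.submissionTime;
  if (a.name === b.name) return 0;
  return a.name < b.name ? -1 : 1;
}

// In-memory stand-in for the database query, for demonstration only.
function makeFakeFetch(rows) {
  const sorted = [...rows].sort(compareKeys);
  return async (cursor, limit) =>
    sorted.filter((r) => !cursor || compareKeys(r, cursor) > 0).slice(0, limit);
}
```

The tie-breaker matters: with a cursor on "submissionTime" alone, rows sharing a timestamp with the last row of a batch would be skipped.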

To mitigate this issue while frameworks are being processed in the queue, we can:

  • raise the memory limit
  • add concurrency control to handle bursts
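The concurrency control mentioned in the list could look something like this minimal sketch. `createLimiter` is a hypothetical helper written for this example, not an existing function in the code base, and the limit of 3 in the usage comment is arbitrary.

```javascript
// Allow at most maxConcurrent tasks to run at once; the rest wait in a queue.
function createLimiter(maxConcurrent) {
  let active = 0;
  const waiting = [];

  const next = () => {
    if (active >= maxConcurrent || waiting.length === 0) return;
    active++;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task) // also catches synchronous throws from task
      .then(resolve, reject)
      .finally(() => {
        active--;
        next(); // start the next waiting task, if any
      });
  };

  // Returns a promise that settles with the task's own result or error.
  return (task) =>
    new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
}

// Usage (processFramework is hypothetical):
//   const limit = createLimiter(3);
//   await Promise.all(frameworks.map((f) => limit(() => processFramework(f))));
```

This bounds only the work in flight. The waiting queue still holds every pending task, so a burst can keep growing memory unless the producer is throttled as well.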

@hzy46 hzy46 changed the title from "Framework watcher consumes too much memory when re-listing frameworks" to "Memory issues for framework watcher and poller in DBController" Aug 28, 2020
@scarlett2018 scarlett2018 mentioned this issue Sep 7, 2020
5 tasks