
mcweb backend resource consumption #770

Open · philbudne (Contributor) opened this issue Sep 3, 2024 · 0 comments

When I look at memory resources in Grafana, I often see that RAM and swap space are .... "well utilized", to the point of there not being much headroom:

[Grafana screenshot: RAM and swap utilization]

If the system were to run out of free memory and swap space, the likely result would be the kernel's out-of-memory (OOM) killer terminating some process, which is rarely happiness inducing, especially since mcweb shares a server with other apps.

In short, this is a problem waiting to happen when something new is installed on tarbell, or the workload changes.

Looking at the running mcweb processes and threads, I see 16 "web" containers, 16 gunicorn processes with one thread each, and 1024 processes with either 38 or 39 threads each, for a current total of 39,095 mcweb threads.
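For reference, one way to reproduce this kind of count on Linux is to scan /proc. The sketch below is just that, a sketch; matching on the string "gunicorn" in the command line is an assumption and may need adjusting for how the mcweb containers actually invoke their workers.

```python
# Count gunicorn processes and their threads by scanning /proc (Linux only).
import os

procs, threads = 0, 0
for pid in filter(str.isdigit, os.listdir("/proc")):
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        if "gunicorn" not in cmdline:
            continue
        nthreads = len(os.listdir(f"/proc/{pid}/task"))  # one entry per thread
    except OSError:
        continue  # process exited while we were looking
    procs += 1
    threads += nthreads

print(f"{procs} gunicorn processes, {threads} threads total")
```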

I have observed that the memory utilization drops when mcweb is restarted, so it's reasonable to think that the memory is being used by mcweb.

I can easily believe that the current configuration was reached by frobbing knobs when there were problems until the problems went away, and since no one wants to INDUCE problems, there hasn't been any backtracking to determine a set of settings that are necessary AND sufficient.

It strains my credulity to believe that almost 40,000 threads are necessary to serve web and API users.

The default stack size appears to be 8 MiB, so the stacks for 40,000 threads could mean 312 GiB of virtual memory (idle threads aren't "free").
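The back-of-envelope arithmetic, for anyone who wants to check it (8 MiB being the default soft stack limit reported by `ulimit -s` on typical Linux installs):

```python
# 40,000 threads x 8 MiB of stack each, expressed in GiB.
threads = 40_000
stack_bytes = 8 * 2**20
print(f"{threads * stack_bytes / 2**30:.1f} GiB of virtual address space")  # 312.5 GiB
```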

If there is interest in trying to address this BEFORE it's a barn fire, I would suggest simulating a realistic workload of web-search and API users against a development mcweb and seeing what problems arise and how to solve them, trying to minimize the number of strands of spaghetti left stuck to the wall! HOWEVER, the tests should be short, and run at off hours, so that researchers (and indexer stacks) are not affected by any impact on ES response time. BUT finding out what kind of load ES can sustain would be useful data for future upgrades to the ES configuration.
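To make "simulate a realistic workload" concrete, here is a deliberately minimal load-generation sketch; the URL, concurrency, and request count are placeholders, and a realistic test would replay actual web-search and API query patterns rather than hammer a single endpoint:

```python
# Minimal concurrent-request load generator (all parameters are placeholders).
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

DEV_URL = "http://mcweb-dev.example/"   # hypothetical development mcweb endpoint

def one_request(_):
    start = time.monotonic()
    with urlopen(DEV_URL, timeout=30) as resp:
        resp.read()
    return time.monotonic() - start

with ThreadPoolExecutor(max_workers=50) as pool:   # ~50 concurrent "users"
    latencies = sorted(pool.map(one_request, range(500)))

print(f"median {latencies[len(latencies) // 2]:.3f}s, "
      f"p95 {latencies[int(len(latencies) * 0.95)]:.3f}s")
```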

The parameter I'm most suspicious of being too large is WEB_CONCURRENCY (also settable via --workers), which is set to 64 in prod/staging. The gunicorn default is one.

gunicorn docs/source/design.rst says:

DO NOT scale the number of workers to the number of clients you expect to have. Gunicorn should only need 4-12 worker processes to handle hundreds or thousands of requests per second.

Gunicorn relies on the operating system to provide all of the load balancing when handling requests. Generally we recommend (2 x $num_cores) + 1 as the number of workers to start off with. While not overly scientific, the formula is based on the assumption that for a given core, one worker will be reading or writing from the socket while the other worker is processing a request.

16 containers, each with 64 worker processes, would account for the 1024 processes with more than one thread. Why each has either 38 or 39 threads has eluded me; it might be related to the number of CPU cores (32 on tarbell). On ifill, with 24 cores and 5 web containers, I see 5 processes with 24 threads each.
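For comparison, here is what a gunicorn.conf.py following the (2 x $num_cores) + 1 rule of thumb quoted above might look like; this is only a starting point to validate with the load testing suggested earlier, not a recommendation:

```python
# Sketch of a gunicorn.conf.py using the (2 x num_cores) + 1 rule of thumb.
import multiprocessing

# On a 32-core machine like tarbell this evaluates to 65 workers in total,
# versus the current 16 containers x 64 workers = 1024 worker processes.
workers = (2 * multiprocessing.cpu_count()) + 1
```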
