
uwsgi + flask, got Respawned uWSGI worker, very frequently #3640

Closed
Liubey opened this issue Jan 22, 2024 · 32 comments
Labels
bug Something isn't working

Comments

Liubey commented Jan 22, 2024

Describe your environment
Describe any aspect of your environment relevant to the problem, including your Python version, platform, version numbers of installed dependencies, information about your cloud hosting provider, etc. If you're reporting a problem with a specific version of a library in this repo, please check whether the problem has been fixed on main.

Steps to reproduce
I use code like this example: https://github.com/open-telemetry/opentelemetry-python/blob/main/docs/examples/fork-process-model/flask-uwsgi/app.py
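
For reference, a minimal sketch along the lines of that example (endpoint defaults, route name, and structure are assumptions, not copied from the linked file):

from flask import Flask
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Set up a provider that exports spans in batches over gRPC (localhost:4317 assumed).
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(insecure=True)))
trace.set_tracer_provider(provider)

application = Flask(__name__)
tracer = trace.get_tracer(__name__)

@application.route("/")
def index():
    # One span per request; the BatchSpanProcessor flushes it from a background thread.
    with tracer.start_as_current_span("index"):
        return "ok"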

What is the expected behavior?
Spans should be reported correctly.
What is the actual behavior?
The log shows lines like the following:

..The work of process 534 is done. Seeya!
Mon Jan 22 17:17:43 2024 - worker 4 (pid: 449) is taking too much time to die...NO MERCY !!!
worker 4 killed successfully (pid: 449)
Respawned uWSGI worker 4 (new pid: 568)

This happens very frequently, which can't be normal, but uwsgi provides no further logs.
The config looks like this:

master = true
enable-threads = True
harakiri-graceful-timeout = 5
threads = 2
processes = 4

Additional context
Add any other context about the problem here.

Liubey added the bug label on Jan 22, 2024
Liubey (Author) commented Jan 22, 2024

Update: with threads = 1 everything is OK, but with threads = 2 the ninth request hangs until the worker is respawned.

Liubey (Author) commented Jan 24, 2024

Same code, but when I use thrift.JaegerExporter (replacing OTLPSpanExporter(endpoint="http://localhost:6831", insecure=True)), everything is OK.
It's weird.
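
For reference, the swap described here would look roughly like this (a sketch assuming the opentelemetry-exporter-jaeger-thrift package; 6831 is the Jaeger agent's UDP port):

from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Send spans to the local Jaeger agent over UDP/thrift instead of over a gRPC channel.
jaeger_exporter = JaegerExporter(agent_host_name="localhost", agent_port=6831)
span_processor = BatchSpanProcessor(jaeger_exporter)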

lzchen (Contributor) commented Jan 25, 2024

@Liubey

What commands are you running to run the example? Is there a specific process runner you are using?

Liubey (Author) commented Jan 26, 2024

@lzchen
uwsgi --http :8000 --wsgi-file app.py --callable application --master --enable-threads --threads 2 --processes 4
It's weird because replacing OTLPSpanExporter with JaegerExporter works fine. I don't know why OTLPSpanExporter doesn't work under uwsgi with multiple threads.

methane (Contributor) commented Feb 15, 2024

Try --lazy-apps
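
For context, --lazy-apps makes uwsgi load the application in each worker after forking instead of once in the master; with the command used earlier in this thread that would look like this (a sketch):

uwsgi --http :8000 --wsgi-file app.py --callable application --master --enable-threads --threads 2 --processes 4 --lazy-apps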

Liubey (Author) commented Mar 4, 2024

It's not easy to change the base config of a mature business project 😂

methane (Contributor) commented Mar 4, 2024

Is it harder than making the whole of otel-python[-contrib] fork-safe?

Liubey (Author) commented Mar 6, 2024

😭 You are right. I will try to research it and drive the change.

methane (Contributor) commented Mar 6, 2024

Liubey (Author) commented Mar 6, 2024

@methane Oh, thanks for the additional information.
But the strange thing is that I have two projects; one uses django + uwsgi + OTLPSpanExporter and works fine without lazy-apps set.

methane (Contributor) commented Mar 6, 2024

lazy-apps is the most straightforward option.
But there are other ways to avoid forking after the app is loaded, for example initializing tracing in a post-fork hook; see the sketch below.
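
A minimal sketch of the post-fork approach (assuming the uwsgidecorators module that ships with the uwsgi Python plugin; exporter settings are placeholders). Tracing is initialized in each worker after the fork, so the BatchSpanProcessor thread and the gRPC channel are never inherited from the master:

from uwsgidecorators import postfork
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

@postfork
def init_tracing():
    # Runs in every worker right after fork, so the exporter and its
    # background machinery are created in the child process, not the master.
    provider = TracerProvider()
    provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(insecure=True)))
    trace.set_tracer_provider(provider)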

methane (Contributor) commented Mar 6, 2024

Liubey (Author) commented Mar 7, 2024

@methane Yes, I saw that before. In my uwsgi + flask project I tested with @postfork, and it still doesn't work; as I said, I get "Respawned uWSGI worker" very frequently.

methane (Contributor) commented Mar 7, 2024

I use the code like:https://github.com/open-telemetry/opentelemetry-python/blob/main/docs/examples/fork-process-model/flask-uwsgi/app.py

"Like"? Did you reproduce it with exactly that code?

Unless there are reproducible steps, any effort to investigate will be in vain.

Liubey (Author) commented Mar 7, 2024

@methane Yes. When I use the config shown in the screenshot below, I get the same thing after the server has been up for a few minutes before receiving requests.
[screenshot of the uwsgi config]
The key parts are the max-requests and threads settings.

Liubey (Author) commented Mar 7, 2024

Maybe the key setting is max-requests; when I remove it, everything looks normal.

Liubey (Author) commented Mar 7, 2024

When I set max-requests=50 and processes=4, requests 201~204 hang for 50+ seconds
[screenshot of the hanging requests]
and I get "Respawned uWSGI worker" for all 4 processes.
[screenshot of the respawn log]

Liubey (Author) commented Mar 7, 2024

@jpkrohling @methane @lzchen Do you have any ideas about this? Sorry to disturb you.

Liubey (Author) commented Mar 7, 2024

The command is:
uwsgi --http :8000 --wsgi-file app.py --callable application --master --enable-threads --threads 2 --processes 4 --max-requests 50

methane (Contributor) commented Mar 7, 2024

OK. I got it.

  • BatchSpanProcessor uses a daemon thread, so BatchSpanProcessor.shutdown() is needed to stop it.

  • TracerProvider uses atexit to call shutdown(), and its shutdown() calls BatchSpanProcessor.shutdown():

if shutdown_on_exit:
    self._atexit_handler = atexit.register(self.shutdown)

  • But uwsgi may not invoke atexit, so the daemon thread is not stopped and the Python process keeps running.

unbit/uwsgi#2063
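
As a workaround sketch (an assumption on my part, not something proposed in this thread), the provider shutdown could be hooked into uwsgi's own atexit hook from the uwsgi Python API instead of relying on the interpreter's atexit:

import uwsgi  # only available when running under uwsgi
from opentelemetry import trace

def _flush_telemetry():
    # Shut down the provider explicitly; this stops the BatchSpanProcessor
    # and flushes any spans still queued for export.
    trace.get_tracer_provider().shutdown()

# uwsgi's exit hook; whether it fires on worker recycling depends on the
# uwsgi behavior discussed below, so treat this as a best-effort measure.
uwsgi.atexit = _flush_telemetry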

Liubey (Author) commented Mar 8, 2024

@methane Thank you very much for your careful analysis. This is unbelievable.
Is there an easy way to bypass the bug, such as patching the OTel BatchSpanProcessor thread to daemon=False?

Liubey (Author) commented Mar 8, 2024

@methane Oh, sorry, I should mention: when I use JaegerExporter I also use BatchSpanProcessor, so I think the key is the OTLP exporter's shutdown or export?

Liubey (Author) commented Mar 8, 2024

Tested with http.OTLPSpanExporter: it looks normal, not hanging, just this:
[screenshot of the log]

methane (Contributor) commented Mar 8, 2024

I can reproduce it with http.OTLPSpanExporter.
I haven't tested JaegerExporter yet.

The grpc exporter's shutdown is not related because it doesn't use a daemon thread.

Liubey (Author) commented Mar 8, 2024

@methane
Maybe the key thing is grpc, because thrift.JaegerExporter is OK while grpc.JaegerExporter is not.

I found some discussion in the grpc repo, see grpc/grpc#23796 (comment).
He said:
numbsafari commented on Feb 2, 2022
Hi, 2022 checking in. It appears that gRPC doesn’t work with any kind of pre-fork (eg uWSGI) model system unless you specify the environment variables noted above.

Perhaps a solution would be to add some documentation?
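
For reference, the variables that gRPC thread refers to are, to the best of my knowledge, GRPC_ENABLE_FORK_SUPPORT and GRPC_POLL_STRATEGY; under uwsgi they could be set with --env so they are in place before grpc is imported (a sketch, not verified against that thread):

uwsgi --http :8000 --wsgi-file app.py --callable application --master --enable-threads --threads 2 --processes 4 --env GRPC_ENABLE_FORK_SUPPORT=true --env GRPC_POLL_STRATEGY=poll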

methane (Contributor) commented Mar 8, 2024

That is a separate issue: it is a fork-safety problem, and --lazy-apps (recommended) or @postfork can avoid it.

methane (Contributor) commented Mar 10, 2024

I made a pull request to uwsgi that fixes the issue of atexit not being called when mixing threads and max-requests.

Liubey (Author) commented Mar 12, 2024

@methane Thank you so much. I hope uwsgi merges the request quickly. 😂

methane (Contributor) commented Mar 19, 2024

UPDATE: This is not caused by BatchSpanProcessor. Python doesn't wait for daemon threads at all. I'm sorry about that.

This is caused by uwsgi. uwsgi uses pthread_cancel() to stop a worker thread even for graceful shutdown, which is a really bad idea: pthread_cancel() is almost impossible to use safely, and I don't think either uwsgi or Python is cancel-safe.

For the record, I can reproduce it easily even when I remove all opentelemetry imports.

Liubey (Author) commented Mar 21, 2024

@methane Thank you for your comment.
But what I want to understand is: if threads is set to 1, this case doesn't seem to happen, even with max-requests set.

For example:

uwsgi --http :8000 --wsgi-file app.py --callable application --master --enable-threads --threads 1 --processes 4 --max-requests 50

stops the worker very quickly.
[screenshot of the worker recycling log]

methane (Contributor) commented Mar 21, 2024

When threads=1, there is only the main thread. When the main thread reaches max-requests, it tries to stop the other threads, but there are none, so pthread_cancel() is not called and there is no problem.

I guess the hang happens when a worker thread other than the main thread (core_id=0) reaches max-requests and calls pthread_cancel() on the main thread, but I am not sure. Threads are hard, and pthread_cancel() is even harder; sometimes I cannot even attach to the broken process.

To fix it, please try unbit/uwsgi#2615 and unbit/uwsgi#2619.

Anyway, I can reproduce it without any opentelemetry imports, so this is not an issue in otel.
I think this issue should be closed.

Liubey (Author) commented Mar 21, 2024

Thank you for the additional information. I will close this issue. Once again, I appreciate your help with this issue, and I wish you thread-safety forever. Haha 😝

Liubey closed this as completed on Mar 21, 2024
xrmx pushed a commit to xrmx/uwsgi that referenced this issue Apr 6, 2024
The worker stops when it reaches max_requests or a reload_on_* condition.

https://github.com/unbit/uwsgi/blob/39f3ade88c88693f643e70ecf6c36f9b375f00a2/core/utils.c#L1216-L1251

`goodbye_cruel_world()` is not graceful; it causes `atexit` handlers not to be called.
If an atexit handler is what stops daemon threads, the worker won't stop until it is killed by the master.

Using a reproducer similar to tests/threads_atexit.py:

*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 93920)
spawned uWSGI worker 1 (pid: 93921, cores: 80)
...The work of process 93921 is done (max requests reached (641 >= 20)). Seeya!
worker 1 killed successfully (pid: 93921)
Respawned uWSGI worker 1 (new pid: 94019)
...The work of process 94019 is done (max requests reached (721 >= 20)). Seeya!
worker 1 killed successfully (pid: 94019)
Respawned uWSGI worker 1 (new pid: 94099)
...The work of process 94099 is done (max requests reached (721 >= 20)). Seeya!
worker 1 killed successfully (pid: 94099)
Respawned uWSGI worker 1 (new pid: 94179)
...The work of process 94179 is done (max requests reached (721 >= 20)). Seeya!
worker 1 killed successfully (pid: 94179)
Respawned uWSGI worker 1 (new pid: 94260)
...The work of process 94260 is done (max requests reached (721 >= 20)). Seeya!
worker 1 killed successfully (pid: 94260)
Respawned uWSGI worker 1 (new pid: 94340)

atexit is not called.

*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 94781)
spawned uWSGI worker 1 (pid: 94782, cores: 80)
...The work of process 94782 is done (max requests reached (402 >= 20)). Seeya!
on_exit: uwsgi.worker_id()=1
worker 1 killed successfully (pid: 94782)
Respawned uWSGI worker 1 (new pid: 94880)
...The work of process 94880 is done (max requests reached (721 >= 20)). Seeya!
on_exit: uwsgi.worker_id()=1
worker 1 killed successfully (pid: 94880)
Respawned uWSGI worker 1 (new pid: 94960)
...The work of process 94960 is done (max requests reached (721 >= 20)). Seeya!
on_exit: uwsgi.worker_id()=1
worker 1 killed successfully (pid: 94960)
Respawned uWSGI worker 1 (new pid: 95040)
...The work of process 95040 is done (max requests reached (721 >= 20)). Seeya!
on_exit: uwsgi.worker_id()=1
worker 1 killed successfully (pid: 95040)
Respawned uWSGI worker 1 (new pid: 95120)
...The work of process 95120 is done (max requests reached (721 >= 20)). Seeya!
on_exit: uwsgi.worker_id()=1
worker 1 killed successfully (pid: 95120)
Respawned uWSGI worker 1 (new pid: 95200)

atexit is called

Related issue:

open-telemetry/opentelemetry-python#3640
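
A minimal sketch of what such a reproducer might look like (the print format mirrors the "on_exit: uwsgi.worker_id()=..." lines in the log above; the rest is an assumption):

import atexit
import uwsgi  # provided by the uwsgi Python plugin at runtime

def on_exit():
    # With the fix, this runs when a worker is recycled after max-requests.
    print("on_exit: uwsgi.worker_id()=%d" % uwsgi.worker_id())

atexit.register(on_exit)

def application(environ, start_response):
    # Trivial WSGI app so requests can drive the worker toward max-requests.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]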
xrmx pushed a commit to xrmx/uwsgi that referenced this issue Apr 7, 2024 (same commit message as above)