uwsgi + flask, got Respawned uWSGI worker, very frequently #3640
Update: with threads = 1 it is OK, but with threads = 2 the ninth request hangs until the worker is respawned.
Same code, but when I use thrift.JaegerExporter in place of OTLPSpanExporter(endpoint="http://localhost:6831", insecure=True), everything is OK.
What commands are you running to run the example? Is there a specific process runner you are using?
@lzchen
Try:
It's not easy to change the base config of a mature business project 😂
Is it harder than making the whole of otel-python[-contrib] fork-safe?
😭 You are right. I will try to research it and drive the change.
FYI, ddtrace-py also requires lazy-apps.
@methane Oh, thanks for the additional information.
lazy-apps is the most straightforward option.
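For reference, `lazy-apps` is a one-line uwsgi option; a minimal sketch of an ini file using it (the module name and worker counts below are placeholders, not values taken from this thread):

```ini
[uwsgi]
# Placeholder entry point; substitute your own WSGI module.
module = app:application
master = true
processes = 4
threads = 2
# Load the application in each worker *after* fork(), instead of once
# in the master before fork(). Threads, locks, and sockets created at
# import time are then owned by the worker and survive normally.
lazy-apps = true
```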
@methane Yes, I saw that before. In my uwsgi + flask project I used @postfork to test; it still does not work. As I said, I "got Respawned uWSGI worker, very frequently".
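For context, the `@postfork` approach being tested here typically looks like the sketch below (a minimal, hedged example; the OTLP endpoint is a placeholder, and the import paths match recent opentelemetry-python releases):

```python
# Sketch: initialize OpenTelemetry in each worker after uwsgi forks,
# via uwsgi's postfork hook, so the exporter's background thread is
# created in the worker process rather than inherited from the master.
from uwsgidecorators import postfork

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

@postfork
def init_tracing():
    provider = TracerProvider()
    provider.add_span_processor(
        # Placeholder endpoint; adjust to your collector.
        BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
    )
    trace.set_tracer_provider(provider)
```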
"Like"? Did you reproduce it with exactly that code? Unless there are reproducible steps, all effort to investigate will be in vain.
@methane Yes, when I use
Maybe the key setting is `max-requests`; when I remove it, things look normal.
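For reference, `max-requests` is the uwsgi option that recycles a worker after it has served a given number of requests; a hedged sketch of the kind of config under discussion (the values are placeholders, not the reporter's actual settings):

```ini
[uwsgi]
module = app:application
master = true
processes = 2
# With threads >= 2, the recycle path taken when max-requests is hit
# involves pthread_cancel() (see the analysis later in this thread),
# which is where the hang was eventually traced.
threads = 2
# Recycle each worker after it serves this many requests.
max-requests = 1000
```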
@jpkrohling @methane @lzchen Do you have any ideas about this? Sorry to disturb you.
The command is:
OK. I got it.
opentelemetry-python/opentelemetry-sdk/src/opentelemetry/sdk/trace/export/__init__.py Line 189 in 8ad10f7
opentelemetry-python/opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py Lines 1181 to 1182 in 8ad10f7
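The two snippets referenced above amount to the following pattern: `BatchSpanProcessor` runs its export loop on a daemon thread, and `TracerProvider` registers its shutdown with `atexit`. This is a simplified, hedged paraphrase of that behavior, not the SDK's actual source:

```python
import atexit
import threading

# Simplified paraphrase of the SDK pattern referenced above
# (illustrative only; not the real opentelemetry-sdk code).

class BatchSpanProcessor:
    def __init__(self, exporter):
        self._exporter = exporter
        # The export loop runs on a *daemon* thread, so the interpreter
        # does not wait for it at exit; orderly shutdown depends on an
        # atexit handler running and stopping it.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        pass  # batch spans and call self._exporter.export(...)

    def shutdown(self):
        self._worker.join(timeout=5)

class TracerProvider:
    def __init__(self, shutdown_on_exit=True):
        self._processors = []
        if shutdown_on_exit:
            # Shutdown (flush and stop the daemon thread) is hooked to
            # atexit, which is exactly what uwsgi's worker recycling skips.
            atexit.register(self.shutdown)

    def shutdown(self):
        for processor in self._processors:
            processor.shutdown()
```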
@methane Thank you very much for your careful analysis. This is unbelievable.
@methane Oh, sorry, I should mention: when I used JaegerExporter I also used BatchSpanProcessor, so I think the key factor is OTLPExporter.shutdown or export?
I can reproduce it with http.OTLPSpanExporter. The grpc exporter's shutdown is not involved because it doesn't use a daemon thread.
@methane I found some discussion on the grpc repo, see grpc/grpc#23796 (comment). Perhaps a solution would be to add some documentation?
That is a separate issue; it is a fork-safety problem. And
I made a pull request to uwsgi that fixes the issue of atexit not being called when mixing
@methane Thank you so much. I hope uwsgi merges the request quickly. 😂
UPDATE: This is not caused by BatchSpanProcessor; Python doesn't wait for daemon threads at all. I'm sorry about that. This is caused by uwsgi: uwsgi uses pthread_cancel() to stop a worker even for graceful shutdown. That is a really bad idea; pthread_cancel() is almost impossible to use safely, and I don't think either uwsgi or Python is cancel-safe. For the record, I can reproduce it easily even after removing all opentelemetry imports.
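For context, a reproducer in this spirit needs no opentelemetry at all, only a daemon thread plus an atexit handler that waits for it; a minimal, hedged sketch (run it under uwsgi with max-requests set; all names are illustrative):

```python
# Minimal WSGI app reproducing the shutdown problem without any
# opentelemetry imports: a daemon thread whose clean stop relies on
# an atexit handler, which uwsgi's worker recycling may never run.
import atexit
import threading

_stop = threading.Event()

def _background():
    # Stand-in for an exporter's batching loop.
    while not _stop.wait(timeout=1.0):
        pass

_worker = threading.Thread(target=_background, daemon=True)
_worker.start()

def _on_exit():
    # Only runs if atexit handlers run; otherwise the daemon thread is
    # never joined and graceful shutdown never completes.
    _stop.set()
    _worker.join()

atexit.register(_on_exit)

def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]
```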
@methane Thank you for your comment. For example:
When threads=1 there is only the main thread. When the main thread reaches max-requests, it tries to stop the other threads, but there are none, so pthread_cancel() is never called and there is no problem. I guess the hang happens when a worker thread other than the main thread (core_id=0) reaches max-requests and calls pthread_cancel() on the main thread, but I am not sure. Threads are hard, and pthread_cancel() is far harder; sometimes I cannot even attach to the broken process. To fix it, please try unbit/uwsgi#2615 and unbit/uwsgi#2619. Anyway, I can reproduce it without any opentelemetry imports, so this is not an issue in otel.
Thank you for the additional information. I will close this issue. Once again, I appreciate your help with this, and I wish you thread-safety forever. Haha 😝
The worker stops when it reaches max_requests or a reload_on_* condition: https://github.com/unbit/uwsgi/blob/39f3ade88c88693f643e70ecf6c36f9b375f00a2/core/utils.c#L1216-L1251

`goodbye_cruel_world()` is not graceful: it causes `atexit` handlers not to be called. If atexit is what stops the daemon threads, the worker won't stop until it is killed by the master.

Using a reproducer similar to tests/threads_atexit.py, before the fix (`atexit` is not called):

```
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 93920)
spawned uWSGI worker 1 (pid: 93921, cores: 80)
...The work of process 93921 is done (max requests reached (641 >= 20)). Seeya!
worker 1 killed successfully (pid: 93921)
Respawned uWSGI worker 1 (new pid: 94019)
...The work of process 94019 is done (max requests reached (721 >= 20)). Seeya!
worker 1 killed successfully (pid: 94019)
Respawned uWSGI worker 1 (new pid: 94099)
...The work of process 94099 is done (max requests reached (721 >= 20)). Seeya!
worker 1 killed successfully (pid: 94099)
Respawned uWSGI worker 1 (new pid: 94179)
...The work of process 94179 is done (max requests reached (721 >= 20)). Seeya!
worker 1 killed successfully (pid: 94179)
Respawned uWSGI worker 1 (new pid: 94260)
...The work of process 94260 is done (max requests reached (721 >= 20)). Seeya!
worker 1 killed successfully (pid: 94260)
Respawned uWSGI worker 1 (new pid: 94340)
```

And after the fix (`atexit` is called, note the `on_exit` lines):

```
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 94781)
spawned uWSGI worker 1 (pid: 94782, cores: 80)
...The work of process 94782 is done (max requests reached (402 >= 20)). Seeya!
on_exit: uwsgi.worker_id()=1
worker 1 killed successfully (pid: 94782)
Respawned uWSGI worker 1 (new pid: 94880)
...The work of process 94880 is done (max requests reached (721 >= 20)). Seeya!
on_exit: uwsgi.worker_id()=1
worker 1 killed successfully (pid: 94880)
Respawned uWSGI worker 1 (new pid: 94960)
...The work of process 94960 is done (max requests reached (721 >= 20)). Seeya!
on_exit: uwsgi.worker_id()=1
worker 1 killed successfully (pid: 94960)
Respawned uWSGI worker 1 (new pid: 95040)
...The work of process 95040 is done (max requests reached (721 >= 20)). Seeya!
on_exit: uwsgi.worker_id()=1
worker 1 killed successfully (pid: 95040)
Respawned uWSGI worker 1 (new pid: 95120)
...The work of process 95120 is done (max requests reached (721 >= 20)). Seeya!
on_exit: uwsgi.worker_id()=1
worker 1 killed successfully (pid: 95120)
Respawned uWSGI worker 1 (new pid: 95200)
```

Related issue: open-telemetry/opentelemetry-python#3640
**Describe your environment**
Describe any aspect of your environment relevant to the problem, including your Python version, platform, version numbers of installed dependencies, information about your cloud hosting provider, etc. If you're reporting a problem with a specific version of a library in this repo, please check whether the problem has been fixed on main.

**Steps to reproduce**
I use code like this: https://github.com/open-telemetry/opentelemetry-python/blob/main/docs/examples/fork-process-model/flask-uwsgi/app.py
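That file is, in condensed form, a flask app whose tracing is initialized from a `@postfork` hook (as sketched earlier in this thread); a hedged outline of its shape, with placeholder names:

```python
# Condensed outline in the shape of the linked app.py (placeholders;
# see the linked file for the real code). Tracing setup lives in a
# @postfork hook as sketched earlier in this thread.
from flask import Flask
from opentelemetry import trace

application = Flask(__name__)

@application.route("/")
def hello():
    tracer = trace.get_tracer(__name__)
    with tracer.start_as_current_span("request"):
        return "Hello!"
```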
**What is the expected behavior?**
Spans are reported correctly.

**What is the actual behavior?**
Logs like:

Very, very frequently; this can't be normal, but uwsgi provides no further logs.

And the config is like this:

**Additional context**
Add any other context about the problem here.