Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker: Set RMQ consumer_timeout to undefined #6189

Merged
merged 7 commits into from
Nov 22, 2023

Conversation

unkcpz
Copy link
Member

@unkcpz unkcpz commented Nov 21, 2023

@unkcpz unkcpz marked this pull request as draft November 21, 2023 14:56
Comment on lines 17 to 18
# 1000 hours in milliseconds
echo "consumer_timeout=3600000000" >> "${RMQ_ETC_DIR}/rabbitmq.conf"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it perhaps be better to add the advanced.confgi approach instead, which allows us to simply set the timeout to undefined? I think the following should work

Suggested change
# 1000 hours in milliseconds
echo "consumer_timeout=3600000000" >> "${RMQ_ETC_DIR}/rabbitmq.conf"
cat > "${RMQ_ETC_DIR}/advanced.config" <<EOF
%% advanced.config
[
{rabbit, [
{consumer_timeout, undefined}
]}
].
EOF

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, cool, I didn't know this. Will give it a build locally and test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed and add test checking it is set, but better to run a calculation to test it really activated.

Copy link

codecov bot commented Nov 21, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (fae2a9c) 80.73% compared to head (01f9036) 80.73%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6189      +/-   ##
==========================================
+ Coverage   80.73%   80.73%   +0.01%     
==========================================
  Files         548      548              
  Lines       40074    40074              
==========================================
+ Hits        32349    32350       +1     
+ Misses       7725     7724       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@unkcpz
Copy link
Member Author

unkcpz commented Nov 21, 2023

Hi @sphuber, in discourse you mentioned you can reproduce the issue correct? Can you try with ghcr.io/aiidateam/aiida-core-with-services:sha-558be45 from this PR and ghcr.io/aiidateam/aiida-core-with-services:edge to make sure this solves the problem?

@unkcpz unkcpz force-pushed the fix/xx/stack-image-rabbitmq-consumer-timeout-increase branch from afa7b3a to fdcc124 Compare November 21, 2023 15:52
@unkcpz unkcpz changed the title Docker: Increase RMQ consumer_timeout from 1h to 1000h Docker: Set RMQ consumer_timeout to undefined Nov 21, 2023
@unkcpz unkcpz marked this pull request as ready for review November 21, 2023 15:54
@sphuber
Copy link
Contributor

sphuber commented Nov 21, 2023

@unkcpz I baked this recipe locally so that I can run a container and try and debug why the build on Github actions was failing. I managed to build and start a container, but only as the normal user. When I try to -u root the container hangs during startup because it cannot start the services as root for some reason. Is there a way to become root inside the container though? I want to check rabbitmqctl environment to make sure the consumer_timeout is properly set to undefined. Note that verdi status actually runs fine in my locally baked image. Don't understand why it fails on GHA.

@unkcpz
Copy link
Member Author

unkcpz commented Nov 21, 2023

Yes, I found the same that -u root will stuck the initial processes. To root, running a container first and run docker exec -it -u 0 <container id> bash to get into the running container as root.

@unkcpz
Copy link
Member Author

unkcpz commented Nov 21, 2023

The failed test is on arm64 build, I can test it from my mackbook.

@unkcpz
Copy link
Member Author

unkcpz commented Nov 21, 2023

I tested locally on my M1 Mac, and all passed. The failed test is the timeout of _docker_service_wait, I roll back to the old time.sleep method to make sure the profile is ready before the test. The downside is the test takes a bit longer. If this makes the test pass, I will not combine the conftest polish changes in this PR.

@unkcpz unkcpz force-pushed the fix/xx/stack-image-rabbitmq-consumer-timeout-increase branch from 824f9b1 to 01f9036 Compare November 21, 2023 22:55
@unkcpz unkcpz requested a review from sphuber November 21, 2023 23:08
@sphuber
Copy link
Contributor

sphuber commented Nov 22, 2023

Yes, I found the same that -u root will stuck the initial processes. To root, running a container first and run docker exec -it -u 0 <container id> bash to get into the running container as root.

Excellent, that worked, thanks. I verified that rabbitmqctl environment returns

      {consumer_timeout,undefined},

so I am pretty sure it worked.

@sphuber sphuber merged commit 33dffb0 into main Nov 22, 2023
34 checks passed
@sphuber sphuber deleted the fix/xx/stack-image-rabbitmq-consumer-timeout-increase branch November 22, 2023 07:36
sphuber pushed a commit that referenced this pull request Nov 30, 2023
As of RabbitMQ v3.8.15, a default `consumer_timeout` is set of 30 minutes.
If a task is not acknowledged within this timelimit, the consumer of the
task is considered dead and its tasks are rescheduled. This is problematic
for AiiDA since tasks often take multiple hours even.

The `consumer_timeout` can only be changed on through the server config.
Here we disable it through the `advanced.config`.

Cherry-pick: 33dffb0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants