Cluster panic #828
@alexanderfefelov On the other hand, there may be a timing issue with the executor. If so, I would expect this to occur with very short-lived jobs. Do you have such jobs in your cluster?
Maybe you're right.
I'll try it out.
Yes, all my jobs are short-lived now.
I have encountered the same error running on a CentOS-based system with kernel 4.14.
@alexanderfefelov @BlackDex,
We are observing a very similar issue with our cluster of 3 servers. We are using
The observed panic seems to originate from here: Line 65 in 19c0982
As a result, our cluster stops executing jobs, even though the panic happens on only one of the nodes. The error in the logs looks like:
Is there a known configuration setting that could prevent the cluster from going down in such a scenario? Or is there a timeline for when #835 could be merged and released?
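For context on why a single panic is so disruptive: in Go, an unrecovered panic in any goroutine terminates the whole process, so no configuration setting can contain it after the fact. The sketch below is not Dkron's code and not the fix in #835. It is only a minimal illustration, with a hypothetical `runSafely` helper, of how a panic in one worker can be turned into an error so the other workers keep running.

```go
package main

import (
	"fmt"
	"sync"
)

// runSafely runs fn and converts a panic into an error instead of letting
// it terminate the whole process. Hypothetical helper, not part of Dkron.
func runSafely(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	var wg sync.WaitGroup
	jobs := []func(){
		func() { fmt.Println("job 1 finished") },
		func() { panic("simulated executor failure") },
		func() { fmt.Println("job 3 finished") },
	}
	for i, job := range jobs {
		wg.Add(1)
		go func(id int, job func()) {
			defer wg.Done()
			// The panic in job 2 is reported here; jobs 1 and 3 are unaffected.
			if err := runSafely(job); err != nil {
				fmt.Printf("job %d: %v\n", id, err)
			}
		}(i+1, job)
	}
	wg.Wait()
	fmt.Println("process still running")
}
```

Recovering like this only hides the symptom. The underlying bug still has to be fixed, which is what the PR is for.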
🙏 if someone can give the fix in the PR a try
Maybe it would help if we provided a package or binary for testing purposes?
Agent crash on 3.0.8:
I have the exact same issue as well.
The expectation was to see 100 rps sustained throughput on the
Note: sometimes they crash exactly when I try to open the
I released a preproduction build of Dkron v3.1.4 with patch #835 built in. Hopefully this will make it easier for someone to test the PR (#835).
We faced this issue too. The PR is already merged. @Victorcoder, when will this be released/tagged?
@ncsibra this is difficult to test. Did you check whether it works properly with the pre-release version?
I assumed it was tested because it's already merged.
@yvanoers You're right, sorry, I mixed them up somehow.
@yvanoers I tested your branch with the f31c7f5f32e30424a7868922a61e9198da5c74ce commit.
This is really good news @ncsibra, thanks for testing. I'm going to merge and include the fix in the next release.
Some time after start, all nodes of my cluster (two servers and two agents) crash with the same error:
Environment
Dkron 3.0.5
Docker (servers: Dockerfile, run script, agents: Dockerfile, run script)
uname -a