Receptor Work unit expired #890
Comments
Ensure the time on the AWX and Receptor nodes is in sync.
@Iyappanj
@kurokobo Yes, the time is in sync, but I still sometimes see this issue, and then it resolves by itself for a few nodes.
Ah, sorry, I misread the message as an error about token expiration.
Hi everyone. Like the OP, I'm encountering a similar issue on an execution node hosted on RHEL servers. It has been a while since we deployed AWX in production, and this is the first time we've experienced an issue with the execution nodes.
Problem description
I have many errors in my
These errors have been there for a while and never caused any issue, but we now have more jobs running on AWX. After a while, I experience a timeout between the
I can't figure out what could be causing this timeout. When my execution node switches from the 'ready' to the 'unavailable' state,
At the moment, my only workaround is to restart the
I've already checked some things:
@kurokobo or anyone else, do you have any idea, please? I'm running out of ideas here...
Additional information
Execution node VM information:
AWX information:
Receptor information
Ansible-runner version
Podman information:
The issue is not related to:
ERROR 2024/01/11 09:46:21 Error locating unit: IbVMji5u
ERROR 2024/01/11 09:46:21 : unknown work unit IbVMji5u
I disabled the cleanup from AWX and now run it on my side. I no longer get this error, but my execution node continues to time out randomly.
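To see how often these messages occur and for which work units, the receptor log can be tallied per unit ID. Below is a minimal Python sketch, assuming the log lines have exactly the shape of the two quoted above. The `count_unit_errors` helper is a hypothetical name for illustration and is not part of receptor or AWX.

```python
import re
from collections import Counter

# Matches only the two message shapes quoted in this comment;
# other receptor log formats are not handled (assumption).
UNIT_ERROR = re.compile(
    r"^ERROR (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?:Error locating unit: |: unknown work unit )(\S+)$"
)

def count_unit_errors(lines):
    """Return a Counter mapping work unit ID to the number of matching error lines."""
    counts = Counter()
    for line in lines:
        match = UNIT_ERROR.match(line.strip())
        if match:
            counts[match.group(2)] += 1
    return counts

sample = [
    "ERROR 2024/01/11 09:46:21 Error locating unit: IbVMji5u",
    "ERROR 2024/01/11 09:46:21 : unknown work unit IbVMji5u",
]
print(count_unit_errors(sample))
```

Unit IDs that show up only around AWX cleanup runs would be consistent with the observation that these errors are separate from the node timeouts.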
Hi, I have the same issue, but it started after a Red Hat update on the execution node from 8.8 to the latest 8.8 kernel. Is there no solution? Can you advise, @koro? Thanks for your support.
Similar topic: #934. Could anyone here who is facing this issue share your
+1, I have the same issue.
I ran into this on Friday. My last job was one with id
The failed job ran on aap-1 and I see this in the messages at about that time:
However, this is not the only instance of that error. Please find attached the logs:
Please accept my apologies for the fact that some log lines are duplicated in the AWX task logs. This is because I can only download them 500 messages at a time from Google's logging console.
AWX is running inside a Google Kubernetes Engine cluster, while aap-0 and aap-1 are running on RHEL 9 VMs inside Google Compute Engine.
Here is a screen clip of the topology screen for my cluster, per the comment from @kurokobo:
^^^ Note that this is after restarting the awx-task deployment, so the awx task node has changed its ID.
The podman version on aap-1 is 3.4.4. I wonder whether upgrading to a version where containers/conmon#440 has been fixed would stop this from happening again?
I've had similar issues, and downgrading receptor to version 1.4.2 seems to solve it somehow.
Recently, one of our receptor nodes has been showing as unavailable in AWX, and we see the error below:
Receptor error from XX.XX.XX.XX, detail:
Work unit expired on Mon Oct 30 12:04:34
Restarting the receptor service did not fix the issue. Any idea what is causing this?
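Since the message carries an expiry timestamp, one quick check is to compare it with the node's clock reading at the time of the error. Below is a minimal Python sketch, assuming the timestamp format shown above. The message omits the year and time zone, so the caller must supply the year, and both times are treated as naive values in the same zone. The `parse_expiry` and `seconds_past_expiry` names are hypothetical and not part of receptor.

```python
from datetime import datetime

PREFIX = "Work unit expired on "

def parse_expiry(message, year):
    """Parse 'Work unit expired on Mon Oct 30 12:04:34' into a naive datetime."""
    if not message.startswith(PREFIX):
        raise ValueError("not a work unit expiry message")
    stamp = message[len(PREFIX):].strip()
    # The year is appended because the message itself does not include one.
    return datetime.strptime(f"{stamp} {year}", "%a %b %d %H:%M:%S %Y")

def seconds_past_expiry(message, node_now, year):
    """Positive result: node_now is later than the expiry time in the message."""
    return (node_now - parse_expiry(message, year)).total_seconds()

msg = "Work unit expired on Mon Oct 30 12:04:34"
# Hypothetical clock reading taken on the node; the year is assumed, not from the message.
print(seconds_past_expiry(msg, datetime(2023, 10, 30, 12, 5, 0), 2023))
```

A value that is far from what the job timeline suggests may hint at clock drift between the nodes, although it does not prove it.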