Don't error if log reconnection fails due to context canceled #875

Merged
merged 3 commits into from
Oct 13, 2023


willthames
Contributor

Don't retry either, as the context won't become uncanceled.

@willthames
Contributor Author

Logs from stern for the awx-task pod when this happens. Note the EOF event processing in the awx-task container shortly before the context is canceled, which makes me think that something sees the pod fail and cancels the context:

awx-task-5ddc6d6655-ppj4g awx-task 2023-10-11 15:17:25,053 INFO     [4d7343ec16a94b44bb354ffa5e134eee] awx.main.commands.run_callback_receiver Starting EOF event processing for Job 70831
awx-task-5ddc6d6655-ppj4g awx-task 2023-10-11 15:17:25,103 INFO     [4d7343ec16a94b44bb354ffa5e134eee] awx.analytics.job_lifecycle inventoryupdate-70831 stats wrapup finished
awx-task-5ddc6d6655-ppj4g awx-task 2023-10-11 15:17:25,112 INFO     [4d7343ec16a94b44bb354ffa5e134eee] awx.analytics.job_lifecycle inventoryupdate-70831 post run
awx-task-5ddc6d6655-ppj4g awx-task 2023-10-11 15:17:27,423 ERROR    [4d7343ec16a94b44bb354ffa5e134eee] awx.main.dispatch inventory_update 70831 (failed) is no longer running; reaping
awx-task-5ddc6d6655-ppj4g awx-ee INFO 2023/10/11 15:17:25 [mHRYJQDE] Detected Error: context canceled for pod awx/automation-job-70831-fwzfh. Will retry 5 more times.
awx-task-5ddc6d6655-ppj4g awx-ee WARNING 2023/10/11 15:17:25 [mHRYJQDE] Error getting pod awx/automation-job-70831-fwzfh. Will retry 5 more times. Error: client rate limiter Wait returned an error: context canceled
awx-task-5ddc6d6655-ppj4g awx-ee WARNING 2023/10/11 15:17:26 [mHRYJQDE] Error getting pod awx/automation-job-70831-fwzfh. Will retry 4 more times. Error: client rate limiter Wait returned an error: context canceled
awx-task-5ddc6d6655-ppj4g awx-ee WARNING 2023/10/11 15:17:27 [mHRYJQDE] Error getting pod awx/automation-job-70831-fwzfh. Will retry 3 more times. Error: client rate limiter Wait returned an error: context canceled
awx-task-5ddc6d6655-ppj4g awx-ee WARNING 2023/10/11 15:17:28 [mHRYJQDE] Error getting pod awx/automation-job-70831-fwzfh. Will retry 2 more times. Error: client rate limiter Wait returned an error: context canceled
awx-task-5ddc6d6655-ppj4g awx-ee WARNING 2023/10/11 15:17:29 [mHRYJQDE] Error getting pod awx/automation-job-70831-fwzfh. Will retry 1 more times. Error: client rate limiter Wait returned an error: context canceled
awx-task-5ddc6d6655-ppj4g awx-ee ERROR 2023/10/11 15:17:30 [mHRYJQDE] Error getting pod awx/automation-job-70831-fwzfh. Error: client rate limiter Wait returned an error: context canceled

@willthames
Contributor Author

This is not necessarily the best fix (not canceling the context before receptor has marked the pod successful might help), but we've already seen it prevent an inventory update from being marked as failed after it had completed successfully (according to the update pod logs).

@codecov

codecov bot commented Oct 13, 2023

Codecov Report

Merging #875 (8becd14) into devel (5c4dfac) will increase coverage by 0.03%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##            devel     #875      +/-   ##
==========================================
+ Coverage   37.60%   37.63%   +0.03%     
==========================================
  Files          44       44              
  Lines        8568     8576       +8     
==========================================
+ Hits         3222     3228       +6     
- Misses       5102     5105       +3     
+ Partials      244      243       -1     
Files Coverage Δ
pkg/workceptor/kubernetes.go 2.97% <0.00%> (-0.03%) ⬇️

... and 4 files with indirect coverage changes

@sonarcloud

sonarcloud bot commented Oct 13, 2023

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No Coverage information
0.0% Duplication

Contributor

@AaronH88 AaronH88 left a comment


lgtm

@AaronH88 AaronH88 merged commit 63ee265 into ansible:devel Oct 13, 2023
17 of 18 checks passed

2 participants