-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(log): Some improvements to the logs #537
Conversation
Hey, I've taken a look at your PR. I've went ahead with reviewing this as part of the handover effort started this morning. I'll let Olu do a final review as I don't know much about this code base.
$ export HTTPS_PROXY=foo
$ go run . agent -c config.yaml --client-id XXXXX -k /tmp/key --venafi-cloud -p 5s 2>&1 | grep api.venafi.cloud
2024/05/14 14:26:57 Posting data to: https://api.venafi.cloud/
2024/05/14 14:26:58 retrying in 41.388139367s after error: post to server failed: failed to execute http request to VaaS. Request https://api.venafi.cloud/v1/oauth/token/serviceaccount, status code: 400, body: [{"error":"invalid_grant","error_description":"token_signature_verification_error"}
2024/05/14 14:27:39 Posting data to: https://api.venafi.cloud/
2024/05/14 14:27:40 retrying in 35.360750201s after error: post to server failed: failed to execute http request to VaaS. Request https://api.venafi.cloud/v1/oauth/token/serviceaccount, status code: 400, body: [{"error":"invalid_grant","error_description":"token_signature_verification_error"}
2024/05/14 14:28:15 Posting data to: https://api.venafi.cloud/
2024/05/14 14:28:15 retrying in 1m38.081853176s after error: post to server failed: failed to execute http request to VaaS. Request https://api.venafi.cloud/v1/oauth/token/serviceaccount, status code: 400, body: [{"error":"invalid_grant","error_description":"token_signature_verification_error"}
2024/05/14 14:29:53 Posting data to: https://api.venafi.cloud/
2024/05/14 14:29:54 retrying in 1m37.808502147s after error: post to server failed: failed to execute http request to VaaS. Request https://api.venafi.cloud/v1/oauth/token/serviceaccount, status code: 400, body: [{"error":"invalid_grant","error_description":"token_signature_verification_error"}
2024/05/14 14:31:31 Posting data to: https://api.venafi.cloud/
2024/05/14 14:31:32 retrying in 3m17.774884905s after error: post to server failed: failed to execute http request to VaaS. Request https://api.venafi.cloud/v1/oauth/token/serviceaccount, status code: 400, body: [{"error":"invalid_grant","error_description":"token_signature_verification_error"}
2024/05/14 14:34:50 Posting data to: https://api.venafi.clou
2024/05/14 14:34:50 retrying in 3m34.920269392s after error: post to server failed: failed to execute http request to VaaS. Request https://api.venafi.cloud/v1/oauth/token/serviceaccount, status code: 400, body: [{"error":"invalid_grant","error_description":"token_signature_verification_error"}
2024/05/14 14:38:25 Posting data to: https://api.venafi.cloud/
2024/05/14 14:38:25 Exiting due to fatal error uploading: post to server failed: failed to execute http request to VaaS. Request https://api.venafi.cloud/v1/oauth/token/serviceaccount, status code: 400, body: [{"error":"invalid_grant","error_description":"token_signature_verification_error"} (I've hidden all the useless messages about missing resources) About the REST config bug, well spotted. I wasn't able to reproduce the panic (if that's how you spotted this) Anyways, I think all the changes you made are sensible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, just added a couple of minor suggestions.
We should think about structuring these logs and converting them into events soon 🤔
Thanks!
I believe the default timeout is 15m before it gives up.
It was. Running locally without a kubeconfig file should reproduce.
Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
* Log a more informative error message when giving up on uploading readings to the server. This would be the last message before the pod exits, so if the pod ends up in CrashLoopBackoff it's important to highlight this as the reason. * Remove the "Running Agent" log message, it's not clear what it means * Put [] around the body as if the body is empty the log message can read like the following message is the body. ``` 2024/05/10 16:33:43 retrying in 25.555756126s after error: received response with status code 404. Body: W0510 16:33:43.832278 10875 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: failed to list route.openshift.io/v1, Resource=routes: the server could not find the requested resource ``` * Remove "using parent stop channel" from a log message as it's not clear what it means. It used to be conditional, and now it isn't, so it's not providing any information, and it's confusing as to what it might be trying to tell you if you aren't familiar with that code.
`discoveryClient` is uninitialized, so the pointer can't be deferenced, so return an empty struct if there's an error.
The informer context is created, but not passed to anything, so this code doesn't do anything.
This returns the number of items collected by each datagatherer so that the logs tell us a bit more about what was found in the cluster, and can help find where any items have been missed.
Thanks all, I pushed an update with the suggested changes. |
Thank you 👍 LGTM, I'll merge the changes to master |
The current logs are somewhat confusing, and missing some very useful information. This is an attempt
to clean it up in a few ways:
Log the number of items from each dg. Each datagatherer can return
the number of items it collected so that the logs tell us a bit more about
what was found in the cluster, and can help find where any items have
been missed.
Log a more informative error message when giving up on uploading
readings to the server. This would be the last message before
the pod exits, so if the pod ends up in CrashLoopBackoff it's
important to highlight this as the reason.
Remove the "Running Agent" log message, it's not clear what it means
Put [] around the body as if the body is empty the log message
can read like the following message is the body.
clear what it means. It used to be conditional, and now it isn't,
so it's not providing any information, and it's confusing as to
what it might be trying to tell you if you aren't familiar with that
code.
There are couple of code cleanups included as well:
Fix a crash bug when REST config can't be loaded.
discoveryClient
is uninitialized, so the pointer can't be deferenced,so return an empty struct if there's an error.
Remove the informer context in the dynamic dg. The informer
context is created, but not passed to anything, so this code doesn't do anything.
The changes are split into a few logical commits if we want to break it up and review them independently,
as I know these might not be the best approaches for all of these things.