
Controller logging improvements #1759

Merged: matthchr merged 1 commit into Azure:master from feature/logging on Aug 30, 2021
Conversation

matthchr (Member):

What this PR does / why we need it:
We weren't consistent with what we were logging at what level. Some log-level-0 messages were getting emitted every time through the reconcile loop, while others were much less verbose. When writing a log statement, it was hard to determine where on the 0-to-10 spectrum to put your log without crawling through all of the other log statements around you to rebuild your mental model of what levels we were using.

There is some relatively standard guidance from Kubernetes, but the problem is remembering it, especially given that the six levels discussed there are mostly described as just "More details", which of course raises the question: when do I choose "details" and when do I choose "more details"?

The solution I came up with was to:

  • Name some log levels and use the names. Hopefully the names are sorta descriptive.
  • Provide documentation/examples for each named level, specific to our use case (authoring a controller), describing the kinds of logs that should go into that level. (There's a rough sketch of the idea just below.)
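
A minimal sketch of the idea, assuming go-logr style verbosity. The constant names and values below are illustrative assumptions (only Verbose is actually named in the review discussion), not necessarily the exact set this PR adopted:

```go
package logging

import "github.com/go-logr/logr"

// Named verbosity levels for controller logging. These names and values
// are illustrative assumptions, not necessarily the set adopted in the PR.
const (
	Status  = 0 // state changes and warnings users should always see
	Info    = 1 // high-level reconcile progress
	Verbose = 2 // per-iteration detail, useful when diagnosing stalls
	Debug   = 3 // full API requests/responses; may contain sensitive data
)

// Example call site: a named level makes the intended audience of the
// log statement obvious without consulting the surrounding statements.
func logMonitoring(log logr.Logger, deploymentID string) {
	log.V(Verbose).Info(
		"Have existing deployment ID, will monitor it",
		"deploymentID", deploymentID)
}
```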

I'm open to changing the log level names or even doing something else entirely, as long as it solves the fundamental problem of helping us keep our logs consistent.

I also set the log level in the container to 1 (I believe the default is 0) so that if issues get escalated to us we have a bit more detail to work with.

@@ -25,6 +25,7 @@ spec:
containers:
- args:
- --enable-leader-election
+ - --v=1
matthchr (Member Author):

Should we set this higher?

theunrepentantgeek (Member):

While we're in Alpha, yes - we kind of expect issues. Change it back to 1 when we hit Beta, maybe?

matthchr (Member Author):

I've put this up to 2 for now. That means we're not logging full API requests/responses (which we get at 3), but we're logging everything else.

Avoiding full requests/responses (except in our test logs, for now) seems prudent, given that they could theoretically contain secrets at the moment: we don't do any pruning AND we don't put secrets into their own Kubernetes Secret object. We'll need to do something about that; I've got #1762 filed to track it.
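
A hedged sketch of that gating, reusing the illustrative Verbose/Debug constants from the sketch above (the function and message names here are hypothetical, not code from this PR):

```go
// With the container at --v=2, the progress line below is emitted but the
// full payload line is not, since payloads could theoretically contain
// secrets until pruning / dedicated Kubernetes Secret handling (#1762) lands.
func logARMCall(log logr.Logger, statusCode int, responseBody []byte) {
	log.V(Verbose).Info("ARM request completed", "statusCode", statusCode)
	log.V(Debug).Info("ARM response body", "body", string(responseBody))
}
```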

codecov-commenter commented Aug 27, 2021:

Codecov Report

Merging #1759 (2deb4c0) into master (a82afe8) will decrease coverage by 0.00%.
The diff coverage is 79.31%.


@@            Coverage Diff             @@
##           master    #1759      +/-   ##
==========================================
- Coverage   42.04%   42.04%   -0.01%     
==========================================
  Files         311      311              
  Lines       89943    89944       +1     
==========================================
- Hits        37821    37819       -2     
- Misses      48785    48788       +3     
  Partials     3337     3337              
Impacted Files | Coverage Δ
...ted/pkg/reconcilers/azure_deployment_reconciler.go | 66.15% <77.77%> (-0.60%) ⬇️
hack/generated/controllers/generic_controller.go | 78.37% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

theunrepentantgeek (Member) left a comment:

This is a really good idea.

I have a couple of nitpicks on levels for some messages, but otherwise this looks great.


@@ -369,8 +369,6 @@ func (r *AzureDeploymentReconciler) DetermineCreateOrUpdateAction() (CreateOrUpd
}

if hasOngoingDeployment {
- r.log.V(3).Info("Have existing deployment ID, will monitor it", "deploymentID", ongoingDeploymentID)
theunrepentantgeek (Member):

If something stalls and never makes progress, this log line might be the key to spotting the situation.
Suggest restoring this, but at the Verbose level.

matthchr (Member Author) commented Aug 30, 2021:

I removed this because we already log pretty much the same thing above on line 355, so this line was a duplicate. The line above is actually better, because it also includes the current Ready condition and hasChanged.

@matthchr matthchr merged commit b87d92c into Azure:master Aug 30, 2021
@matthchr matthchr deleted the feature/logging branch August 30, 2021 18:16