-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for maximum size for Output of checks #5233
Support for maximum size for Output of checks #5233
Conversation
I can say that we have suffered from big health-checks. We had to use the "args" argument and wrap the health-check with a script. A more native way of reducing the output would have made a big difference while waiting for developers to change or add a param to their endpoint. |
This PR will allow administrator to limit the size of Output produced by healthchecks, when set at agent level, it will be the maximum output for all healthchecks produced by agents. It will also allow specific healthchecks to go below the limit set by administrator, OutputMaxSize if set in check specification will now allow to go beyond agent level setting, but will allow to reduce it further. We had the following configuration at agent level * check_output_max_size (default: 4k) : max number for all checks of the agent * check.OutputMaxSize: (default: 4k)
3a9cf1b
to
b5717aa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Pierre,
Thanks for the PR, I have some comments inline.
Additionally, the website's docs should also be updated to reflect these changes.
You can add the CLI and config docs to: website/source/docs/agent/options.html.md
, and the check docs to: website/source/api/agent/check.html.md
.
if maxOutputSize == 0 { | ||
maxOutputSize = 4096 | ||
} | ||
if chkType.OutputMaxSize > 0 && maxOutputSize > chkType.OutputMaxSize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you mention in your description, this will only override the agent check if the check's OutputMaxSize
is less than the agent's.
Could you expand more on why we shouldn't allow the check's max to override the agent's max?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@freddygv I wanted to be as conservative as possible (meaning, that for instance, if someone is storing those results in a DB and assumed size will never be more than 4k, we won't break this assumption), but allowing large infrastructure to reduce the size of their checks since it might be hugely impacting when checks are varying a lot (for instance, the normal result if checks if 'OK', while the broken checks returns large stack traces), ask @orarnon for instance who had this issue (we did had it as well on very large clusters in the past)
Increasing this value is trivial and could be done in another patch if someone is resquesting it, but if you really want, we can remove this limit in this PR as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pierresouchay Thanks. Can you please expand on the need for the check-specific max output size? Would adding a max for the agent alone be sufficient?
The implementation would be simpler if we only set this max once. Additionally, there is a UX mismatch where someone can discard output at the agent level, but at the check level the max size can only be brought down to 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, the reason for that is serialization of payload: if the user specify 0, it means no value has been specified in payload (for instance for a user not setting OutputMaxSize
in the check definition and thus preserving backward compatibility.
As you pointed out in a previous comment, if the user completely want to disable checks output, it can be done by disabling completely check output.
* Renamed `checks.BufSize` to `checks.DefaultBufSize` * Use the constant where appropriate * Better error message when `check_output_max_size` has wrong value
@freddygv all DONE |
1e957d1
to
8dcf910
Compare
@pierresouchay should be fixed now, let us know if it happens again. Will send back some more comments about this PR tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pierresouchay Thanks for the updates. Just added some more comments inline.
Also, I am not familiar with the output of gRPC health checks, but I'm seeing that here the resulting err.Error()
isn't being truncated based on the check output max size. It seems like there should also be a truncation here since the error output could be large.
@@ -117,6 +117,14 @@ will exit with an error at startup. | |||
[go-sockaddr](https://godoc.org/github.com/hashicorp/go-sockaddr/template) | |||
template | |||
|
|||
* <a name="check_output_max_size"></a><a href="#check_output_max_size">`-check_output_max_size</a> - | |||
Override the default limit of 4k for maximum size of checks, this is a value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this description could be simplified further to something like:
The maximum output size for health checks. Limiting check output size reduces pressure on Consul servers when many checks are producing large outputs. Defaults to 4096, and must be at least 1. To disable check output capture use
discard_check_output
.
@@ -117,6 +117,14 @@ will exit with an error at startup. | |||
[go-sockaddr](https://godoc.org/github.com/hashicorp/go-sockaddr/template) | |||
template | |||
|
|||
* <a name="check_output_max_size"></a><a href="#check_output_max_size">`-check_output_max_size</a> - | |||
Override the default limit of 4k for maximum size of checks, this is a value | |||
between 1 and 4096, setting it to 0 is not allowed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not seeing in the code a constraint that restricts max size to be <= 4096. This PR only checks the lower bound, which I think should be OK.
@@ -2239,6 +2243,13 @@ func (a *Agent) addCheck(check *structs.HealthCheck, chkType *structs.CheckType, | |||
|
|||
// Check if already registered | |||
if chkType != nil { | |||
maxOutputSize := a.config.CheckOutputMaxSize | |||
if maxOutputSize == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be unnecessary. How could a.config.CheckOutputMaxSize
end up as 0? The config builder has a default value of 4096 and returns an error if the input is < 1.
if maxOutputSize == 0 { | ||
maxOutputSize = 4096 | ||
} | ||
if chkType.OutputMaxSize > 0 && maxOutputSize > chkType.OutputMaxSize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pierresouchay Thanks. Can you please expand on the need for the check-specific max output size? Would adding a max for the agent alone be sufficient?
The implementation would be simpler if we only set this max once. Additionally, there is a UX mismatch where someone can discard output at the agent level, but at the check level the max size can only be brought down to 1.
@pierresouchay I just looked at your PR again and I think there are some comments left. |
suggestion from @freddygv Co-Authored-By: Freddy <freddygv@users.noreply.github.com>
@pierresouchay Thanks for addressing those comments, this is almost ready. The last portion is to update a test around the TTL check update, so that CI can pass. The truncation behavior was moved from So the sub-test TestAgent_UpdateCheck/log_output_limit can be removed and that behavior should be tested here instead. |
Check that OutputMaxSize is properly taken into account. BUGFIX: properly update local state on update of TTL
bc82a29
to
acce4eb
Compare
bf00400
to
0eebd94
Compare
0eebd94
to
8478180
Compare
Looks good, thanks for working through those changes! Regarding the issue you referenced about syncChanges, it does seem like they're right, but I'd have to dig deeper to make sure it can be removed from the API endpoints. |
@freddygv thank you! |
This PR will allow administrator to limit the size of Output produced by healthchecks,
when set at agent level, it will be the maximum output for all healthchecks produced
by agents.
It will also allow specific healthchecks to go below the limit set by administrator,
OutputMaxSize if set in check specification will now allow to go beyond agent level
setting, but will allow to reduce it further.
We had the following configuration at agent level