count 429s in elasticsearch output #8056
Open to suggestions on how to test this.
2f73a5c to f8dad25
@graphaelli I am +1 on tracking 429s at the output level. Concerning testing, sadly I don't think we test any of the stats at the client level. I would probably take the following strategy:
I think it would be super helpful to also have these values in stack monitoring. If you agree, could you open a meta issue here? https://github.com/elastic/stack-monitoring
LGTM. @ruflin, do you want to take a look, since you were commenting?
stats.nonIndexable++
continue
if status < 500 {
	if status == http.StatusTooManyRequests {
Glad I am not the only one who uses these constants :)
LGTM. Please make sure to file a follow-up issue in stack-monitoring so we have it in the Elasticsearch templates and the UI.
As a libbeat user, I'd like more help tracking down when elasticsearch is the bottleneck in event ingestion and could use configuration tuning or more nodes. A rise in 429s can provide some indication of this situation.
This change adds a new observable method `ErrTooMany` and a corresponding monitoring metric `output.events.toomany` that is only reported by the elasticsearch output.

Eventually, a distribution of time spent in queue (ack time - receive time) would be useful across more outputs, but this is a start.
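For readers skimming the diff, here is a minimal, self-contained sketch of the counting idea: per-item bulk response statuses are tallied, and 429s get their own counter instead of being lumped in with other failures. The `bulkStats` type, its fields, and the `classify` function are hypothetical names for illustration, not the actual libbeat code, and the assumption that 429 items are still retried after being counted is mine.

```go
package main

import (
	"fmt"
	"net/http"
)

// bulkStats is a hypothetical stand-in for the per-batch counters
// an output might keep; it is not the real libbeat struct.
type bulkStats struct {
	acked        int // items accepted by Elasticsearch (status < 300)
	nonIndexable int // permanent client errors (4xx other than 429)
	tooMany      int // items rejected with 429 Too Many Requests
	fails        int // items that would be retried (429 and 5xx)
}

// classify tallies a slice of per-item bulk response status codes.
// Assumption: a 429 is counted and then retried like a 5xx, while
// other 4xx statuses are dropped as non-indexable.
func classify(statuses []int) bulkStats {
	var stats bulkStats
	for _, status := range statuses {
		if status < 300 {
			stats.acked++
			continue
		}
		if status < 500 {
			if status == http.StatusTooManyRequests {
				stats.tooMany++ // count it, then fall through to the retry path
			} else {
				stats.nonIndexable++
				continue
			}
		}
		stats.fails++
	}
	return stats
}

func main() {
	stats := classify([]int{201, 429, 400, 503, 429})
	fmt.Printf("%+v\n", stats)
}
```

A rising `tooMany` count relative to `acked` is the kind of signal the description has in mind when it talks about Elasticsearch being the bottleneck.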