Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes parsing of GC entries in elasticsearch server log #9603

Merged
merged 11 commits into from
Dec 27, 2018

Conversation

ycombinator
Copy link
Contributor

@ycombinator ycombinator commented Dec 18, 2018

Resolves #9513.

This PR:

  • removes the incorrectly-parsed gc_overhead field. Turns out what we were parsing was actually an insignificant sequential number, not GC overhead,
  • parses out a new gc.collection_duration field, e.g. 1.2s, which is the time spent performing GC, and
  • parses out a new gc.observation_duration field, e.g. 1.8s, which is the overall time over which GC collection was performed

It also splits up the long grok expression in the ingest pipeline into smaller patterns and references those patterns, hopefully for easier readability.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/stack-monitoring

@ruflin
Copy link
Contributor

ruflin commented Dec 18, 2018

@ycombinator Could you rebase on master. The failing CI should be fixed there.

@ycombinator
Copy link
Contributor Author

@ruflin Rebased on master now.

@ycombinator
Copy link
Contributor Author

@radoondas has suggested a few improvements so I'm taking this PR out of review for now. I will put it back into review after I've incorporated his improvements.

@ycombinator
Copy link
Contributor Author

Thanks to @radoondas's work, the ingest pipeline in this PR now includes a script processor to parse the GC durations into meaningful numeric millisecond values. It also further breaks down the grok expressions to make them even more readable.

@tsouza
Copy link

tsouza commented Dec 18, 2018

@ycombinator you've described the metrics as Old GC. According to https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/monitor/jvm/JvmGcMonitorService.java#L458 it's not only Old GC. It's the overhead of all the GCs.

@ycombinator
Copy link
Contributor Author

ycombinator commented Dec 18, 2018

Thanks @tsouza. Fixing both those things...

[UPDATE] Leaving the sequence number as-is per our conversation off-PR.

@ycombinator
Copy link
Contributor Author

I've incorporated the latest feedback into this PR. It is ready for review again. Thanks!

Copy link
Member

@radoondas radoondas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks now good to go. It does not break BC and build fields for future use.
Base of the changes - breaking the pattern - will help to build new special requests from user base.

{
"script": {
"lang": "painless",
"source": "if (ctx.elasticsearch.server.gc != null && ctx.elasticsearch.server.gc.observation_duration != null) { if (ctx.elasticsearch.server.gc.observation_duration.unit == 's') { ctx.elasticsearch.server.gc.observation_duration.time_in_millis = ctx.elasticsearch.server.gc.observation_duration.time * 1000;}if (ctx.elasticsearch.server.gc.observation_duration.unit == 'ms') { ctx.elasticsearch.server.gc.observation_duration.time_in_millis = ctx.elasticsearch.server.gc.observation_duration.time; } if (ctx.elasticsearch.server.gc.observation_duration.unit == 'm') { ctx.elasticsearch.server.gc.observation_duration.time_in_millis = ctx.elasticsearch.server.gc.observation_duration.time * 60000; }} if (ctx.elasticsearch.server.gc != null && ctx.elasticsearch.server.gc.collection_duration != null) { if (ctx.elasticsearch.server.gc.collection_duration.unit == 's') { ctx.elasticsearch.server.gc.collection_duration.time_in_millis = ctx.elasticsearch.server.gc.collection_duration.time * 1000;} if (ctx.elasticsearch.server.gc.collection_duration.unit == 'ms') {ctx.elasticsearch.server.gc.collection_duration.time_in_millis = ctx.elasticsearch.server.gc.collection_duration.time; } if (ctx.elasticsearch.server.gc.collection_duration.unit == 'm') { ctx.elasticsearch.server.gc.collection_duration.time_in_millis = ctx.elasticsearch.server.gc.collection_duration.time * 60000; }}"
Copy link
Contributor

@ruflin ruflin Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpountz Great use case for a duration type (mostly) ;-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:)

@ruflin
Copy link
Contributor

ruflin commented Dec 20, 2018

@ycombinator I wonder if an improvement can be made later on on the ES side to always log in the same unit?

@ycombinator
Copy link
Contributor Author

ycombinator commented Dec 20, 2018

I wonder if an improvement can be made later on on the ES side to always log in the same unit?

@ruflin Added elastic/elasticsearch#32850 (comment) elastic/elasticsearch#36896.

I've also spoken with @jakelandis about the possibility of adding a duration Ingest processor, similar to the bytes one we have today for performing the necessary parsing and conversion from string durations to numeric durations.

@ycombinator ycombinator requested review from jsoriano and removed request for monicasarbu December 20, 2018 14:26
filebeat/docs/fields.asciidoc Outdated Show resolved Hide resolved
filebeat/docs/fields.asciidoc Outdated Show resolved Hide resolved
filebeat/module/elasticsearch/server/ingest/pipeline.json Outdated Show resolved Hide resolved
@ycombinator
Copy link
Contributor Author

@jsoriano Thanks for your review. I've addressed all your feedback now. Ready for your 👀 again!

Additionally I've also shortened the painless expression a bit by using the null-safe operator.

Copy link

@tsouza tsouza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks everyone for fixing this!

LGTM

@ycombinator
Copy link
Contributor Author

jenkins, test this

@ycombinator ycombinator merged commit 11a1917 into elastic:master Dec 27, 2018
@ycombinator ycombinator added needs_backport PR is waiting to be backported to other branches. v6.7.0 v7.0.0-alpha2 labels Dec 27, 2018
@ycombinator ycombinator removed the needs_backport PR is waiting to be backported to other branches. label Dec 27, 2018
ycombinator added a commit that referenced this pull request Dec 28, 2018
I missed adding CHANGELOG entries in #9603 so this PR follows up with said entries. No need to backport as the CHANGELOG entries were added in the original backport PR: #9810.
ycombinator added a commit that referenced this pull request Jan 8, 2019
…h server log (#9810)

* Regenerating fields

* Fixing up for 6.x

* Adding CHANGELOG entries
@ycombinator ycombinator deleted the bugfix-fb-es-gc branch December 25, 2019 11:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants