
Add GC fileset to Elasticsearch Filebeat module #7305

Merged · 16 commits · Jun 18, 2018
Conversation

@inqueue (Member) commented Jun 12, 2018

Adds a fileset for JVM GC logs to the Elasticsearch Filebeat module. This baseline fileset does not do much yet; more to come.

#5301.

Update ingest pipeline

Add Java 8 GC log

JDK -> JVM logs

Add default gc.log path

Add jvm fields group; GC test log update

Add test data
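
For context, the pieces these commits touch follow the conventional Filebeat fileset layout, which looks roughly like this (a sketch of the usual structure, not the exact file list of this PR):

filebeat/module/elasticsearch/gc/
├── manifest.yml          # fileset definition: vars, input config path, pipeline reference
├── config/gc.yml         # log input configuration template (paths, multiline)
├── ingest/pipeline.json  # Elasticsearch ingest pipeline with the grok patterns
├── _meta/fields.yml      # field definitions for the index template and docs
└── test/                 # sample GC logs plus expected events for CI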
@inqueue (Member, Author) commented Jun 13, 2018

Inadvertently left out the multiline config.
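
For reference, the multiline settings for JDK 8 GC logs would look roughly like this in the fileset's config template (a minimal sketch assuming each GC event starts with a leading date stamp; the exact pattern in the PR may differ):

multiline:
  pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
  negate: true
  match: after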

@ruflin (Contributor) left a comment

Thanks for kicking this off.

I'm still confused whether this is a JDK or JVM log. One more general thought: is the gc log the same independent of where it comes from? Meaning, should we perhaps have a jvm module with a gc fileset?

What log files are there from the JVM?

Ignoring the above for now: happy to have it directly in ES. If we have it in, from a user's perspective it is probably more of a gc fileset, as the user does not think too much about where the log comes from but about what it's about.

@@ -0,0 +1,702 @@
2018-06-11 01:51:14 GC log file created /var/log/elasticsearch/gc.log.4
Contributor:

As this log contains jdk in the name, I'm now wondering whether this is actually a JDK log and not a JVM log. I'm confused :-)

Member Author:

This log came from a running JDK (as opposed to JRE) and should be renamed to gc.log.

@ruflin (Contributor) commented Jun 13, 2018

Note: let's discuss the above quickly, but I will also do a second review pass on the fileset itself so we can get CI green.

@ruflin (Contributor) commented Jun 13, 2018

@inqueue I opened #7309 to better adhere to ECS and to create an example of how to handle the timestamps and how the service field could be added.

@ruflin (Contributor) left a comment

Left some minor comments.

For the CI failure, I need to check it out. I assume there is a very small diff in the expected log which is hard to spot.

"field": "message",
"patterns": [
"%{TIMESTAMP_ISO8601:timestamp}",
"%{GREEDYMULTILINE:elasticsearch.gc.message}"
Contributor:

I would suggest putting this under just message. It means we will overwrite what is in message at the moment. If we want to keep the original, we could use log.message.
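
That suggestion would look roughly like this (a sketch only; GREEDYMULTILINE is the custom pattern already defined in this pipeline):

"patterns": [
  "%{TIMESTAMP_ISO8601:timestamp}",
  "%{GREEDYMULTILINE:message}"
]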

Contributor:

I think the missing elasticsearch.gc.message part in the expected log file is the reason CI is failing.

@@ -1,2 +1,4 @@
dashboards:

- id: Filebeat-elasticsearch-gc-Dashboard
Contributor:

This should be removed as there are no dashboards yet.

var:
- name: paths
default:
- /var/log/elasticsearch/gc.log.[0-9]*
Contributor:

What is the name of the first log? If it is called gc.log I think the pattern will not match it.

Member Author:

GC logs are weird. In ES jvm.options, the default configuration looks like this:

8:-Xloggc:/var/log/elasticsearch/gc.log

When Elasticsearch is started for the first time without any GC logs, the first one is named gc.log.0.current. When rolled over by the JVM, the file is renamed to gc.log.0 and gc.log.1.current begins. Here are files from a running cluster node:

ls -1 /var/log/elasticsearch/gc.*
/var/log/elasticsearch/gc.log.0
/var/log/elasticsearch/gc.log.0.current
/var/log/elasticsearch/gc.log.1
/var/log/elasticsearch/gc.log.2
/var/log/elasticsearch/gc.log.3
/var/log/elasticsearch/gc.log.4.current

On this system, the active file is /var/log/elasticsearch/gc.log.0.current. /var/log/elasticsearch/gc.log.4.current was left behind from a restart.
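
For reference, the .current naming and the rollover behavior come from the JVM's GC log rotation flags, which Elasticsearch sets in jvm.options alongside -Xloggc (illustrative JDK 8 style flags; the values may differ from this ES version's exact defaults):

8:-Xloggc:/var/log/elasticsearch/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m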

Contributor:

Got it, then the pattern makes sense.

@ruflin (Contributor) left a comment

Left some minor comments. I think the fields part needs fixing; besides that, I'm good with merging and fixing things in a follow-up PR.

type: group
description: >
fields:
- name: relative_process_timestamp_secs
Contributor:

These should adhere to our naming conventions: https://www.elastic.co/guide/en/beats/devguide/current/event-conventions.html. We can also tackle this in a follow-up PR.
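
One possible reading of those conventions is that a duration carries its unit as the final name component (a hypothetical rename for illustration, not what was finally merged):

- name: threads_total_stop_time.sec
  type: float
  description: >
    Total time in seconds for which application threads were stopped.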

Member Author:

Sure, we can fix it in a follow-up PR.

negate: true
match: after

fields:
Contributor:

Here the indentation seems to be off; fields and fields_under_root are probably not applying because of this.
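
For comparison, fields and fields_under_root belong at the top level of the input config, as siblings of multiline rather than nested under it (a sketch with a hypothetical service field, as discussed in #7309):

multiline:
  pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
  negate: true
  match: after

fields:
  service.name: elasticsearch
fields_under_root: true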

"grok": {
"field": "message",
"patterns": [
"%{GCTIMESTAMP}: %{GCPROCRUNTIME}: Total time for which application threads were stopped: %{BASE10NUM:elasticsearch.gc.threads_total_stop_time_secs} seconds, Stopping threads took: %{BASE10NUM:elasticsearch.gc.stopping_threads_time_secs} seconds",
Contributor:

@ph Looks like we could do this with dissect?
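
A sketch of what that could look like with the ingest dissect processor (available in Elasticsearch from 6.4; note that dissect needs fixed delimiters, so the flexible GCTIMESTAMP/GCPROCRUNTIME grok prefix would have to become plain dissect fields):

{
  "dissect": {
    "field": "message",
    "pattern": "%{timestamp}: %{relative_timestamp}: Total time for which application threads were stopped: %{elasticsearch.gc.threads_total_stop_time_secs} seconds, Stopping threads took: %{elasticsearch.gc.stopping_threads_time_secs} seconds"
  }
}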


@inqueue (Member, Author) commented Jun 16, 2018

Fixed two indentation issues @ruflin. Should be good?

@ruflin ruflin merged commit 8ca64fc into elastic:master Jun 18, 2018
@ruflin (Contributor) commented Jun 18, 2018

@inqueue Thanks a lot for the PR. Now let's start to iterate on top of it :-) Really happy to have this in.

@ruflin ruflin changed the title from Create jvm fileset to Add GC fileset to Elasticsearch Filebeat module on Jun 20, 2018
@tsg tsg mentioned this pull request Jun 20, 2018
type: float

Garbage collection threads total stop time seconds.
@danielmitterdorfer (Member):

@inqueue Just one clarification on the meaning of those: the metrics captured here are not necessarily tied to garbage collection. These times are actually related to safepoints. At a safepoint, all application threads are halted so that the JVM can do administrative work with which the application must not interfere. That might include such things as deoptimization, biased locking, or garbage collection.

The first metric tells how long the application threads were halted (the duration of the pause due to the safepoint), and the second one tells how long it took from the point in time the JVM requested a safepoint until all application threads could be halted.
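
For illustration, a JDK 8 log line of this kind looks like the following (values made up; the first number is the safepoint pause duration, the second the time to bring all threads to the safepoint):

2018-06-11T01:51:14.256+0000: 1243.387: Total time for which application threads were stopped: 0.0001273 seconds, Stopping threads took: 0.0000233 seconds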

Member Author:

Thanks @danielmitterdorfer, that makes sense. I did want to capture that metric in this initial commit, as it is probably the most frequently logged entry in gc.log. Can we work with you to isolate the most meaningful metrics? We also intend to incorporate the GC metrics captured in the Kafka Filebeat module: https://www.elastic.co/blog/monitoring-kafka-with-elastic-stack-1-filebeat.

@danielmitterdorfer (Member):

Can we work with you to isolate the most meaningful metrics?

Yes. The most meaningful ones in my opinion are:

  • Absolute timestamp of an event
  • Type of pause (young GC, old GC)
  • Pause time (how long was the application stopped?), GC (phase) duration (might happen concurrently)
  • Amount of memory cleaned up

Although I think it will get very tricky to parse all those variations:

  • The log file format has changed with JDK 9 (so-called "unified logging").
  • There are different garbage collection algorithms (e.g. CMS, G1).

https://www.youtube.com/watch?v=M9AYK5PtX40 should give a good idea about this.
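
To illustrate the format change, compare these two young-GC lines (both are made-up examples in the respective styles, not from this PR's test data):

JDK 8 (parallel collector, with -XX:+PrintGCDetails):
2018-06-11T01:51:14.256+0000: 1243.387: [GC (Allocation Failure) [PSYoungGen: 65536K->10748K(76288K)] 65536K->10756K(251392K), 0.0039189 secs]

JDK 9+ unified logging (G1):
[1243.387s][info][gc] GC(42) Pause Young (G1 Evacuation Pause) 24M->4M(254M) 3.912ms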

Contributor:

@danielmitterdorfer Thanks a lot for chiming in here.

What is going to be really valuable for our users is the level of detail with which you described the metrics above. We should make sure these details end up in our docs, as users very frequently ask on Discuss or through support what a field means in detail.

@danielmitterdorfer (Member):

You're welcome. Glad I could help.

@inqueue inqueue deleted the jvm-fileset branch June 23, 2018 20:45