Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logging changes to support Beats Elasticsearch module (including logging to JSON) #32850

Closed
4 tasks
jasontedor opened this issue Aug 14, 2018 · 42 comments · Fixed by #36833
Closed
4 tasks

Logging changes to support Beats Elasticsearch module (including logging to JSON) #32850

jasontedor opened this issue Aug 14, 2018 · 42 comments · Fixed by #36833
Labels

Comments

@jasontedor
Copy link
Member

This relates to the effort in Beats (filebeat) to build a module for Elasticsearch. To support this effort, we want to make some changes to the logs in Elasticsearch including removing some existing pain points. The main highlight of these changes will be shipping with JSON logging.

  • Change the name of the main log file to something like server.log. Today the main log file is <clustername>.log and this creates complexity for Filebeat; it would be simpler for them if they could rely on an include pattern based on a predictable log file name (another possibility here is server-<clustername>.log). Note that this will be a breaking change.
  • Each log line should contain the node ID and cluster ID (machine friendly, and serve as unique identifiers), and node name and cluster name (human friendly). Filebeat wants to see these for instances where logs are being collected to a single sink from multiple nodes and multiple clusters.
  • Improve logging in the official Docker images. Logging in Docker images traditionally goes to standard output and standard error. We do this with our official Docker images today, but then there is no way to distinguish the server logs from the audit, deprecation, and slow logs. With a single output stream, we need to add a "type" field to the log outputs from the Docker container so the different sources can be parsed from the single standard output stream.
  • Enable JSON layout. Today we do not support layout, mainly due to security manager issues. We need to address these. With this effort, we want the logs to remain human friendly, so we will also want to define an ordering of the first few fields in the JSON object (something like timestamp, log_level, component, node_name, index_name, shard_id, message).
@jasontedor jasontedor added >enhancement :Core/Infra/Logging Log management and logging utilities v7.0.0 labels Aug 14, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@jasontedor
Copy link
Member Author

Pinging @ruflin; let me know if I missed anything!

@sshling
Copy link
Contributor

sshling commented Aug 16, 2018

In some scenarios, an exception occurs, a large amount of logs will be printed, tens of MB per second, if it can be optimized, the same type exception is suppressed ,as print info :

"2016-07-09 11:01:31 +0430 [warn]: suppressed same stacktrace"

A large number of exception log writes, seriously affecting disk write performance.

@ycombinator
Copy link
Contributor

Now that we have a Logging UI in Kibana, we were hoping to show ES deprecation (and potentially other types of) logs in the UI. The user would discover said logs via the Monitoring UI, filtered down to the context they are in: a cluster, a node within a cluster, etc. In order for this to work correctly, item 2 in the checklist above needs to be implemented:

Each log line should contain the node ID and cluster ID (machine friendly, and serve as unique identifiers), and node name and cluster name (human friendly). Filebeat wants to see these for instances where logs are being collected to a single sink from multiple nodes and multiple clusters.

Any chance this specific item could be prioritized to enable the Monitoring UI use case described above?

@jasontedor
Copy link
Member Author

Any chance this specific item could be prioritized to enable the Monitoring UI use case described above?

We have been meaning to follow-up on this requirement, because as written we can’t meet it. We need to understand more how important and how strong this requirement is. It is simply not possible include the node ID and cluster ID on every log line. We emit log lines before these data are available.

@ycombinator
Copy link
Contributor

We emit log lines before these data are available.

Is this true for all types of logs? For example, would it be possible to emit the node ID and cluster ID on every log line for deprecation logs?

Are there any guarantees about when the data becomes available or, conversely, what types of log messages might not have this data in them since it's not available yet?

@jasontedor
Copy link
Member Author

Yes, it could be true for the deprecation log too. Also, it would be that we are considering combining all the log files into a single log file for all instances, not only for the Docker images.

Are there any guarantees about when the data becomes available or, conversely, what types of log messages might not have this data in them since it's not available yet?

There aren't guarantees per se. They can be categorized as all startup log messages up to the point that we recover node state would not have the node ID, and all startup log messages up to the point that we recover cluster state would not have the cluster ID (superset of those without the node ID).

@ycombinator
Copy link
Contributor

Thanks Jason. Roughly how many log entries are we talking about without cluster UUID or node ID? The reason I ask is: thinking a bit out of the box, would it be silly to re-emit those messages with the IDs in them, once the IDs become available? Just want to consider all possibilities before we say it's okay to "drop" certain messages that don't have IDs.

@jasontedor
Copy link
Member Author

Roughly how many log entries are we talking about without cluster UUID or node ID?

I can not bound it because we can always add more in the future, and if debug or trace logging is enabled (or even only for certain components) it can be quite a lot of log messages.

@ycombinator
Copy link
Contributor

In that case, I'm in favor of doing a best effort here. That is, I'm in favor of ES including the cluster UUID and node ID on every log line as and when that information becomes available.

It would be an improvement over what we can do today in the UI with the logs in terms of correlating them with other information about the node or cluster. And we'd still have the un-correlatable logs (but fewer in number now) around in filebeat indices, in case we later think of some way to display these meaningfully to the user.

@pgomulka
Copy link
Contributor

pgomulka commented Dec 19, 2018

I opened a draft PR to discuss the technical details of the implementation. Feel free to review/comment.

In regards to the format itself. The idea would be to change the log line from:

[2018-12-19T11:10:33,609][INFO ][o.e.l.LicenseService     ] [node-0] license [1d313428-f9d3-4085-81e0-c77f032359a1] mode [basic] - valid

To:

{"type": "rolling", "timestamp": "2018-12-19T12:18:58,402", "level": "INFO ", "class": "o.e.l.LicenseService", "cluster_name": distribution_run", "node_name": "node-0", "cluster_uuid": "dlQqhmTYQ3OfoF0eLIdsCw", "node_id": "uy-oB2QFS1m73g8Gg9yAXw", "message": "license [3cbed08b-c198-4b3d-b7d6-62efb94c1c2e] mode [basic] - valid"}

Before the first ClusterStateUpdate is received, the cluster_id and node_id would not be present in log lines:

{"type": "rolling", "timestamp": "2018-12-19T12:18:57,525", "level": "INFO ", "class": "o.e.t.TransportService", "cluster_name": distribution_run", "node_name": "node-0",  "message": "publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}"}

Or can they be null? like: "cluster_uuid": null, "node_id": null

In order to support multiple output streams from docker we will have a type field that indicate what appender is the log from. For instance from deprecation log:

{"type": "deprecation_rolling","timestamp": "2018-12-19T11:24:24,949", "level": "WARN ", "class": "o.e.d.l.RestPostStartTrialLicense", "node_name": "node-0", "node_id": "MGKz9QIFTyS-IR85E5J2Fg", "nodeid2": "unknown_node_id", "message": "[POST /_xpack/license/start_trial] is deprecated! Use [POST /_license/start_trial] instead."}

I am stil not sure how to format stacktraces in the log lines. Should they also contain the fields mentioned in 1st post by Jason?

@pgomulka
Copy link
Contributor

also, should we allow users to change the pattern we create now in a JSON form?
I could imagine some users would be tempted to do this and we don't have a way to control how they change it and if they follow the json structure.

@ycombinator
Copy link
Contributor

Hi @pgomulka this is looking great!

I am stil not sure how to format stacktraces in the log lines. Should they also contain the fields mentioned in 1st post by Jason?

You mean because they are multiline? If so, what about escaping the newlines in the message field in the JSON?

@ycombinator
Copy link
Contributor

I have a request for one more logging change to support the Beats Elasticsearch module. Please let me know if I should add it as a checklist item in this issue's description OR if you'd prefer that I make a new issue OR if this comment is sufficient.

Currently we see lines like these in the Elasticsearch logs:

[2018-07-03T11:45:45,604][WARN ][o.e.m.j.JvmGcMonitorService] [srvmulpvlsk252_md] [gc][3449992] overhead, spent [1.6s] collecting in the last [1.8s]

Notice the durations in these lines, 1.6s and 1.8s. Would it be possible to normalize the unit here, perhaps to ms? That would make parsing these durations a lot easier.

@pgomulka
Copy link
Contributor

@ycombinator in regards to this duration units inconsistency, that might be a bug I think. Can you please a raise a separate issue?

@ycombinator
Copy link
Contributor

@pgomulka Done! #36896

@pgomulka
Copy link
Contributor

pgomulka commented Dec 27, 2018

@ycombinator
In regards to exception formatting. I am thinking of a format like this:

{"type": "console", "timestamp": "2018-12-27T14:45:10,386", "level": "WARN ", "class": "r.suppressed", "cluster_name": "distribution_run", "node_name": "node-0", "cluster_uuid": "XSKjGHufRR25tdwlkpCTig", "node_id": "iP2awnO9QX6ajH2z6w1byQ",  "message": "", "path": "/myindex", "params": "{index=myindex}", "exception": ["java.lang.NullPointerException: asdf",
"at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService.validateIndexName(MetaDataCreateIndexService.java:149) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService.validate(MetaDataCreateIndexService.java:596) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$IndexCreationTask.execute(MetaDataCreateIndexService.java:287) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:685) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:310) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:211) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:143) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:660) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]",
"at java.lang.Thread.run(Thread.java:834) [?:?]"] }

To make a stacktrace human readable in JSON format I have enclosed each line of stacktrace in a separate array element.
I hope this should still be easy for beats to parse. Let me know what you think

@pgomulka
Copy link
Contributor

pgomulka commented Dec 31, 2018

Also I am wondering if we should add a new schema to ECS. We are after all integrating beats with ES.
We have log.yaml schema at the moment, I was thinking of having a schema with all fields we will use:

type;
timestamp;
level;
component;
cluster.name;
node.name;
cluster.uuid;
node.idd;
message;
Array exception;

@pgomulka
Copy link
Contributor

pgomulka commented Jan 4, 2019

Another problem to consider is how should we handle exceptions, messages written directly to the console. That is not a problem when running elasticsearch from a binary - log files will be parsed - but might be an issue when running from docker.

For instance if elasticsearch fails bootstrap checks we will write the exception message directly to the org.elasticsearch.cli.Terminal

 terminal.println(Terminal.Verbosity.SILENT, "ERROR: " + e.getMessage());

In Boostrap.java we remove ConsoleAppender, expecting this to print out from Terminal again later.

   // disable console logging, so user does not see the exception twice (jvm will show it already)
            final Logger rootLogger = LogManager.getRootLogger();
            final Appender maybeConsoleAppender = Loggers.findAppender(rootLogger, ConsoleAppender.class);
            if (foreground && maybeConsoleAppender != null) {
                Loggers.removeAppender(rootLogger, maybeConsoleAppender);
            }

but since docker is only using ConsoleAppender, it will miss the message in json format and will later only print out the non-json message on a Terminal

Logs from docker:
{"type": "console", "timestamp": "2019-01-04T16:23:22,319+0100", "level": "INFO", "component": "o.e.b.BootstrapChecks", "cluster.name": "elasticsearch", "node.name": "Przemyslaws-MacBook-Pro.local", "message": "explicitly enforcing bootstrap checks" }
ERROR: [1] bootstrap checks failed
[1]: the default discovery settings are unsuitable for production use; at least one of [discovery.zen.ping.unicast.hosts, discovery.zen.hosts_provider, cluster.initial_master_nodes] must be configured
{"type": "console", "timestamp": "2019-01-04T16:23:22,339+0100", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "elasticsearch", "node.name": "Przemyslaws-MacBook-Pro.local", "message": "stopping ..." }

Logs from log file

{"type": "console", "timestamp": "2019-01-04T17:12:03,836+0100", "level": "ERROR", "component": "o.e.b.Bootstrap", "cluster.name": "distribution_run", "node.name": "node-0",  "message": "node validation exception\n[1] bootstrap checks failed\n[1]: the default discovery settings are unsuitable for production use; at least one of [discovery.zen.ping.unicast.hosts, discovery.zen.hosts_provider, cluster.initial_master_nodes] must be configured"  }

Should we aim to format Terminal messages as well?

@chrisronline
Copy link
Contributor

Hey folks

I'm hoping to get some traction on this issue. We're interested in pushing forward this deprecation log in the monitoring UI initiative and we've discussed that we need cluster_uuid in the deprecation log in order to move forward, as we can't definitely know which cluster that deprecation log appeared on.

Any update here?

@pgomulka
Copy link
Contributor

@chrisronline the PR #36833 with ES changes is on track to be merged into 7.0. This will have cluster.uuid, but as discussed earlier only once the ES started up. see comment

you can see sample messages in here

@chrisronline
Copy link
Contributor

@pgomulka Thanks! That's helpful for sure!

We're hoping to get our side done before 7.0 so users can take advantage of the UI for the 7.0 upgrade. Is it possible to tackle this work in phases, where the final phase is the PR you have linked, but we can have an initial phase where we add the cluster_uuid to the deprecation log in the current format?

@pgomulka
Copy link
Contributor

@chrisronline in my view it would be better to push that all together. The majority of the work is related to propagating cluster.uuid and node.id and testing. Most of the test I wrote already assume the json format, so that would be actually more work now to isolate that change
Unless that would mean that the json format itself would be postponed?

@danielmitterdorfer @nik9000 any views on this?

@ycombinator
Copy link
Contributor

Old logs still using previous pattern and *.log filetype suffix.
New logs on the other hand would use a new pattern and *.json file type suffix.

Just wanted to point out that the audit log is not following this convention currently: plaintext logs are being sent to {clustername}_access.log and JSON logs are being sent to {clustername}_audit.log. Starting 7.0 they plan to just emit the JSON logs IIRC. /cc @albertzaharovits

@dliappis
Copy link
Contributor

raised an important problem that the readability is greatly affected with this change. We can work that around with keeping the old logs and new logs together.
Old logs still using previous pattern and *.log filetype suffix.
New logs on the other hand would use a new pattern and *.json file type suffix.
That might be concerning that there will be duplicated amount of logs (and files), but on the other side we don't log a massive amounts anyway.

Any views on this?

I am a bit skeptical about this for the following reasons:

  1. There could be customers directly scrapping/ingesting the entirety of logs/ and possibly end up with duplicate logs in their log environment.
  2. We'll use double the amount of space needed for logs. Our own logs are modest, but what about misbehaving plugins for example?

For the record, with the introduction of json logging, at least the docker use case looks really nice and easy to read esp. piping it to jq with something as simple as:

docker logs <container_name> | jq . producing very readable output IMHO like:

image

One concern though is the possible amount of frustration with new users not preparing themselves for this switch and frantically looking on how to change back to normal log output.

I chatted with @pgomulka earlier and suggested whether it makes sense to ship the log4j2.properties files that #36833 bring, clearly marked with comments and then an additional section with the normal old-style logging, fully commented out, ready to be uncommented out.

For example:

Suggestion for log4j2.properties (docker)
################################################################################
# JSON (DEFAULT) LOGGING DEFINITIONS START HERE
################################################################################

status = error

# log action execution errors for easier debugging
logger.action.name = org.elasticsearch.action
logger.action.level = debug

appender.rolling.type = Console
appender.rolling.name = rolling
appender.rolling.layout.type = ESJsonLayout
appender.rolling.layout.type_name = server

rootLogger.level = info
rootLogger.appenderRef.rolling.ref = rolling

appender.deprecation_rolling.type = Console
appender.deprecation_rolling.name = deprecation_rolling
appender.deprecation_rolling.layout.type = ESJsonLayout
appender.deprecation_rolling.layout.type_name = deprecation

logger.deprecation.name = org.elasticsearch.deprecation
logger.deprecation.level = warn
logger.deprecation.appenderRef.deprecation_rolling.ref = deprecation_rolling
logger.deprecation.additivity = false

appender.index_search_slowlog_rolling.type = Console
appender.index_search_slowlog_rolling.name = index_search_slowlog_rolling
appender.index_search_slowlog_rolling.layout.type = ESJsonLayout
appender.index_search_slowlog_rolling.layout.type_name = index_search_slowlog

logger.index_search_slowlog_rolling.name = index.search.slowlog
logger.index_search_slowlog_rolling.level = trace
logger.index_search_slowlog_rolling.appenderRef.index_search_slowlog_rolling.ref = index_search_slowlog_rolling
logger.index_search_slowlog_rolling.additivity = false

appender.index_indexing_slowlog_rolling.type = Console
appender.index_indexing_slowlog_rolling.name = index_indexing_slowlog_rolling
appender.index_indexing_slowlog_rolling.layout.type = ESJsonLayout
appender.index_indexing_slowlog_rolling.layout.type_name = index_indexing_slowlog


logger.index_indexing_slowlog.name = index.indexing.slowlog.index
logger.index_indexing_slowlog.level = trace
logger.index_indexing_slowlog.appenderRef.index_indexing_slowlog_rolling.ref = index_indexing_slowlog_rolling
logger.index_indexing_slowlog.additivity = false

################################################################################
# JSON (DEFAULT) LOGGING DEFINITIONS END HERE
################################################################################

################################################################################
#
# UNCOMMENT THE FOLLOWING SECTION AND COMMENT OUT THE ABOVE SECTION TO REVERT
# TO <7.0 DEFAULT LOGGING BEHAVIOR.
#
# TEXT LOGGING DEFINITIONS START HERE
#
################################################################################
#
# status = error
#
# # log action execution errors for easier debugging
# logger.action.name = org.elasticsearch.action
# logger.action.level = debug
#
# appender.rolling.type = Console
# appender.rolling.name = rolling
# appender.rolling.layout.type = PatternLayout
# appender.rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] [%node_name]%marker %m%n
#
# rootLogger.level = info
# rootLogger.appenderRef.rolling.ref = rolling
#
# appender.deprecation_rolling.type = Console
# appender.deprecation_rolling.name = deprecation_rolling
# appender.deprecation_rolling.layout.type = PatternLayout
# appender.deprecation_rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] [%node_name]%marker %.-10000m%n
#
# logger.deprecation.name = org.elasticsearch.deprecation
# logger.deprecation.level = warn
# logger.deprecation.appenderRef.deprecation_rolling.ref = deprecation_rolling
# logger.deprecation.additivity = false
#
# appender.index_search_slowlog_rolling.type = Console
# appender.index_search_slowlog_rolling.name = index_search_slowlog_rolling
# appender.index_search_slowlog_rolling.layout.type = PatternLayout
# appender.index_search_slowlog_rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c] [%node_name]%marker %.-10000m%n
#
# logger.index_search_slowlog_rolling.name = index.search.slowlog
# logger.index_search_slowlog_rolling.level = trace
# logger.index_search_slowlog_rolling.appenderRef.index_search_slowlog_rolling.ref = index_search_slowlog_rolling
# logger.index_search_slowlog_rolling.additivity = false
#
# appender.index_indexing_slowlog_rolling.type = Console
# appender.index_indexing_slowlog_rolling.name = index_indexing_slowlog_rolling
# appender.index_indexing_slowlog_rolling.layout.type = PatternLayout
# appender.index_indexing_slowlog_rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c] [%node_name]%marker %.-10000m%n
#
# logger.index_indexing_slowlog.name = index.indexing.slowlog.index
# logger.index_indexing_slowlog.level = trace
# logger.index_indexing_slowlog.appenderRef.index_indexing_slowlog_rolling.ref = index_indexing_slowlog_rolling
# logger.index_indexing_slowlog.additivity = false
#
################################################################################
# TEXT LOGGING DEFINITIONS END HERE
################################################################################

WDYT?

@albertzaharovits
Copy link
Contributor

I chatted with @pgomulka earlier and suggested whether it makes sense to ship the log4j2.properties files that #36833 bring, clearly marked with comments and then an additional section with the normal old-style logging, fully commented out, ready to be uncommented out.

Un-commenting the section requires a node restart. In the case of the audit log, we went with both formats enabled out-of-the-box, with two independent loggers, and we strongly advised disabling the logger for the old format by using the cluster settings API.

Just wanted to point out that the audit log is not following this convention currently: plaintext logs are being sent to {clustername}_access.log and JSON logs are being sent to {clustername}_audit.log. Starting 7.0 they plan to just emit the JSON logs IIRC.

This is correct.

Old logs still using previous pattern and *.log filetype suffix.
New logs on the other hand would use a new pattern and *.json file type suffix.
That might be concerning that there will be duplicated amount of logs (and files), but on the other side we don't log a massive amounts anyway.

We named the new audit log {clustername}_audit.log instead of {clustername}_audit.json because the file is not really a properly formatted json document (It is not an array of comma separated json docs, it is a log of JSON docs separated by endlines).

@pgomulka
Copy link
Contributor

The json format will be a default one in the long run.
However, we will have a transition period where we will keep log files in both JSON and previous formats. This will allow us to create additional tooling and gather feedback from users in regards to readability.
Since keeping two sets of files will have an impact on storage we will reduce the log accumulation size from 2GB to 1GB
As a followup we will schedule a removal of the previous log format.

@jasontedor
Copy link
Member Author

Since keeping two sets of files will have an impact on storage we will reduce the log accumulation size from 2GB to 1GB

We can consider this; I am not convinced that it's a change that we need to make.

@dliappis
Copy link
Contributor

I think it's worth mentioning on top of @pgomulka 's notes that, going forward, logs on the console for the docker image will be in json format by default.

@ycombinator
Copy link
Contributor

If we go down the path of keeping both flavors of logs - plaintext and json - I'd like to know what globs/extensions each flavor would use. That way Filebeat can look in the right places and know how to parse what it sees.

@pgomulka
Copy link
Contributor

pgomulka commented Jan 25, 2019

@ycombinator we will produce following log files with new JSON format

${cluster_name}_server.json
${cluster_name}_deprecation.json
${cluster_name}_index_search_slowlog.json
${cluster_name}_index_indexing_slowlog.json

and following log files with an old pattern (not containing newly added IDs like cluster.uuid)

${cluster_name}.log
${cluster_name}_deprecation.log
${cluster_name}_index_search_slowlog.log
${cluster_name}_index_indexing_slowlog.log

gc.log
${cluster_name}_audit.log - I feel *.json would be ok anyway

@ycombinator
Copy link
Contributor

This is perfect, thanks @pgomulka!

@ycombinator
Copy link
Contributor

@pgomulka Looking closer at the log file names, I have a couple of questions:

  1. For the server log, it appears that today we write it to the log file ${cluster_name}.log. Just want to confirm that once this issue is resolved, we will be writing it to ${cluster_name}_server.log instead? If that's the case, then I will make sure to update the current Filebeat elasticsearch module to look for either one.

  2. What about gc and audit log files? Could you add their names to both lists in your comments so we have the complete list in one place?

Thanks!

@jasontedor
Copy link
Member Author

I do not think we should change the name of the existing logs; that is a breaking change with little benefit to us.

@pgomulka
Copy link
Contributor

pgomulka commented Jan 28, 2019

I have renamed the ${cluster_name}.log to ${cluster_name}_server.log for consistency, but agree with @jasontedor that it won't benefit us and it would be a unnecessary breaking change.
We will stick with ${cluster_name}.log for old-format logs. updated comment

I am however wondering if audit logs should not be in ${cluster_name}_audit.json instead of ${cluster_name}_audit.log. I understand @albertzaharovits concerns about this file not being a json document. I think however that this is widely used and ok to have multiple json documents in a file named *.json. Notice that we use that in our Elastic documentation as well sample tutorial from ES

GC logs stay untouched.

@ruflin
Copy link
Contributor

ruflin commented Jan 28, 2019

There is not direct benefit for Elasticsearch but I think we should change this for Filebeat to make collection easier. Right now on the Filebeat side, we define what a server log is by excluding all the other log types:

exclude_files: [".gz$","_slowlog.log$","_access.log$","_deprecation.log$"]

If a new log type is added, this config breaks which I don't think it should. Having *_server.log as pattern would be then enough and not exclude patterns are needed.

@ruflin
Copy link
Contributor

ruflin commented Jan 28, 2019

I just had a sync up with @pgomulka . Here the conclusions:

  • The *.log names will stay the same in 6.x and 7.x
  • The *.json logs will get the _server postfix
  • In 7.x by default the json and old logs will be around. So far my assumption was it is either or. This doubles the actual log size created by Elasticsearch.

@ycombinator The last point has some effects on the Filebeat module. For Filebeat 7 I think the default should be to only have the JSON logs in the path patterns. If someone wants to ingest the old logs from 7, some manual config changes for the paths are needed by the user. We need docs for this.

@pgomulka Is there a way that the Elasticsearch user can disable the "double" logging and only have JSON logs?

@pgomulka
Copy link
Contributor

@ruflin Users are free to opt out from either logs by deleting unwanted appenders from log4j2.properties. I also mentioned about that in migration docs.

We are aware of the additional log size, but during core-infra meeting we agreed that this is not a major concern. We can reduce the log size if needed by changing the rolling parameter accumulation size from 2GB to 1GB. comment

pgomulka added a commit that referenced this issue Jan 29, 2019
In order to support JSON log format, a custom pattern layout was used and its configuration is enclosed in ESJsonLayout. Users are free to use their own patterns, but if smooth Beats integration is needed, they should use ESJsonLayout. EvilLoggerTests are left intact to make sure user's custom log patterns work fine.

To populate additional fields node.id and cluster.uuid which are not available at start time, 
a cluster state update will have to be received and the values passed to log4j pattern converter.
A ClusterStateObserver.Listener is used to receive only one ClusteStateUpdate. Once update is received the nodeId and clusterUUid are set in a static field in a NodeAndClusterIdConverter. 

Following fields are expected in JSON log lines: type, tiemstamp, level, component, cluster.name, node.name, node.id, cluster.uuid, message, stacktrace
see ESJsonLayout.java for more details and field descriptions

Docker log4j2 configuration is now almost the same as the one use for ES binary. 
The only difference is that docker is using console appenders, whereas ES is using file appenders.

relates: #32850
pgomulka added a commit that referenced this issue Jan 29, 2019
in order to keep json logs consistent the security audit logs are renamed from .log to .json
relates #32850
pgomulka added a commit that referenced this issue Jan 31, 2019
When extending ESIntegTestCase are run on the same jvm, the static field in
NodeAndClusterIdConverter will throw an AlreadySet exceptions.
overriding the configuration method from Node.configureNodeAndClusterIdStateListener in the MockNode will prevent the listener registration from happening
relates #32850
pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Feb 1, 2019
Since the name of the audit log file was changed it should be documented
in a migration doc.

relates elastic#32850
pgomulka added a commit that referenced this issue Feb 5, 2019
The migration documentation for an audit logging changes. Removal of plaintext logs and rename of json log file

relates #32850
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.