-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Logging: Make node name consistent in logger #31588
Conversation
First, some background: we have 15 different methods to get a logger in Elasticsearch but they can be broken down into three broad categories based on what information is provided when building the logger. Just a class like: ``` private static final Logger logger = ESLoggerFactory.getLogger(ActionModule.class); ``` or: ``` protected final Logger logger = Loggers.getLogger(getClass()); ``` The class and settings: ``` this.logger = Loggers.getLogger(getClass(), settings); ``` Or more information like: ``` Loggers.getLogger("index.store.deletes", settings, shardId) ``` The goal of the "class and settings" variant is to attach the node name to the logger. Because we don't always have the settings available, we often use the "just a class" variant and get loggers without node names attached. There isn't any real consistency here. Some loggers get the node name because it is convenient and some do not. This change makes the node name available to all loggers all the time. Almost. There are some caveats are testing that I'll get to. But in *production* code the node name is node available to all loggers. This means we can stop using the "class and settings" variants to fetch loggers which was the real goal here, but a pleasant side effect is that the ndoe name is now consitent on every log line and optional by editing the logging pattern. This is all powered by setting the node name statically on a logging formatter very early in initialization. Now to tests: tests can't set the node name statically because subclasses of `ESIntegTestCase` run many nodes in the same jvm, even in the same class loader. Also, lots of tests don't run with a real node so they don't *have* a node name at all. To support multiple nodes in the ame JVM tests suss out the node name from the thread name which works surprisingly well and easy to test in a nice way. For those threads that are not part of an `ESIntegTestCase` node we stick whatever useful information we can get form the thread name in the place of the node name. This allows us to keep the logger format consistent.
Pinging @elastic/es-core-infra |
I'm aware that this'll require some migration documentation but I figured I should open this to get some review first to make sure that it is the right way to go. Personally I think this PR is a net positive because it allows us to perform a ton of cleanup and makes the log format more consistent. I'm not 100% jazzed about it because of the extra complexity around test logging, but I figure that the testing for it is fairly good and that it is better to have complexity in the tests than it is in the production code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I played around with this for a while before opening it, checking the performance of logging during tests because that is where most of the complexity is in this PR. Without the thread local caching a full ./gradlew -p server integTest
spends about 500ms across all of the child JVMs formatting the logger out of a total run time of 8 minutes. With the thread local it is less than 100ms.
# version of Java calling `Thread.getName()` doesn't allocate so UNCACHED | ||
# ought to be faster than CACHED because it doesn't have to deal with | ||
# ThreadLocals. | ||
AsyncLogger.ThreadNameStrategy = UNCACHED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: We don't use the async logger but we still use mutable log events under the hood and they are configured by this property. I'll add that in a comment when I get a chance.
@@ -87,7 +87,7 @@ public void testInvalidValue() { | |||
final boolean enforceLimits = randomBoolean(); | |||
final IllegalArgumentException e = expectThrows( | |||
IllegalArgumentException.class, | |||
() -> BootstrapChecks.check(new BootstrapContext(Settings.EMPTY, null), enforceLimits, emptyList(), "testInvalidValue")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In production code this used to be the node name but it is no longer required because the node name is already part of the logger.
@@ -126,7 +112,6 @@ private static String formatPrefix(String... prefixes) { | |||
} | |||
} | |||
if (sb.length() > 0) { | |||
sb.append(" "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was required because our logger format didn't always have a prefix. The space is now included in the new logger format.
assert parentLogger instanceof PrefixLogger; | ||
return ESLoggerFactory.getLogger(((PrefixLogger)parentLogger).prefix(), parentLogger.getName() + s); | ||
String prefix = null; | ||
if (parentLogger instanceof PrefixLogger) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it so we don't wrap every logger in PrefixLogger
because most loggers don't need it any more.
@@ -71,29 +67,19 @@ public static Logger getLogger(Class<?> clazz, Settings settings, Index index, S | |||
} | |||
|
|||
public static Logger getLogger(Class<?> clazz, Settings settings, String... prefixes) { | |||
final List<String> prefixesList = prefixesList(settings, prefixes); | |||
return Loggers.getLogger(clazz, prefixesList.toArray(new String[prefixesList.size()])); | |||
return Loggers.getLogger(clazz, prefixes); | |||
} | |||
|
|||
public static Logger getLogger(String loggerName, Settings settings, String... prefixes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These Settings
variants are no longer needed and I propose to remove them in a follow up PR.
@@ -46,6 +46,9 @@ public static Logger getLogger(String prefix, Class<?> clazz) { | |||
} | |||
|
|||
public static Logger getLogger(String prefix, Logger logger) { | |||
if (prefix == null || prefix.length() == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of our loggers don't actually have a prefix so I felt like we shouldn't wrap the default logger in that case. It is one less object to allocate and one less synchronized block to execute.
* Cache of the string to output for this thread. It is a thread local | ||
* because Elasticsearch's tests log synchronously so the formatting | ||
* thread should be the same as the logging thread so the cache | ||
* *should* only miss when a thread changes its name. Which they do. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To add to this: only test threads change their name that I know of, but I figure it is worth checking all threads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! It will help cleanup (or maybe remove!) AbstractComponent. I have some general comments.
docs/java-api/index.asciidoc
Outdated
@@ -94,7 +94,7 @@ For example, you can add in your `src/main/resources` project dir a `log4j2.prop | |||
appender.console.type = Console | |||
appender.console.name = console | |||
appender.console.layout.type = PatternLayout | |||
appender.console.layout.pattern = [%d{ISO8601}][%-5p][%-25c] %marker%m%n | |||
appender.console.layout.pattern = [%d{ISO8601}][%-5p][%-25c] %marker %m%n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this example include [%node_name]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't add it because I didn't think the transport client has a node name.
@@ -46,6 +46,9 @@ public static Logger getLogger(String prefix, Class<?> clazz) { | |||
} | |||
|
|||
public static Logger getLogger(String prefix, Logger logger) { | |||
if (prefix == null || prefix.length() == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not fail here? These would just be noop calls that could be removed right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now everything gets routed through this method even if there isn't a prefix. How about this: I promise to remove as many of the getLogger
variants as I can in a followup but I keep this like this in this PR so I don't make the change larger than it already is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Can you at least add a comment here that this leniency should/will be removed in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
@@ -126,7 +112,6 @@ private static String formatPrefix(String... prefixes) { | |||
} | |||
} | |||
if (sb.length() > 0) { | |||
sb.append(" "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I left a comment.
* thread should be the same as the logging thread so the cache | ||
* *should* only miss when a thread changes its name. Which they do. | ||
* | ||
* Having the cache saves a couple of milliseconds off of a test run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's truly only a couple milliseconds then why do we do caching at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was pretty torn on this actually. If we log enough then it saves time but logging already takes time because it has to write the logs to stdout so maybe it will just vanish in the wash. But I decided to keep it ultimately because I didn't want to make tests any slower than I had to, even if it is just a few seconds over the whole build.
I'm certainly happy to be convinced to drop it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would lean towards reducing complexity. If it is kept, then at least please update the comment as "a couple of milliseconds" doesn't seem to match what you say here (a couple of seconds).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll drop it and we can see if it hits build times. I don't believe it will, but we have logging for it.
/** | ||
* Load logging plugins so we can have {@code node_name} in the pattern. | ||
*/ | ||
public static void loadPlugins() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you name this something like "loadLog4jPlugins" so it is not confused with ES plugins (which it took me a few minutes to realize when reading this PR)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
remove?! That'd be great! |
@elasticmachine, retest this please. |
@rjernst, I've pushed some updates if you'd like to take another look. |
@elasticmachine, retest this please. |
@rjernst, could you have another look when you get a chance? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, sorry for the delayed review
@@ -46,6 +46,13 @@ public static Logger getLogger(String prefix, Class<?> clazz) { | |||
} | |||
|
|||
public static Logger getLogger(String prefix, Logger logger) { | |||
if (prefix == null || prefix.length() == 0) { | |||
/* | |||
* In an followup we'll throw an exception in this case directing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I wasn't explicit, I meant create a GH issue and link here.
I'm self blocking this after a conversation with a few folks. I'd like to implement this in such a way that I can backport it to 6.x so that folks backporting other things to 6.x will keep getting node names in their logs. But I can't backport the log4j.properties file change to 6.x because that is just too breaking. |
Revert the log4j2.properties changes and instead hack the changes in at configuration time. This should allow us to backport this entire change set to 6.x which should make backporting other changes simpler. We'll replace the hack in the master branch only in a followup change.
This one should stay reverted, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I've modified this so that it can be safely backported to 6.x and it is ready for another review. My plan at this point is to merge this and then make a new PR against master only that reverts the log4j properties loading hacks, changes log4j2.properties, and makes all of the right fanfare for the breaking change.
I talked to @rjernst about this and he suggested adding a warning if the log used the old pattern. I will add that before another review is appropriate. |
@rjernst I re-made log4j.properties changes and added a warning to the log4j initialization hack. I'll add a message to migrate_6_4.asciidoc on backport that warns folks about the change in addition to the warning on startup. |
Wooops checkstyle! |
That last failure is probably me too. Looking. |
PatternLayout.newBuilder().withPattern("%m").build()) { | ||
PatternLayout.newBuilder() | ||
// Specify the configuration so log4j doesn't re-initialize | ||
.withConfiguration(config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this the test fails with a security error because log4j tries to re-initialize itself which calls getClassloader
when there are log4j plugins which fails because of security permissions. Reusing the configuration here skips all that.
I tracked down the test failure. It was caused by the merge. I had to move where I initialized the log4j plugin to handle it but everything still works. |
@rjernst, could you have another look at this? Particularly the pattern upgrade hack in LogConfigurator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks ok to me.
* In a followup we'll throw an exception if prefix is null or empty | ||
* redirecting folks to LogManager.getLogger. | ||
* | ||
* This and more is tracker in https://github.com/elastic/elasticsearch/issues/32174 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: tracker -> tracked
|
||
/* | ||
* Subclass the properties configurator to hack the new pattern in | ||
* place of the so users don't have to change log4j2.properties in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove "of the"?
* master: Logging: Make node name consistent in logger (#31588) Mute SSLTrustRestrictionsTests on JDK 11 Increase max chunk size to 256Mb for repo-azure (#32101) Docs: Fix README upgrade mention (#32313) Changed ReindexRequest to use Writeable.Reader (#32401) Mute KerberosAuthenticationIT Fix AutoIntervalDateHistogram.testReduce random failures (#32301) fix no=>not typo (#32463) Mute QueryProfilerIT#testProfileMatchesRegular() HLRC: Add delete watch action (#32337) High-level client: fix clusterAlias parsing in SearchHit (#32465) Fix calculation of orientation of polygons (#27967) [Kerberos] Add missing javadocs (#32469) [Kerberos] Remove Kerberos bootstrap checks (#32451) Make get all app privs requires "*" permission (#32460) Switch security to new style Requests (#32290) Switch security spi example to new style Requests (#32341) Painless: Add PainlessConstructor (#32447) update rollover to leverage write-alias semantics (#32216) Update Fuzzy Query docs to clarify default behavior re max_expansions (#30819) INGEST: Clean up Java8 Stream Usage (#32059) Ensure KeyStoreWrapper decryption exceptions are handled (#32464)
First, some background: we have 15 different methods to get a logger in Elasticsearch but they can be broken down into three broad categories based on what information is provided when building the logger. Just a class like: ``` private static final Logger logger = ESLoggerFactory.getLogger(ActionModule.class); ``` or: ``` protected final Logger logger = Loggers.getLogger(getClass()); ``` The class and settings: ``` this.logger = Loggers.getLogger(getClass(), settings); ``` Or more information like: ``` Loggers.getLogger("index.store.deletes", settings, shardId) ``` The goal of the "class and settings" variant is to attach the node name to the logger. Because we don't always have the settings available, we often use the "just a class" variant and get loggers without node names attached. There isn't any real consistency here. Some loggers get the node name because it is convenient and some do not. This change makes the node name available to all loggers all the time. Almost. There are some caveats are testing that I'll get to. But in *production* code the node name is node available to all loggers. This means we can stop using the "class and settings" variants to fetch loggers which was the real goal here, but a pleasant side effect is that the ndoe name is now consitent on every log line and optional by editing the logging pattern. This is all powered by setting the node name statically on a logging formatter very early in initialization. Now to tests: tests can't set the node name statically because subclasses of `ESIntegTestCase` run many nodes in the same jvm, even in the same class loader. Also, lots of tests don't run with a real node so they don't *have* a node name at all. To support multiple nodes in the same JVM tests suss out the node name from the thread name which works surprisingly well and easy to test in a nice way. For those threads that are not part of an `ESIntegTestCase` node we stick whatever useful information we can get form the thread name in the place of the node name. This allows us to keep the logger format consistent.
* 6.x: Fix scriptdocvalues tests with dates Correct minor typo in explain.asciidoc for HLRC Fix painless whitelist and warnings from backporting #31441 Build: Add elastic maven to repos used by BuildPlugin (#32549) Scripting: Conditionally use java time api in scripting (#31441) [ML] Improve error when no available field exists for rule scope (#32550) [ML] Improve error for functions with limited rule condition support (#32548) [ML] Remove multiple_bucket_spans [ML] Fix thread leak when waiting for job flush (#32196) (#32541) Painless: Clean Up PainlessField (#32525) Add @AwaitsFix for #32554 Remove broken @link in Javadoc Add AwaitsFix to failing test - see #32546 SQL: Added support for string manipulating functions with more than one parameter (#32356) [DOCS] Reloadable Secure Settings (#31713) Fix compilation error introduced by #32339 [Rollup] Remove builders from TermsGroupConfig (#32507) Use hostname instead of IP with SPNEGO test (#32514) Switch x-pack rolling restart to new style Requests (#32339) [DOCS] Small fixes in rule configuration page (#32516) Painless: Clean up PainlessMethod (#32476) SQL: Add test for handling of partial results (#32474) Docs: Add missing migration doc for logging change Build: Remove shadowing from benchmarks (#32475) Docs: Add all JDKs to CONTRIBUTING.md Logging: Make node name consistent in logger (#31588) High-level client: fix clusterAlias parsing in SearchHit (#32465) REST high-level client: parse back _ignored meta field (#32362) backport fix of reduceRandom fix (#32508) Add licensing enforcement for FIPS mode (#32437) INGEST: Clean up Java8 Stream Usage (#32059) (#32485) Improve the error message when an index is incompatible with field aliases. (#32482) Mute testFilterCacheStats Scripting: Fix painless compiler loader to know about context classes (#32385) [ML][DOCS] Fix typo applied_to => applies_to Mute SSLTrustRestrictionsTests on JDK 11 Changed ReindexRequest to use Writeable.Reader (#32401) Increase max chunk size to 256Mb for repo-azure (#32101) Mute KerberosAuthenticationIT fix no=>not typo (#32463) HLRC: Add delete watch action (#32337) Fix calculation of orientation of polygons (#27967) [Kerberos] Add missing javadocs (#32469) Fix missing JavaDoc for @throws in several places in KerberosTicketValidator. Make get all app privs requires "*" permission (#32460) Ensure KeyStoreWrapper decryption exceptions are handled (#32472) update rollover to leverage write-alias semantics (#32216) [Kerberos] Remove Kerberos bootstrap checks (#32451) Switch security to new style Requests (#32290) Switch security spi example to new style Requests (#32341) Painless: Add PainlessConstructor (#32447) Update Fuzzy Query docs to clarify default behavior re max_expansions (#30819) Remove > from Javadoc (fatal with Java 11) Tests: Fix convert error tests to use fixed value (#32415) IndicesClusterStateService should replace an init. replica with an init. primary with the same aId (#32374) auto-interval date histogram - 6.x backport (#32107) [CI] Mute DocumentSubsetReaderTests testSearch [TEST] Mute failing InternalEngineTests#testSeqNoAndCheckpoints TEST: testDocStats should always use forceMerge (#32450) TEST: Avoid deletion in FlushIT AwaitsFix IndexShardTests#testDocStats Painless: Add method type to method. (#32441) Remove reference to non-existent store type (#32418) [TEST] Mute failing FlushIT test Fix ordering of bootstrap checks in docs (#32417) Wrong discovery.type for azure in breaking changes (#32432) Mute ConvertProcessorTests failing tests TESTS: Move netty leak detection to paranoid level (#32354) (#32425) Upgrade to Lucene-7.5.0-snapshot-608f0277b0 (#32390) [Kerberos] Avoid vagrant update on precommit (#32416) TEST: Avoid triggering merges in FlushIT [DOCS] Fixes formatting of scope object in job resource Switch x-pack/plugin to new style Requests (#32327) Release requests in cors handle (#32410) Remove BouncyCastle dependency from runtime (#32402) Copy missing segment attributes in getSegmentInfo (#32396) Rest HL client: Add put license action (#32214) Docs: Correcting a typo in tophits (#32359) Build: Stop double generating buildSrc pom (#32408) Switch x-pack full restart to new style Requests (#32294) Painless: Clean Up PainlessClass Variables (#32380) [ML] Consistent pattern for strict/lenient parser names (#32399) Add Restore Snapshot High Level REST API Update update-settings.asciidoc (#31378) Introduce index store plugins (#32375) Rank-Eval: Reduce scope of an unchecked supression Make sure _forcemerge respects `max_num_segments`. (#32291)
* ccr: (24 commits) Remove _xpack from CCR APIs (elastic#32563) TEST: Avoid merges in testRecoveryWithOutOfOrderDelete Logging: Make node name consistent in logger (elastic#31588) Mute SSLTrustRestrictionsTests on JDK 11 Increase max chunk size to 256Mb for repo-azure (elastic#32101) Docs: Fix README upgrade mention (elastic#32313) Changed ReindexRequest to use Writeable.Reader (elastic#32401) Mute KerberosAuthenticationIT Fix AutoIntervalDateHistogram.testReduce random failures (elastic#32301) fix no=>not typo (elastic#32463) Mute QueryProfilerIT#testProfileMatchesRegular() HLRC: Add delete watch action (elastic#32337) High-level client: fix clusterAlias parsing in SearchHit (elastic#32465) Fix calculation of orientation of polygons (elastic#27967) [Kerberos] Add missing javadocs (elastic#32469) [Kerberos] Remove Kerberos bootstrap checks (elastic#32451) Make get all app privs requires "*" permission (elastic#32460) Switch security to new style Requests (elastic#32290) Switch security spi example to new style Requests (elastic#32341) Painless: Add PainlessConstructor (elastic#32447) ...
First, some background: we have 15 different methods to get a logger in
Elasticsearch but they can be broken down into three broad categories
based on what information is provided when building the logger.
Just a class like:
or:
The class and settings:
Or more information like:
The goal of the "class and settings" variant is to attach the node name
to the logger. Because we don't always have the settings available, we
often use the "just a class" variant and get loggers without node names
attached. There isn't any real consistency here. Some loggers get the
node name because it is convenient and some do not.
This change makes the node name available to all loggers all the time.
Almost. There are some caveats are testing that I'll get to. But in
production code the node name is node available to all loggers. This
means we can stop using the "class and settings" variants to fetch
loggers which was the real goal here, but a pleasant side effect is that
the node name is now consistent on every log line and optional by editing
the logging pattern. This is all powered by setting the node name
statically on a logging formatter very early in initialization.
Now to tests: tests can't set the node name statically because
subclasses of
ESIntegTestCase
run many nodes in the same jvm, even inthe same class loader. Also, lots of tests don't run with a real node so
they don't have a node name at all. To support multiple nodes in the
ame JVM tests suss out the node name from the thread name which works
surprisingly well and, also surprising, is fairly easy to test in a nice
way. For those threads that are not part of an
ESIntegTestCase
node westick whatever useful information we can get form the thread name in the
place of the node name. This allows us to keep the logger format consistent.
The goal of this change is to stop us from having to use the "class and
settings" variant. In subsequent changes we'll remove them entirely but I
didn't do that in this change to keep it small-ish.
Also, I have no plans to backport this to 6.x because it changes thedefault
log4j2.properties
file and many log formats. It just feels rudeto change either one of those in a minor release.
I will indeed backport this to 6.x but without the log4j2.properties changes.
I use a nasty hack to prevent us from needing them. I'll drop the hack
and make the log4j2.properties change in a follow up.