Scoped context #2419
In structured logging use cases (e.g. to Google Cloud Logging), the resource attributes should go to the "context" field. With this approach, the attributes will go to the "message" field. Therefore, it won't be possible to easily find all the log messages when the same attributes are used with I think we need an approach that will allow sending resource attributes to "context" so the two approaches are consistent. |
@rocketraman I did look into that and what you are asking for just can't be implemented in a reasonable way without significant mods to the API and will likely impact performance. The problem is that Loggers are singletons and the code where the LogEvent is constructed and the ContextMap is populated is pretty far down the stack. We defer creating LogEvents to avoid the performance overhead when they aren't going to be used. Messages, on the other hand, are constructed very early as they are fairly light-weight. So doing that in a per-instance Logger that wraps the singleton was fairly easy to do. You will need to point me to some docs on Google Cloud Logging. However, at my employer we are running a cloud infrastructure and log to an ELK stack, and making this work the way you would like would be fairly easy, although it might be nice to enhance JsonTemplateLayout if it makes sense. To give you an idea, in my infrastructure we have JsonTemplateLayout configured with
Note that with this configuration the "message" key will look exactly like it would in a log file. However, all the context fields, except for the user's token, are included as distinct fields while all the message map items are under the event.data tag. These could also have been flattened into distinct fields as the context data is, and you wouldn't know the difference when searching. If you have other requirements for how the data should be formatted I am sure JsonTemplateLayout could be enhanced to support it, if it can't do it already. One other thing. I explicitly implemented ParameterizedMapMessage because I have known for a long time that MapMessage is insufficient as it doesn't treat the message being logged separately from the Map. ParameterizedMapMessage does. When getFormattedMessage is called ONLY the data the user logged is included, as if it were a ParameterizedMessage. However, the map is still available to all the Lookups, Filters, and Layouts that support MapMessages, so that works correctly as well. |
The fact that there is only one logger per logger name is not set in stone (unlike in Logback). Due to the architectural choices you made at the beginning,
IMHO, the
Another thing that I like about your approach is that it solves #1813: in SLF4J 2.x there is a list of key/value pairs that is distinct from the context map. Currently I am mixing the two sets in
Is there really such a difference between
but an implementation could very well choose to use something like |
I understand that if one has end-to-end control over layout, we can use the approach you've created here. However, we have a meaningful and clear separation between logging and layout. This means it is absolutely normal for developers to configure application logging, while a completely different group of people (an enterprise dev tooling group or perhaps an upstream library such as Spring Boot) determines layout, and in many cases developers have no ability to affect layout at all: all applications across the enterprise use the same layout in order to enforce logging consistency. My references to Google Cloud Logging are for just such a case at an enterprise client of mine in which the logs in their Google Cloud have a If we want to blur the separation between context and message, then let's go all the way. Put everything into message, and then come up with an appropriate solution in order to distinguish different types of fields for layout so we can maintain a good separation between logging and layout. Make it a breaking change for Log4j3.
I'm not as familiar with Log4j internals as you and @ppkarwasz. However, from the perspective of a naive user, I'm not sure what is so difficult. As a user I could wrap my logger in my own class that, for every method, did (in pseudo-code):
and (I believe) problem solved. However, this should be unnecessary for a user to do. |
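The wrapper idea described in this comment might look roughly like the sketch below. It uses only JDK types: `ContextStore` and `SimpleLogger` are hypothetical stand-ins for a per-thread context map and a logger, not Log4j API, and `ResourceAwareLogger` is an invented name. The wrapper merges its resource attributes into the context for the duration of one logging call and restores the previous state afterwards.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a per-thread context map (not Log4j API).
final class ContextStore {
    private static final ThreadLocal<Map<String, String>> MAP =
            ThreadLocal.withInitial(HashMap::new);

    static Map<String, String> current() { return MAP.get(); }
    static void replace(Map<String, String> map) { MAP.set(map); }
}

// Hypothetical stand-in for a logger: records the message plus the context visible at log time.
final class SimpleLogger {
    final List<String> events = new ArrayList<>();

    void info(String message) {
        // TreeMap gives a stable key order in the recorded text.
        events.add(message + " " + new TreeMap<>(ContextStore.current()));
    }
}

// Wrapper that places resource attributes in the context around every logging call.
final class ResourceAwareLogger {
    private final SimpleLogger delegate;
    private final Map<String, String> attributes;

    ResourceAwareLogger(SimpleLogger delegate, Map<String, String> attributes) {
        this.delegate = delegate;
        this.attributes = new HashMap<>(attributes);
    }

    void info(String message) {
        Map<String, String> previous = ContextStore.current();
        Map<String, String> merged = new HashMap<>(previous);
        merged.putAll(attributes);
        ContextStore.replace(merged);
        try {
            delegate.info(message);
        } finally {
            ContextStore.replace(previous); // always restore, even if logging throws
        }
    }
}
```

In this sketch anything else that logs on the same thread while `delegate.info` is running would also see the merged attributes.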
While it is a multi-key map you will get warnings if you try to create the same logger with multiple message factories as usually that is going to cause problems. However, I don't see how that helps with what rocketraman is requesting. It is worth repeating that resource Loggers should NEVER be added to the LoggerRegistry. Resource Loggers need to have the same lifetime as the resource. Loggers in the registry never go away and so would result in a memory leak. Note that them not being in the registry isn't really a problem as the underlying Logger they use is. That Logger will be reconfigured when needed.
Sure, you could do this, but I am not sure it is a good idea. There are other uses of MapMessage where you do not want the data merged into the context data. I would be hesitant to always merge it without some indication from the user since there are a lot of cases where that is not what you want.
Did you just answer your own question? Of course there is a difference. A Message is really just a container for the format String and its parameters. The fact that you can make more complex messages like a MapMessage is a perk. But the main difference is that the user really controls the contents of the Message while Log4j controls the LogEvent. Now, we could enhance the transformer to replace logging calls that take parameters with Logging calls that accept LogEvents but a) I believe it would be tough to make that garbage free and b) I am not sure we could always get it right since a lot is determined at run time. The bottom line here is Piotr, I am not sure what to do with your comment. You haven't actually asked for any changes.
I absolutely understand this. All the Spring Boot and Spring Cloud Config support in Log4j was added by me to support my employer. We have a common logging configuration stored in Spring Cloud Config that all applications share. However, every application can still provide an override that extends capabilities and we take advantage of that. Now this is sort of interesting
Yes, one could modify ResourceLogger to have
However, this has a small side effect that any logging calls that occur from objects passed into the logging call or logging that occurs from Appenders and other components while processing the logevent will also have this context data, despite them using a different logger. If that makes sense I would actually consider this. |
I like the ResourceLogger proposal from this PR, but I would drop the ScopedContext part for a simple reason: there are other specialized APIs (OpenTelemetry and context-propagation) to deal with context propagation that are much more advanced. If we want to propose a light-weight alternative for them, we should do it in another artifact (e.g. log4j-context), but I am not sure if we have to propose any radically new approach to the problem.
Regarding ResourceLogger, Log4j Core 3.x reuses neither the AbstractLogger utility class from 2.x nor its message factories (it is Recycler-based). Could we implement an interface-based solution so that we can have differences between implementations?
/**
 * Context that can be used for data to be logged in a block of code.
 *
 * While this is influenced by ScopedValues from Java 21 it does not share the same API. While it can perform a
 * similar function as a set of ScopedValues it is really meant to allow a block of code to include a set of keys and
 * values in all the log events within that block. The underlying implementation must provide support for
 * logging the ScopedContext for that to happen.
 *
 * The ScopedContext will not be bound to the current thread until either a run or call method is invoked. The
 * contexts are nested so creating and running or calling via a second ScopedContext will result in the first
 * ScopedContext being hidden until the call is returned. Thus the values from the first ScopedContext need to
 * be added to the second to be included.
 *
 * The ScopedContext can be passed to child threads by including the ExecutorService to be used to manage the
 * run or call methods. The caller should interact with the ExecutorService as if they were submitting their
 * run or call methods directly to it. The ScopedContext performs no error handling other than to ensure the
 * ThreadContext and ScopedContext are cleaned up from the executed Thread.
 *
 * @since 2.24.0
 */
public class ScopedContext {
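The nesting rule in the Javadoc above (an inner context hides the outer one until it returns) can be illustrated with a small JDK-only sketch. `MiniScopedContext` and its `where`, `run`, and `get` methods are hypothetical names chosen for illustration; this is not the class from the PR, only one way the described behaviour could be modelled.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the nesting rule; not the PR's ScopedContext.
final class MiniScopedContext {
    private static final ThreadLocal<Map<String, Object>> CURRENT =
            ThreadLocal.withInitial(HashMap::new);

    private final Map<String, Object> values = new HashMap<>();

    // Builder-style method: nothing is bound to the thread yet.
    MiniScopedContext where(String key, Object value) {
        values.put(key, value);
        return this;
    }

    // Binds this context for the duration of the task, then restores the previous one.
    void run(Runnable task) {
        Map<String, Object> previous = CURRENT.get();
        CURRENT.set(Collections.unmodifiableMap(new HashMap<>(values)));
        try {
            task.run();
        } finally {
            CURRENT.set(previous);
        }
    }

    // Looks up a key in the context currently bound to this thread; null if absent.
    static Object get(String key) {
        return CURRENT.get().get(key);
    }
}
```

With this model, running a context holding `userId` inside one holding `requestId` makes `requestId` invisible in the inner block unless it is copied into the inner context, and both are gone once the outer `run` returns.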
[This part seems independent from the ResourceLogger]
While technically this class is very well written and could benefit some users, I am more and more convinced that we shouldn't offer this kind of context-propagation API.
ThreadLocal was a big hit and allowed users to store and retrieve key/value pairs for logging purposes (and not only 😉), but nowadays there are APIs specialized in this kind of thing. Besides the context-propagation project I cited on the mailing-list, there is an entire Context Propagation service from OpenTelemetry.
If we were to offer users ScopedContext, I am afraid that sooner or later they would need to refactor it to use another API instead. The OTel Context class offers many more wrappers than this class.
In my opinion, what we should do instead is to help applications and libraries that instrumented their code with ThreadContext to migrate to a specialized context-propagation API. For example we could submit to [open-telemetry/opentelemetry-instrumentation-java] a ThreadContextMap implementation that uses the Baggage API.
@ppkarwasz I have no plans to merge this PR without ScopedContext. I am quite sure I will use it in my environment far more frequently than I will ever use ResourceLogger. I can easily envision Service 1 creating a ScopedContext that gets propagated to Service 2, which then adds its own ScopedContext data. This would be extremely valuable in the logs.
As you know, I am also against the API having dependencies on anything but the JDK. I am not inclined to have the API be dependent on someone else's context propagation API. Looking at the Context object I see that it mostly adds different kinds of functions that can be called. I am not convinced that is necessary but if it is they can always be added later.
I agree with Ralph here. We're at the bottom of the stack. While we can bridge our APIs over to other libraries, those don't belong in our API.
As you know, I am also against the API having dependencies on anything but the JDK. I am not inclined to have the API be dependent on someone else's context propagation API.
I am not suggesting to add an external context propagation API dependency to log4j-api, but to delegate context propagation to higher levels of the stack. For example let's look at the first example of "Observability with Spring Boot 3":
// Create an Observation and observe your code!
Observation.createNotStarted("user.name", registry)
.contextualName("getting-user-name")
.lowCardinalityKeyValue("userType", "userType1") // let's assume that you can have 3 user types
.highCardinalityKeyValue("userId", "1234") // let's assume that this is an arbitrary number
.observe(() -> log.info("Hello")); // this is a shortcut for starting an observation, opening a scope,
The userId value will be present in the spans and metrics, but will it be present in the context data of the log event? If it is not present, how can I add it? I don't think that wrapping the lambda using ScopedContext is a practical solution.
Looking at the Context object I see that it mostly adds different kinds of functions that can be called. I am not convinced that is necessary but if it is they can always be added later.
The interesting part for me is not the amount of wrapper methods it provides, but the promise that it will propagate everywhere using any kind of technology (Netty, HTTP, Spring Reactor, …). They already integrate with Log4j Core (see log4j-context-data-2.17), but even if they didn't, we could create a Maven module that integrates OpenTelemetry with Log4j API.
@ppkarwasz
Regarding your Observation object, just please no. This is way, way too complicated and it is serving a completely different concept than just tracking keyed items you want included in the logs (or possibly quick access to in your application). The ScopedContext can be propagated anywhere as well. We have a utility class at work that propagates the ThreadContext via RestTemplate, Kafka, and Amqp. Making it support ScopedContext is easy as it just needs to do what ScopedContextDataProvider does to get hold of the context entries. Actually, a version for REST is in Log4j-Audit. I don't believe Log4j should provide the integrations to these technologies, at least as part of core or the api.
I don't believe Log4j should provide the integrations to these technologies at least as part of core or the api.
On this we agree, but do we need to force those projects to write yet another integration?
The ScopedContext can be propagated anywhere as well. We have a utility class at work that propagates the ThreadContext via RestTemplate, Kafka, and Amqp. Making it support ScopedContext is easy as it just needs to do what ScopedContextDataProvider does to get hold of the context entries.
Every context propagation is easy, but it is annoying. At my previous job we had to propagate Spring's request scope, our own scope, Vaadin's context and so on. I started resenting colleagues that used static context accessors, since their code was usually tested on the servlet worker thread and we only noticed failures (NPEs) when it was accessed from an asynchronous thread.
public String getFormattedMessage() {
    return baseMessage.getFormattedMessage();
}
Currently, Open Telemetry uses MapMessage#getFormat() and MapMessage#get("message") to extract the body of a map message (cf. LogEventMapper). Therefore I would override the getFormat() to return baseMessage.getFormattedMessage().
Regarding how getFormattedMessage should work, I am not sure what a user would expect. My guess is that it should be something like:
key1="value" key2="value" Formatted base message
i.e. the "FULL" message. Note that many (even third-party) layouts know about MultiformatMessage, so they don't call this method.
Remark: could we create an interface (e.g. StructuredMessage, AttributeMessage or something similar) that codifies how the current sub-classes of MapMessage work? E.g.:
public interface AttributeMessage<V> extends Message {
@Override
default String getFormat() {
Message message = getMessage();
return message != null ? message.getFormattedMessage() : "";
}
@Override
String getFormattedMessage();
@Nullable Message getMessage();
Map<String, V> getData();
}
Also it would be nice to add some well-known constants for MultiformatMessage. Integrators do use strings like JSON in practice, but the specification does not prevent the creation of message implementations that return some garbage instead of JSON.
@ppkarwasz
I would suggest that Open Telemetry would be better served by adopting ParameterizedMapMessage than treating the message key as special. I admit, I have done the same thing in the past, which is what motivated me to create this Message class, as I have always found that irritating. You are correct that getFormat() should be overridden, but it should return baseMessage.getFormat().
The purpose of ParameterizedMapMessage, as I have stated several times, is to have it format %m EXACTLY how ParameterizedMessage would but still have it be a MapMessage so other Lookups, Filters, and Layouts can extract the structured data from the message. That is it. No fancy tricks.
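The behaviour described here, where the formatted text contains only the parameterized message while the map stays available as structured data, can be sketched without any Log4j types. `MapBackedMessage` below is a hypothetical stand-in, not ParameterizedMapMessage itself, and its `{}` substitution is a deliberately simplified approximation of parameterized formatting (no escaping, no special array handling).

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in showing the idea only; it does not extend Log4j's MapMessage.
final class MapBackedMessage {
    private final String pattern;
    private final Object[] args;
    private final Map<String, String> data;

    MapBackedMessage(Map<String, String> data, String pattern, Object... args) {
        this.data = Collections.unmodifiableMap(new LinkedHashMap<>(data));
        this.pattern = pattern;
        this.args = args.clone();
    }

    // Returns the unformatted pattern, mirroring the baseMessage.getFormat() suggestion.
    String getFormat() {
        return pattern;
    }

    // Substitutes each "{}" with the next argument; the map is deliberately left out.
    String getFormattedMessage() {
        StringBuilder out = new StringBuilder();
        int from = 0;
        int argIndex = 0;
        int at;
        while (argIndex < args.length && (at = pattern.indexOf("{}", from)) >= 0) {
            out.append(pattern, from, at).append(args[argIndex++]);
            from = at + 2;
        }
        return out.append(pattern.substring(from)).toString();
    }

    // Structured data remains available to anything that wants it.
    Map<String, String> getData() {
        return data;
    }
}
```

A layout that only prints the formatted message would show the plain text, while a component that understands the structured data could still read individual keys through `getData()`.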
I would suggest that Open Telemetry would be better served by adopting ParameterizedMapMessage than treating the message key as special.
Due to a "happens before" relationship they didn't have the chance to do it. 😁
The purpose of ParameterizedMapMessage, as I have stated several times, is to have it format %m EXACTLY how ParameterizedMessage would but still have it be a MapMessage so other Lookups, Filters, and Layouts can extract the structured data from the message. That is it. No fancy tricks.
Sure, extending MapMessage is a nice trick to maintain some sort of backward compatibility. However I would like to have a well documented Java interface that ParameterizedMapMessage will implement. In time we can switch all the instanceof MapMessage to that class.
Due to a "happens before" relationship they didn't have the chance to do it. 😁
They will be able to after this is merged.
However I would like to have a well documented Java interface that ParameterizedMapMessage will implement. In time we can switch all the instanceof MapMessage to that class
Umm. No. There are multiple reasons to use a MapMessage of various kinds. Consider StructuredDataMessage. That supports RFC 5424, which specifies how things are supposed to behave. I use MapMessages to create data for dashboards in Kibana. In that case you don't want or need the ParameterizedMessage support.
getFormattedMessage can always return the empty string if there is no message attached to the map.
log4j-api/src/main/java/org/apache/logging/log4j/ResourceLogger.java
/**
 * Constructs a ResourceLogger.
 */
public static final class ResourceLoggerBuilder {
Personally I would prefer to just have a ResourceLoggerBuilder (or LoggerBuilder) interface and access it through a new Logger#newDetachedLoggerBuilder method.
Creating "detached" loggers might be used for other purposes than adding key/value pairs to a message. For example someone might want to use a different MessageFactory.
Good point, so maybe
Sure, the default event factory should not merge the data, but we can propose an alternative one that does merge it. The crucial point for this to work is that
Sure, there is a semantic difference, but a
I needed some time to digest your proposed changes 😉. While it doesn't touch a lot of files, the amount of API changes in this PR has not been seen since the introduction of |
@rocketraman I have modified the PR so that ResourceLogger uses ScopedContext and behaves the way you requested. As things stand, I am quite happy where this is and personally would start using it (well ScopedContext anyway) right away. |
I like the idea in general so far. Left some comments.
public class ResourceLoggerTest {
    @BeforeAll
    public static void beforeAll() {
        System.setProperty("log4j2.loggerContextFactory", TestLoggerContextFactory.class.getName());
Nit: good idea to clear this property afterwards.
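One way to act on this nit, sketched with plain JDK calls: set the property only for the duration of the work and put back whatever was there before. `SystemPropertyScope` and `runWith` are hypothetical names, not part of the PR; in a JUnit 5 test class the same cleanup would more typically live in an `@AfterAll` method that calls `System.clearProperty`.

```java
// Hypothetical helper: sets a system property for the duration of a task and restores the prior state.
final class SystemPropertyScope {
    private SystemPropertyScope() {}

    static void runWith(String key, String value, Runnable task) {
        String previous = System.getProperty(key);
        System.setProperty(key, value);
        try {
            task.run();
        } finally {
            if (previous == null) {
                System.clearProperty(key); // property did not exist before: remove it again
            } else {
                System.setProperty(key, previous); // otherwise restore the old value
            }
        }
    }
}
```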
 * @param executorService the ExecutorService to dispatch the work.
 * @param op the Callable to call.
 */
public static <R> Future<R> callWhere(String key, Object obj, ExecutorService executorService, Callable<R> op)
We've got Java 8 at minimum at our fingertips. Let's make sure to support CompletableFuture as well. If possible, it'd be great to support Flow.Publisher from Java 9, too, though context propagation with reactive streams tends to use a non-standard API such as Reactor's context.
@jvz Isn't CompletableFuture a Future? How is it not supported now?
It implements Future, yes, but it's also a CompletionStage which is the part of the API I'm more interested in.
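To make the CompletionStage point concrete, here is a hedged sketch of what a CompletableFuture-returning variant could look like. `AsyncContext` and `supplyWhere` are hypothetical names and the ThreadLocal is a stand-in for the scoped values; nothing here is taken from the PR. The benefit over a plain `Future` is that the caller can chain further stages instead of blocking on `get()`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

// Hypothetical sketch; the ThreadLocal stands in for whatever holds the scoped values.
final class AsyncContext {
    private static final ThreadLocal<Map<String, Object>> CURRENT =
            ThreadLocal.withInitial(HashMap::new);

    private AsyncContext() {}

    static Object get(String key) {
        return CURRENT.get().get(key);
    }

    // Runs op on the executor with one key bound and exposes the result as a CompletableFuture.
    static <R> CompletableFuture<R> supplyWhere(String key, Object value, Executor executor, Supplier<R> op) {
        return CompletableFuture.supplyAsync(() -> {
            Map<String, Object> previous = CURRENT.get();
            Map<String, Object> scoped = new HashMap<>(previous);
            scoped.put(key, value);
            CURRENT.set(scoped);
            try {
                return op.get();
            } finally {
                CURRENT.set(previous); // do not leak the binding into the pooled thread
            }
        }, executor);
    }
}
```

In this sketch the binding covers only `op`; dependent stages such as `thenApply` run outside it, which is one of the design questions a real implementation would have to answer.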
log4j-api/src/main/java/org/apache/logging/log4j/internal/ScopedContextAnchor.java
log4j-api/src/main/java/org/apache/logging/log4j/message/ParameterizedMapMessage.java
Some of my messages are part of a "review", others are simple comments. |
@ppkarwasz |
Yes, in general I am +10 on
In particular:
|
@ppkarwasz
Yes. IMO CloseableThreadContext is just as bad as what SLF4J did. It requires the application to "get it right" all the time. It is just as impossible to leave a ScopedContext without restoring the context to its state on entry as it is to leave a synchronized block while still holding the lock. Trying to merge those two would result in a broken mess.
That isn't a bug. The ContextDataInjector injects data into LogEvents. $ctx will work in that case if it is operating on a LogEvent. If there is no log event then $ctx operates on the ThreadContext. Nowhere do we document that a ContextDataProvider adds data to the ThreadContext.
Why? Then I am forced to create a Logger to create a ResourceLogger. I can't imagine why a user would really want both. If you want to log stuff happening within the scope of a Class then everything should use the one Logger. This just seems ugly.
I simply don't understand why this has anything to do with log4j-slf4j. We aren't here to make all Log4j-API functionality work in SLF4J.
Yeah - this is a minor nit I noticed as well. It would have gotten fixed along with the System property removal Matt mentioned.
Sure, I suppose. I'm ambivalent about this one. It would mean changing a LOT of code that has behavior tied to MapMessages in not-so-subtle ways, but I suppose it could be done. I wouldn't want it as part of this PR as it would be too big. After the change to use the ScopedContext, ParameterizedMapMessage wasn't necessary anyway. |
I would be OK with an implementation based on I would also be OK with a |
@ppkarwasz Also - when you say "based on ThreadContextMap" I would have to respond with "Which one?" It looks to me that there are something like 8 different implementations in Log4j now. (This is quite out of hand IMO). A log4j-context artifact in a separate repo or a submodule? This is pretty tiny on its own. It is a grand total of 3 classes - ScopedContext, ScopedContextAnchor, and ScopedContextDataProvider. Note that ScopedContext and ScopedContextAnchor don't require log4j-core while the ContextDataProvider does. I am not sure whether the JPMS module dependency could be optional so that the SLF4J bridge could eventually do something with it. |
In your edited comment you make a lot of good points. Let us discuss in a video call after the holidays. |
I think that we can reuse While I totally agree that exposing the functionality of that interface through Sure, implementing
/**
 * Saves the current context map data.
 * <p>
 * The returned object is not affected by later changes to the context map.
 * </p>
 * @return An opaque thread-safe version of the internal context map representation.
 * @see #restoreContextMap
 * @since 2.24
 */
static Object saveContextMap();
/**
 * Restores the context map data, from a saved version.
 * @param contextMap An opaque thread-safe version of the internal context map representation.
 * @see #saveContextMap
 * @since 2.24
 */
static void restoreContextMap(Object contextMap);
These methods might be useful in
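A rough sketch of how a save/restore pair like the one proposed above could be used to carry context onto another thread. `SnapshotContext` is a hypothetical, JDK-only stand-in whose `saveContextMap` and `restoreContextMap` merely imitate the proposed signatures; the `wrap` helper and the map-copy strategy are assumptions made for illustration, not Log4j's implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a context map with a save/restore pair; not Log4j's implementation.
final class SnapshotContext {
    private static final ThreadLocal<Map<String, String>> MAP =
            ThreadLocal.withInitial(HashMap::new);

    private SnapshotContext() {}

    static void put(String key, String value) { MAP.get().put(key, value); }
    static String get(String key) { return MAP.get().get(key); }

    // Returns an immutable copy, so later put() calls do not affect the saved object.
    static Object saveContextMap() {
        return Collections.unmodifiableMap(new HashMap<>(MAP.get()));
    }

    // Replaces the current thread's map with a mutable copy of the saved data.
    @SuppressWarnings("unchecked")
    static void restoreContextMap(Object contextMap) {
        MAP.set(new HashMap<>((Map<String, String>) contextMap));
    }

    // Captures the caller's context now and re-installs it around the task on whichever thread runs it.
    static Runnable wrap(Runnable task) {
        Object captured = saveContextMap();
        return () -> {
            Object previous = saveContextMap();
            restoreContextMap(captured);
            try {
                task.run();
            } finally {
                restoreContextMap(previous);
            }
        };
    }
}
```

Because the snapshot is taken when `wrap` is called, changes made to the submitting thread's map afterwards are not visible to the wrapped task.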
@ppkarwasz
which isn't particularly helpful.
|
I would love that, but after looking at our SLF4J integrations I am not sure any more we can do that.
I am reticent about adding a new thread local, because propagating the As far as I remember JBoss LogManager also has an MDC that can have object values. In this case reusing
By "taking a snapshot" I mean retrieving the |
OK. I think I understand your concerns WRT tooling and I will do what I can to address that. FWIW, I don't believe using the ThreadContext's ThreadLocal would help with that at all. As for the snapshot, again I don't have any idea why you would want the whole stack of Maps for a ScopedContext. Usually you would only want the current map. |
I have updated the branch with some significant changes:
|
Rats. I can't reopen the PR. Guess I have to create a new one again. |
This adds support for a ScopedContext and a ResourceLogger as requested in apache/logging-log4j-kotlin#71 and similar to #2214
Git has a funny way of mangling old PR branches so I deleted the old one and recreated it. Unfortunately, that causes comments on the previous PR to be lost here. This has several changes to ScopedContext from the previous PR.
Example usage for wrapping a simple code block.
Creating Threads:
Note that this syntax supports virtual threads.
ResourceLogger can be used to include resource data in all log events coming from a specific Logger.
All events in a ResourceLogger use ParameterizedMapMessage. When logging it behaves like a ParameterizedMessage in that formatMessage only returns the message directly being logged. However, it is a MapMessage so all the attributes can be accessed via the normal Layout, Lookup, and Filter capabilities we have added.