-
Notifications
You must be signed in to change notification settings - Fork 258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
An RFC for structured logging #296
Conversation
As for serde, I want to raise an issue of serde not being capable of serializing empty arrays correctly, i e, the type information gets lost. As this issue was closed as "wontfix", this made serde impossible to use for some formats (such as mine). Maybe that limitation is not severe enough for this case - that's up to the rest of you to decide - but I think it's something to be aware of when choosing a serialization crate. |
The only thing that worries me is the inability to express data that can be sent to another thread without serialization. One idea that I have is putting additional bound on every value: https://doc.rust-lang.org/std/borrow/trait.ToOwned.html This way values are always taken as references, but a async-logger implementation can just call |
@diwic while I typically love using It seems to me, having serde dependency behind a feature flag, and thus the ability to support other solutions (or serde versions) is a good thing to have. The pain point with libraries like logging is that they are interoperability point of the whole ecosystem, and having to introduce any breaking changes (like public dependency) can be a painful experience for everyone. |
I'm not convinced that a serde dependency is the right choice for this use case. The log crate needs to focus on being lightweight in terms of code generation in particular. I think a virtual-dispatch based approach may work out better there. |
For some prior art, over in Haskell-land I've been using katip. Not perfect (perhaps particularly on the serialization side?), but it's worked better than previous libraries I'd used that attempted to provide single input / multi output structured logging w/ support for ordinary text logs. A faculty for flipping between nginx-style / apache compatible / etc. text logging would be neat too. |
I haven't wrapped my head around all the details yet, but requiring serde as a depending feels kind of heavy handed to me (especially since, as far as I can tell, there is no way for the top-level crate to disable the serde integration). I would personally feel better about this if there was another feature, only for use in the top-level crate, that would act as a "kill switch", removing the dependency on serde. (I'm not sure if that is feasible. For the log-recording part it should be, I think; for the sinks I'm not so sure). |
It sounds like we're not sold on depending on So maybe we can find an API that doesn't bake in |
I think that adding a dependency to include a trait is not a big issue. The problem is that in the case of If instead of requiring full |
This might just be my opinion after being immersed in it for so long, but I can't think of any case where non-structured logging is preferable to structured logging. If it's available we should be using it.
|
So, the easiest way is just to say that keys are Because the moment we cross that line and say that we want something else than strings in the log backend, may it be integers, arrays, maps, booleans or something else, the question becomes much more complex because a) we then have to decide what we want to support and what we want to not support and b) all log backends need to implement support for this as well. |
One question that isn't clearly answered from reading the RFC: |
Is it possible/desirable to have a context argument? The examples show key-value pairs being passed explicitly, however I could imagine wanting to prepare a "map" of key-value pairs ahead of time (a context) and passing that to multiple logs. |
Your logging backend could implement it and be initialized with desired metadata, that would add to key-value pairs of each If you want explicit context, that you could manipulate and pass at runtime, like in |
I like the sentiment here, but I don't think we can really start with
Sure can! While preparing this RFC I put together an an experimental repo for this kind of contextual logging.
This is something we do in .NET using message templates. I think an alternative implementation of the |
My sticking points on the serialization question are:
We possibly can find a way to achieve that without baking in |
@sfackler this is using virtual dispatch by way of The lightweight code generation requirement is an interesting one I haven't really thought about at all. Could you elaborate on it a bit more? Which usergroup is it important to? |
I believe the author of serde may also have some ideas for a lighter weight serde alternative |
Looking at the complexity concern with |
Strings are indeed all we get with such a proposal, which makes writing a backend to "log" a lot easier, than if it has to support all data types that someone somewhere has decided. If there are technical difficulties with starting with "anything Display" as values, how about starting with only |
This seems like a proposal for a white horse with black stripes that we are told ought not be considered a zebra. I hope that I'm wrong but signs are showing that the stdlibbers are hungry to grow stdlib beyond its original mandate. I'm not an advocate of stdlib crates competing with those of the rest of the ecoystem. stdlib Rust should foster cooperation, not competition. So, what would I need slog for if its main features were made available in a stdlib log crate? This is a question I and others will have if core functional overlaps exist. I'm very interested in seeing a proposal for structured logging that is novel and does not overlap. Why would the authors of slog want to refactor working code just to make it line with stdlib team designs? Why should they? |
I totally agree with you there. Having to write a fn write_pretty(w: impl Write, r: &Record) -> io::Result<()> {
// Write the first line of the log record
...
// Write each key-value pair using the `WriteKeyValues` visitor
record
.key_values()
.try_for_each(|k, v| writeln!(&mut w, "{}: {:?}", k, v))
.map_err(|e| io::Error::new(io::ErrorKind::Other, e.into_error()))
} For properly structured formats, where a fn write_json(w: impl Write, r: &Record) -> io::Result<()> {
let r = SerializeRecord {
lvl: r.level(),
ts: epoch_millis(),
msg: r.args().to_string(),
props: r.key_values().serialize_as_map(),
};
serde_json::to_writer(w, &r)
}
#[derive(Serialize)]
struct SerializeRecord<KVS> {
lvl: Level,
ts: u64,
msg: String,
#[serde(flatten)]
props: KVS,
} So I think the I have also been hacking around with |
Logging is an interoperability problem. Almost every crate does some logging, and if parts of ecosystem use different logging, then in practice it will degrade to lowest common denominator. Also, I don't know about others but I don't want to be maintaining libraries if I could get the same functionality from (semi-)stdlib.
Most people just take a one that was written and doesn't care. I have used Even less so with logging, where people typically just want to dump a bunch of json logs to ElasticSearch. For |
slog already does something like this, I think without proc macros; named format arguments are interpreted as structured metadata in addition to being formatted, while anonymous ones aren't. It's nice to use. |
Well,
This is true, there's a lot more value to get out of structured logging when you use a log store that supports working with structured data, like Elasticsearch. That's why I don't think we should punt on the actual structured serialization support for this RFC.
Ah gotcha. We could do the same thing in info!("the format string with a {number}"; number = 2); |
I've spent some time going around in circles on The trade-off there that I see is that we'd then need another trait like |
Ok, I spent some time exploring the serialization API more to remove the required This API doesn't require I think this approach can mitigate some of the concerns about |
Ok, what do we think of this API, based on the serialization proof-of-concept I posted before? I've stuck with the visit terminology over the serialize terminology because actually serializing a key-value pair is just one thing you might want to do with them. Other things include checking whether a specific pair is present and doing something with its primitive type, like comparing the timestamp on a record to one that was previously set in some ambient context to calculate the elapsed time. I changed // --- Set of key-value pairs --- //
pub trait Source {
fn visit<'kvs>(&'kvs self, visitor: &mut dyn SourceVisitor<'kvs>) -> Result<(), Error>;
// --- Provided adapter methods --- //
// Using these you probably don't to build a `SourceVisitor` yourself
fn erase(&self) -> ErasedSource
where
Self: Sized,
{}
fn try_for_each<F, E>(self, f: F) -> Result<(), Error>
where
Self: Sized,
F: FnMut(Key, Value) -> Result<(), E>,
E: Into<Error>,
{}
fn get<'kvs, Q>(&'kvs self, key: Q) -> Option<Value<'kvs>>
where
Q: Borrow<str>,
{}
#[cfg(feature = "serde")]
fn serialize_as_map(self) -> SerializeAsMap<Self>
where
Self: Sized,
{}
...
}
pub trait SourceVisitor<'kvs> {
fn visit_pair(&self, key: Key<'kvs>, value: Value<'kvs>) -> Result<(), Error>;
}
// --- Individual key or value --- //
// Can't be implemented manually
// It's either a fixed set of `std` types or `T: Serialize`
pub trait Visit: private::Sealed {
fn visit(&self, visitor: &mut dyn Visitor) -> Result<(), Error>;
}
#[cfg(not(feature = "serde"))]
impl Visit for $std_ty {} // where $std_ty is `u8`, `bool`, `&str`, `&Path` etc
#[cfg(feature = "serde")]
impl<T: Serialize + ?Sized> Visit for T {} // covers all $std_ty impls + a bunch more
pub trait Visitor {
fn visit(&mut self, v: &dyn Visit) -> Result<(), Error> {
v.visit(self)
}
fn visit_i64(&mut self, v: i64) -> Result<(), Error>;
fn visit_u64(&mut self, v: u64) -> Result<(), Error>;
fn visit_f64(&mut self, v: f64) -> Result<(), Error>;
fn visit_bool(&mut self, v: bool) -> Result<(), Error>;
fn visit_char(&mut self, v: char) -> Result<(), Error>;
fn visit_str(&mut self, v: &str) -> Result<(), Error>;
fn visit_bytes(&mut self, v: &[u8]) -> Result<(), Error>;
// If a format is given a complex `serde` datastructure, like a map or sequence
// A format might want to write a placeholder value like `null` instead
fn unsupported(&mut self) -> Result<(), Error> {
Err(Error::msg("unsupported value"))
}
}
// --- Concrete types for keys and values --- //
// `ToOwned::Owned: Send + Sync + 'static`?
// use `Borrow<str>` as the generic trait bound for keys
pub struct Key<'k> {}
impl<'k> Key<'k> {
fn as_str(&self) -> &str {}
}
// `Key` is always `Visit`
// If `serde` is available, it's also `Serialize`
#[cfg(not(feature = "serde"))]
impl<'k> Visit for Key<'k> {}
#[cfg(feature = "serde")]
impl<'k> Serialize for Key<'k> {}
// `ToOwned::Owned: Send + Sync + 'static`?
// use `Visit` as the generic trait bound for values
pub struct Value<'v> {}
// `Value` is always `Visit`
// If `serde` is available, it's also `Serialize`
#[cfg(not(feature = "serde"))]
impl<'v> Visit for Value<'v> {}
#[cfg(feature = "serde")]
impl<'v> Serialize for Value<'v> {} the Why
|
A possible future extension we could make to impl<'v> Value<'v> {
fn from_error(err: &'v impl std::error::Error) -> Self { .. }
fn as_error(&self) -> Option<&dyn std::error::Error> { .. }
} with whatever bounds it is we have to satisfy to make the returned error from |
So we touched on this a bit at the all hands, and while I don't think any major red flags were raised, it is a vast piece of API to consider. At least we can land it incrementally. One thing we didn't get a chance to look at is the macro syntax. Unlike the traits and structures we probably can't land a subset of the macro. Right now, the RFC macro syntax is:
It's a shame to have to repeat format args in the structured list if you don't just want them interpolated into the message, but I couldn't think of a backwards-compatible way to do it. But, maybe we could use a token in the macro to opt-in to treating format args as also structured:
So the leading Does anybody have any other thoughts on their preferred macro syntax? |
cc @sfackler |
Using some kind of switch within the macro is a strategy that would also work for treating the output of #317 as an optionally structured value too. |
Macro syntax: See alternative considered for #317, using '=' as marker token. At least in that case it seemed to me, like more to teach with DSL, than is the case with dedicated function-like macros. That said, the -v macros are more unique in that they are restricted to a single value. Have or might you consider that as an alternative? Like a parallel set of -s or other suffix macros where all named parameters are consider structured and available for the message format as well? Also, for many cases where this would actually be used, would it be adequate and/or potentially more convenient to just configure something with the logging implementation to format specific structured values in the output, either before or after the passed message? (I've held off on commenting here, trying to catch up on its evolution and also understand the tokio stuff, but...) Have or might you also consider adding the |
The marker I was thinking of was something like
I think we could end up with a macro-explosion going the separate suite of macros route here, so something that doesn't require new macros for interpreting structured data differently would be better I think.
That's an interesting idea, and you could support it in a logging framework built on top of
Thanks! I haven't looked at this crate before so will check it out. |
I for one would much rather have separate macros that have good syntax than a single unified macro with awkward syntax. Couldn't they even have the same names, just residing in a different module and requiring manual import? |
@Ralith do you find the There's a whole new world of improvements we could make to the I think the problem with shipping separate macros that require manual imports is a lack of discoverability, and still working with consumers that want to |
@KodrAus I tend to agree with @Ralith and I do think If that is what results in the final design there are decent odds I'd end up making a wrapper crate with different syntax and I'd rather not proliferate a logging API idiolect if I can help it. Thank you very much for your efforts towards structured logging! It's a faculty near and dear to me and you've made tremendous progress in a short period of time. |
Ok, I think it's probably better not to ship macro support that users don't like, since we can work on it externally, and frameworks like |
Alrighty, I've updated the RFC text to punt on the macro syntax. I'd really like to be able to spend some time exploring different macro designs that are structured off-the-bat. There's still tangible benefits to shipping a structured logging API in What do you all think? |
I meant to contrast it to taking structure from named format arguments directly, as in the
That seems like a reasonable idea. |
Alrighty, this has been running for a quite a while now, and thanks to everyone that's been involved so far it seems like we've worked through the major parts of the design space. cc @rust-lang-nursery/libs and the community: shall we kick off a final comment period for this one? If you'd like to register a concern with the RFC please leave a comment here! If we haven't received any by Friday, 8th of March then I'll merge this one and work towards PRing the initial minimal API proposed in the text. I appreciate that this is an enormous document, and the implications of various design decisions are pretty well buried at this point. So any PRs that introduce structured logging support will be small and incremental, so we have an opportunity to discuss them just-in-time too. I personally consider this RFC more of a guide than a commitment. |
I wanted to add in a little plug for something that the |
@neoeinstein Sure! That would be a feature of the EDIT: Just to add, one of the motivations for the |
Merged! Thanks everybody that's given their time and input so far ❤️ Now we'll start building up the implementation incrementally in We'll tag these PRs with a GitHub label so you can find them, but |
@KodrAus I realise I'm very late to the party, but would the API proposed by this RFC supported logging nested values? The use case I'm thinking of is when I have a |
Hi @atroche 👋 Ah yes that scenario is actually supported through external serialization APIs. So, for example, the current unstable implementation supports a framework called use log::kv::Value;
// Assuming `Header` implements `sval::Value`
let map: HashMap<String, Header> = request.headers();
// `Value` implements `sval::Value`, so we can visit complex structure
let value = Value::from_sval(&map); // If `Header` implements `serde::Serialize` instead then we can still use it
let map = sval::serde::to_serialize(map);
let value = Value::from_sval(&map); We haven't spent much time on how we might surface that through the macros just yet though. |
Rendered
Reference implementation
Sample API docs
Hi! So we've been working through some designs in #149 on how structured logging could be supported in
log
. I've since gone away and tried to capture the results of that discussion and my experiments over the year into a concrete proposal that we can refine further. It looks pretty long, but a fair chunk of that is blocks of source.I don't necessarily expect us to merge this PR even once we're happy with it, it's really just an effort to get more directed feedback on a significant API using a process we're familiar with in the Rust community.
cc @sfackler @dpc (who I know has limited capacity) @Thomasdezeeuw
Open questions: