Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

journald - Consider parsing more known fields from logs #7298

Open
djaglowski opened this issue Jan 20, 2022 · 28 comments
Open

journald - Consider parsing more known fields from logs #7298

djaglowski opened this issue Jan 20, 2022 · 28 comments
Labels
never stale Issues marked with this label will be never staled and automatically removed receiver/journald

Comments

@djaglowski
Copy link
Member

...

I'm looking at the journald input, which currently dumps everything into the body. The only thing it pulls out is the timestamp.

Ideally (IMHO), the body would be a string containing MESSAGE and all the other fields stored as attributes. Additionally, the priority would be converted to an OpenTelemetry severity. Given journald has a well defined list of fields [1], many of them could be converted to semantic conventions such as process.pid.

So, the question is whether things like the journald input should convert messages to a more native opentelemetry format? If not, should there be an option, operator or processor that could do it all in one go?

(I can create a PR to change the journald input if there is agreement)

  1. https://www.freedesktop.org/software/systemd/man/systemd.journal-fields.html

Originally posted by @gregoryfranklin in #3129 (comment)

@dmitryax
Copy link
Member

I would like to see journald logs to be parsed right OOTB. I don't know why it shouldn't be done right at the source without any additional operator or processor. Maybe we can provide an option to keep the raw message instead of parsing if user wants that?

@djaglowski
Copy link
Member Author

I support that the operator ought to do further parsing as well. We already do this in syslog_input to an extent. Some users may prefer the collector to do less parsing, but unless someone is asking for it, I think it's reasonable to add "don't parse" controls later as necessary.

Mapping from priority to severity, the body being only the MESSAGE string, and everything else going to attributes makes sense to me. I also agree with applying semantic conventions as they currently apply. In the long term, I assume we'll have semantic conventions for most/all the fields, and I would personally like to see this operator apply them wherever possible.

What's less clear to me is whether we should work ahead of the semantic conventions. Given that the library is not yet stable and that development can easily outpace semantic debate, I think it's probably reasonable to make a best effort at appropriately naming and then propose the attributes to the spec repo.

What thoughts do others have on all this?

@dmitryax
Copy link
Member

dmitryax commented Jan 21, 2022

I fully agree on the first two paragraphs.

What's less clear to me is whether we should work ahead of the semantic conventions. Given that the library is not yet stable and that development can easily outpace semantic debate, I think it's probably reasonable to make a best effort at appropriately naming and then propose the attributes to the spec repo.

I see your point, but from the other hand we can cause additional breaking changes to the attributes that the library emits if our best effort doesn't survive the semantic debate. I would go with the semantic conventions first, or in parallel, to make sure that once the change to the library is released at least attribute names are not going to change.

@gregoryfranklin
Copy link
Contributor

I'll start with a PR that moves all the fields to attributes instead of body and sets the severity.

So, it will change from this

"entry": {
  "timestamp": "2020-04-16T11:05:49.516168-04:00",
  "body": {
    "CODE_FILE": "../src/core/unit.c",
    "CODE_FUNC": "unit_log_success",
    "CODE_LINE": "5487",
    "MESSAGE": "var-lib-docker-overlay2-bff8130ef3f66eeb81ce2102f1ac34cfa7a10fcbd1b8ae27c6c5a1543f64ddb7-merged.mount: Succeeded.",
    "MESSAGE_ID": "7ad2d189f7e94e70a38c781354912448",
    "PRIORITY": "6",

to this

"entry": {
  "timestamp": "2020-04-16T11:05:49.516168-04:00",
  "body": "var-lib-docker-overlay2-bff8130ef3f66eeb81ce2102f1ac34cfa7a10fcbd1b8ae27c6c5a1543f64ddb7-merged.mount: Succeeded.",
  "severity": 9,
  "severity_text", "INFO",
  "attributes": {
    "CODE_FILE": "../src/core/unit.c",
    "CODE_FUNC": "unit_log_success",
    "CODE_LINE": "5487",
    "MESSAGE_ID": "7ad2d189f7e94e70a38c781354912448",

Some of the fields are also clearly resource attributes rather than log attributes, but I think that is linked to the semantic conventions discussion so will leave for a later PR.

I agree it makes sense to wait for the discussion on semantic conventions to move further along before applying these to the library. In the spec there is an example mapping for syslog (and many other common logging formats) - it would be nice to see the same for journald.

@jsirianni
Copy link
Member

Has there been any movement on this? If not, I would be happy to jump in. I have been working on integrating Journald into a few projects and it has been frustrating trying to move all fields over to attributes.

@djaglowski
Copy link
Member Author

I don't see any reason why this shouldn't move forward. Putting this info in attributes is clearly an improvement in terms of respecting the data model.

@gregoryfranklin
Copy link
Contributor

I had a go at this a while back but the decision was to open an issue against the specification to get the missing semantic conventions defined.
You'll see in the PR (open-telemetry/opentelemetry-log-collection#353) that I identified all the well-defined fields from journald and which of them could be mapped to existing semantic conventions.

But, I've not had time to follow up on it in terms of opening an issue against the spec or anything.

@djaglowski
Copy link
Member Author

There is an ongoing discussion about whether and how semantic conventions should be defined on a per component basis, so it is not even fully settled that these attributes will ever be codified. At this point, I don't think we should be waiting on semantic conventions. Many other components are moving forward with best effort attribute naming. This is an alpha components anyways, so names are not irreversible.

What we know for sure at this point is that this integration is not adhering to the data model's direction that first-party libraries should emit logs with the body as a string. We should correct this behavior.

Since adhering to the data model will require us to set attributes, we could either 1) follow the style and spirit of the semantic conventions, making a best effort to establish reasonable attribute names, or 2) use the raw values from journald (eg. _MACHINE_ID). The semantic conventions provide detailed guidance on naming attributes, so I believe we should apply these as best we can to this domain.

@codeboten codeboten added the needs triage New item requiring triage label Sep 16, 2022
@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@djaglowski
Copy link
Member Author

This is still relevant, but should wait until a final decision on the ECS OTEP.

@atoulme atoulme added receiver/journald and removed needs triage New item requiring triage labels Jan 15, 2023
@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Mar 16, 2023
@djaglowski djaglowski added never stale Issues marked with this label will be never staled and automatically removed and removed Stale labels Mar 16, 2023
animetauren pushed a commit to animetauren/opentelemetry-collector-contrib that referenced this issue Apr 4, 2023
)

Signed-off-by: Alex Boten <aboten@lightstep.com>
@cwegener
Copy link
Contributor

This is still relevant, but should wait until a final decision on the ECS OTEP.

Since ECS OTEP has been merged, does that change anything regarding this journald issue? I can't see anything about the ECS OTEP that would have an impact.

@fionera
Copy link

fionera commented Nov 27, 2023

I am also interested in this. Should I revive the PR that @gregoryfranklin made?

@djaglowski
Copy link
Member Author

Looks like I mistakenly believed ECS defined semantics for journald which we would use. I think we can work out our own semantics but should keep the component maturity level at alpha until we've defined semantic conventions for any attributes which we use.

@cwegener
Copy link
Contributor

Was there a Google Doc already to work on Journald semantics?

@djaglowski
Copy link
Member Author

Was there a Google Doc already to work on Journald semantics?

I'm not aware of one.

@FeltrinN
Copy link
Contributor

Hi, I'm interested in this too :) @djaglowski @cwegener is there any work being done that it's not tracked? Anything that can be done to help out?

@cwegener
Copy link
Contributor

Hi, I'm interested in this too :) @djaglowski @cwegener is there any work being done that it's not tracked? Anything that can be done to help out?

I haven't done any work on this. I think the first step would be to review the fields in Journald and figure out how many of them should/can be mapped into OTel Log attributes. And if there is any impact on existing or not yet existing semantic conventions.

The available Journald fields are: https://www.freedesktop.org/software/systemd/man/latest/systemd.journal-fields.html

@FeltrinN
Copy link
Contributor

Sounds good. I'll try putting together a table unless somebody objects :) Once I've got a reasonable draft I'll open a documentation PR...

@cwegener
Copy link
Contributor

Sounds good. I'll try putting together a table unless somebody objects :) Once I've got a reasonable draft I'll open a documentation PR...

The old PR for this already had the first cut of mapping to semconv done, which might help. https://github.com/open-telemetry/opentelemetry-log-collection/pull/353/files#diff-913aee336b63830976ccc7a02800e0837017f628830bce49f88dc4aeabffdebe

I can't easily tell how much code of that old PR would actually still apply today though (e.g. I can't tell if the way how the severity mapping was done in the old PR would still how it should be done today)

@FeltrinN
Copy link
Contributor

(e.g. I can't tell if the way how the severity mapping was done in the old PR would still how it should be done today)

I've done some very early stages work, but I think it's clear already that the severity mapping from the earlier PR is valid (and quite stable, as those mappings are already in an appendix to the semantic conventions). Therefore, if anyone feels like it, I think it should be okay to split the old PR and merge the severity mappings already, as getting the fields in order will probably take a bit longer...

@yaleman
Copy link

yaleman commented Mar 5, 2024

On this topic, is there an easy-ish way of doing this with transforms/existing tooling today?

@t-davies
Copy link

t-davies commented Apr 5, 2024

On this topic, is there an easy-ish way of doing this with transforms/existing tooling today?

This should get you most of the way there, 100% behind having this done automatically though.

processors:
  transform:
    log_statements:
      - context: log
        statements:
          - set(severity_number, SEVERITY_NUMBER_DEBUG) where Int(body["PRIORITY"]) == 7
          - set(severity_number, SEVERITY_NUMBER_INFO) where Int(body["PRIORITY"]) == 6
          - set(severity_number, SEVERITY_NUMBER_INFO2) where Int(body["PRIORITY"]) == 5
          - set(severity_number, SEVERITY_NUMBER_WARN) where Int(body["PRIORITY"]) == 4
          - set(severity_number, SEVERITY_NUMBER_ERROR) where Int(body["PRIORITY"]) == 3
          - set(severity_number, SEVERITY_NUMBER_FATAL) where Int(body["PRIORITY"]) <= 2
          - set(resource.attributes["process.pid"], body["_PID"])
          - set(resource.attributes["process.command_line"], body["_CMDLINE"])
          - set(resource.attributes["process.command"], body["_COMM"])
          - set(resource.attributes["process.executable.path"], body["_EXE"])
          - set(body, body["MESSAGE"])

@yaleman
Copy link

yaleman commented Apr 13, 2024

Thanks @t-davies that works great.

@steverweber
Copy link

steverweber commented Apr 14, 2024

regarding t-Davies solution.

Should this be set on "attributes" instead of "resource.attributes" ?

          - set(attributes["process.pid"], body["_PID"])

I'm wondering if "attributes" is more so geared for high cardinality and important for some database backends to prevent sharding like splitting. If this is true it would be nice to have this touched on in docs https://opentelemetry.io/docs/concepts/resources/

@snv
Copy link

snv commented Jun 24, 2024

Hi,
I found this thread while looking why so much information gets lost along the way from my logs.

My use case is, that i see in grafana only
image

while the debug output from the collector shows that the input was a lot richer

2024-06-24T07:27:50.198+0200 info ResourceLog #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.6.1
Resource attributes:
-> host.name: Str(vs-devop-tempo-01.optadata-gruppe.local)
-> os.type: Str(linux)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope
LogRecord #0
ObservedTimestamp: 2024-06-24 05:27:50.008482524 +0000 UTC
Timestamp: 2024-06-24 05:27:49.533141 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Map({"MESSAGE":"level=info ts=2024-06-24T05:27:49.533082034Z caller=registry.go:236 tenant=single-tenant msg="collecting metrics" active_series=2091","PRIORITY":"6","SYSLOG_FACILITY":"3","SYSLOG_IDENTIFIER":"tempo","_BOOT_ID":"18d2abf8395d4333b025bea956c1d5d7","_CAP_EFFECTIVE":"0","_CMDLINE":"/usr/bin/tempo -config.file /etc/tempo/config.yml","_COMM":"tempo","_EXE":"/usr/bin/tempo","_GID":"65534","_HOSTNAME":"vs-devop-tempo-01","_MACHINE_ID":"a48d0459d22ae4ec8553c99165c0af6f","_PID":"3249697","_RUNTIME_SCOPE":"system","_SELINUX_CONTEXT":"unconfined\n","_STREAM_ID":"7dfe21fbf3ee44e795a16b017a0d6648","_SYSTEMD_CGROUP":"/system.slice/tempo.service","_SYSTEMD_INVOCATION_ID":"9a5cfe40ac87458c9121038a00d2b049","_SYSTEMD_SLICE":"system.slice","_SYSTEMD_UNIT":"tempo.service","_TRANSPORT":"stdout","_UID":"111","__CURSOR":"s=11409092ba2941e59ab2a6295421fa01;i=77cb61;b=18d2abf8395d4333b025bea956c1d5d7;m=afca6051f9b;t=61b9c0ac101d5;x=5f64b366e0f30616","__MONOTONIC_TIMESTAMP":"12080233389979"})
Trace ID:
Span ID:
Flags: 0
{"kind": "exporter", "data_type": "logs", "name": "debug"}

I really would like to at least be able to filter my logs in grafana by the values from SYSLOG_IDENTIFIER or _SYSTEMD_UNIT, but those are not converted to attributes, yet.

Can i -- as a workaround -- archive that using the operators configuration?

@cwegener
Copy link
Contributor

Can i -- as a workaround -- archive that using the operators configuration?

I would recommend using a transform processor instead. e.g.

  transform/journald:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - set(cache, body)
          - set(attributes["systemd.unit"], cache["_SYSTEMD_UNIT"]) where cache["_SYSTEMD_UNIT"] != nil

transform processors can run on the Gateway collector, whereas operators would typically run in the journald receiver on the Agent collector instead. Shifting the heavier processing from the Agent to the Gateway is usually preferred.

@steverweber
Copy link

perhaps it'll help someone, here's what im using. The attributes["job"] might be handy.

receivers:
  journald:
    #priority: debug #(default: info)
    resource: # tag or label the receiver
      receiver: otel-journald
processors:
  transform/journald:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          # https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/ottl/ottlfuncs/README.md
          #--- severity level priority
          - set(severity_number, SEVERITY_NUMBER_DEBUG) where Int(body["PRIORITY"]) == 7
          - set(severity_number, SEVERITY_NUMBER_INFO) where Int(body["PRIORITY"]) == 6
          - set(severity_number, SEVERITY_NUMBER_INFO2) where Int(body["PRIORITY"]) == 5
          - set(severity_number, SEVERITY_NUMBER_WARN) where Int(body["PRIORITY"]) == 4
          - set(severity_number, SEVERITY_NUMBER_ERROR) where Int(body["PRIORITY"]) == 3
          - set(severity_number, SEVERITY_NUMBER_FATAL) where Int(body["PRIORITY"]) <= 2
          - set(attributes["priority"], body["PRIORITY"])
          #unneeded - set(attributes["level"], severity_number)
          #--- trace who and what generated log message
          - set(attributes["process.comm"], body["_COMM"])
          - set(attributes["process.exec"], body["_EXE"])
          - set(attributes["process.uid"], body["_UID"])
          - set(attributes["process.gid"], body["_GID"])
          - set(attributes["owner_uid"], body["_SYSTEMD_OWNER_UID"])
          - set(attributes["unit"], body["_SYSTEMD_UNIT"])
          - set(attributes["syslog_identifier"], body["SYSLOG_IDENTIFIER"])
          #--- verbose extras
          #- set(attributes["process.pid"], body["_PID"])
          #- set(attributes["process.cmdline"], body["_CMDLINE"])
          #- set(attributes["_SYSTEMD_SLICE"], body["_SYSTEMD_SLICE"])
          #- set(attributes["runtime_scope"], body["_RUNTIME_SCOPE"])
          #--- create low cardinality "job" identifier
          # user@xxx.service mnt-xxx.mount run-xxx.scope session-xxx.scope
          ## ^([a-zA-Z_]{3,20})   ([^\\-\\.\\@0-9]+).*
          - set(attributes["syslog_identifier_prefix"], ConvertCase(body["SYSLOG_IDENTIFIER"], "lower")) where body["SYSLOG_IDENTIFIER"] != nil
          - replace_pattern(attributes["syslog_identifier_prefix"], "^[^a-zA-Z]*([a-zA-Z]{3,25}).*", "$$1") where body["SYSLOG_IDENTIFIER"] != nil
          - set(attributes["unit_prefix"], ConvertCase(body["_SYSTEMD_UNIT"], "lower")) where body["_SYSTEMD_UNIT"] != nil
          - replace_pattern(attributes["unit_prefix"], "^[^a-zA-Z]*([a-zA-Z]{3,25}).*", "$$1") where body["_SYSTEMD_UNIT"] != nil
          - set(attributes["job"], attributes["syslog_identifier_prefix"])
          - set(attributes["job"], attributes["unit_prefix"]) where attributes["job"] == nil and attributes["unit_prefix"] != nil
          #- delete_key(attributes, "syslog_identifier_prefix")
          #- delete_key(attributes, "unit_prefix")
          #--- remove jorunald metadata and make log body simple
          - set(body, body["MESSAGE"])
  groupbyattrs:
    keys:
      - service.name
      - host.name
      - receiver
      - job
service:
  pipelines:
    logs/journald:
      receivers: [journald]
      processors: [transform/journald, batch, resource/default, resourcedetection/system, groupbyattrs]
      exporters: [loki/VictoriaLogs, otlphttp/loki]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
never stale Issues marked with this label will be never staled and automatically removed receiver/journald
Projects
None yet