
[exporter/elasticsearch] Ability to control final document structure for logs #35444

Closed
mauri870 opened this issue Sep 26, 2024 · 13 comments · Fixed by #35637
mauri870 (Contributor) commented Sep 26, 2024

Component(s)

exporter/elasticsearch

Is your feature request related to a problem? Please describe.

At Elastic we are working on transitioning Beats to be OTel receivers. During this migration we decided that we want to forward structured Beats events in the LogRecord body. This way, processors can interact with the body (the Beats event) as they see fit.

We need to preserve the structure and fields that come in the body and use that as the final document persisted in Elasticsearch, without any decoration or envelope added by the es exporter. In summary, the receiver and processors in the pipeline have already shaped the structure of the log record body, and we want the exporter to act as a passthrough for the body data, converting it to JSON, which is then ingested directly into Elasticsearch.

Currently, there are different supported mapping modes, but none offer this level of flexibility in the output structure.

Describe the solution you'd like

Essentially, we want the exporter to take the body of each LogRecord as a map and convert it directly into a separate document for Elasticsearch. This assumes that the receivers or processors earlier in the pipeline have already prepared the body with all the fields that will appear in the final document. The Elasticsearch exporter will then function as a passthrough, simply moving each LogRecord body into Elasticsearch as its own document without any modifications.

Describe alternatives you've considered

I'd love to hear ideas on how to support this use case, but we've thought of some approaches.

  1. Support the encoding extension framework in the elasticsearchexporter

This looks promising. I worked on a PoC using the jsonlogencodingextension and it does exactly what we need: it parses the LogRecord body, converts the map into JSON, and MarshalLogs returns a JSON-serialized byte slice of the result.

Unfortunately it has some caveats. The jsonlogencodingextension only serializes the first log record body. This means that if all the log records are inside a single plog.Logs, it will only serialize the first LogRecord, which will not work for us. We need each LogRecord to end up as a separate document. We can cheat a bit in order to marshal a single LogRecord:

// marshalLog wraps a single LogRecord in a fresh plog.Logs so that the
// encoding extension's MarshalLogs can be invoked once per record.
func (e *elasticsearchExporter) marshalLog(_ context.Context, record plog.LogRecord) ([]byte, error) {
	export := plog.NewLogs()
	exportrls := export.ResourceLogs().AppendEmpty()
	exportsl := exportrls.ScopeLogs().AppendEmpty()
	exportlogs := exportsl.LogRecords().AppendEmpty()
	record.CopyTo(exportlogs)
	return e.marshaller.MarshalLogs(export)
}

This works, but it is a hack of sorts. Any other encoding extension can be plugged in the same way, but the output might not be what you expect. For example, with the otlp_json encoding a user would likely expect a plog.Logs to become a single entry with an array of LogRecords, not separate documents. This quirk of the jsonencoder is questioned here. For us it is exactly what we need, but the behavior seems 'strange' for use with other encodings.

  2. Support for a jsonbody mapping mode

This mapping mode would take the body of each LogRecord as a map, serialize it into JSON, and use that as the final document to be ingested into Elasticsearch. This solution is more straightforward, but it does not benefit the OTel ecosystem the way the push for encoding support does.
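A minimal stdlib-only sketch of what this mode would do per record. The map literal stands in for a LogRecord body that upstream processors have already shaped; `serializeBody` is an illustrative name, not the exporter's actual API:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serializeBody sketches the proposed mapping mode: the body map alone
// becomes the Elasticsearch document, with no envelope added around it.
func serializeBody(body map[string]any) ([]byte, error) {
	return json.Marshal(body)
}

func main() {
	// Hypothetical LogRecord body, already shaped upstream in the pipeline.
	body := map[string]any{
		"@timestamp": "1754-08-30T22:43:41.128654848Z",
		"id":         0,
		"key":        "value",
	}
	doc, err := serializeBody(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(doc))
}
```

In the real exporter this serialization would run once per LogRecord, producing one bulk-indexing document each.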

Additional context

For context, we have logs similar to this:

logs := plog.NewLogs()
resourceLogs := logs.ResourceLogs().AppendEmpty()
scopeLogs := resourceLogs.ScopeLogs().AppendEmpty()
logRecords := scopeLogs.LogRecords()
logRecord := logRecords.AppendEmpty()

// custom fields
bodyMap := pcommon.NewMap()
bodyMap.PutStr("@timestamp", "1754-08-30T22:43:41.128654848Z")
bodyMap.PutInt("id", 0)
bodyMap.PutStr("key", "value")

bodyMap.CopyTo(logRecord.Body().SetEmptyMap())

With the current es exporter and mapping mode none, the final document that is sent to Elasticsearch looks like this:

{
   "@timestamp":"1754-08-30T22:43:41.128654848Z", // This is not the timestamp from the log record body
   "Body":{
      "@timestamp":"1754-08-30T22:43:41.128654848Z",
      "id":0,
      "key":"value"
   },
   "Scope":{
      "name":"",
      "version":""
   },
   "SeverityNumber":0,
   "TraceFlags":0
}

We would like for it to be:

{
   "@timestamp":"1754-08-30T22:43:41.128654848Z",
   "id": 0,
   "key": "value"
}
mauri870 added the enhancement (New feature or request) and needs triage (New item requiring triage) labels Sep 26, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

mauri870 changed the title from "Ability to control final document structure for logs" to "[exporter/elasticsearch] Ability to control final document structure for logs" Sep 26, 2024
carsonip (Contributor) commented Oct 2, 2024

Questions:

  • Will the body be a string or a map?
  • ES exporter functionality can be largely divided into 3 areas: mapping and encoding (controls the final document structure), routing (dynamic routing to different indices based on attributes), and indexing (batching, sending bulk requests, retries). The proposed Beats passthrough will basically bypass mapping and go straight to indexing, but do you need the functionality to dynamically route based on attributes?

mauri870 (Contributor, Author) commented Oct 2, 2024

Will the body be a string or a map?

For our use case it will always be a map. I think this is wise in general because we can properly encode the type information we need in the map, which helps serialize this data properly at the end of the pipeline. Processors can also interact with this map more easily.

The proposed beats passthrough will basically bypass mapping and go straight to indexing, but do you need the functionality to dynamic route based on attributes?

Not at this time. It seems like it could be useful, though. I'm not sure we should dismiss the possibility of supporting it if the effort isn't too great.

felixbarny (Contributor) commented:

I think routing to a specific data stream will be important for the Beats event data passthrough mode. You'll want to be able to define to which data stream the event is sent. IINM, you even want to express metrics as a log record that will then be routed to metric data streams. So I think the attribute-based routing is still very relevant.

carsonip (Contributor) commented Oct 2, 2024

/label -needs-triage

github-actions bot removed the needs triage (New item requiring triage) label Oct 2, 2024
carsonip (Contributor) commented Oct 2, 2024

For our use case it will always be a map. I think this is wise in general because we can properly encode the type information we need in the map, which helps serialize this data properly at the end of the pipeline. Processors can also interact with this map more easily.

Got it. So the beats passthrough mapping mode will use attributes to route, but the document (payload to ES) will be exactly the encoded version of the body, without regard to the other fields in the OTel LogRecord data structure.

What about dedot and dedup? Will the body map be in a structure that is already dedotted and deduplicated, such that dedot and dedup in the es exporter can be bypassed and the map is a direct translation to the resulting document?

mauri870 (Contributor, Author) commented Oct 2, 2024

Got it. So the beats passthrough mapping mode will use attributes to route, but the document (payload to ES) will be exactly the encoded version of the body, without regard to the other fields in the OTel LogRecord data structure.

That is correct.

What about dedot and dedup? Will the body map be in a structure that is already dedotted and deduplicated, such that dedot and dedup in the es exporter can be bypassed and the map is a direct translation to the resulting document?

I spoke with the team, and we don't require support for dedot, dedup, or any transformation of the body. The final document must match the exact structure of the body of the LogRecord.
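For readers unfamiliar with the term: dedotting expands dotted keys into nested objects, one of the transformations the exporter's other mapping modes apply and that this passthrough mode would skip. A minimal illustrative sketch (this is not the exporter's actual implementation, which also handles key conflicts):

```go
package main

import (
	"fmt"
	"strings"
)

// dedot expands dotted keys into nested maps, e.g. {"a.b": 1}
// becomes {"a": {"b": 1}}. Illustrative only; conflict handling
// in the real es exporter differs.
func dedot(flat map[string]any) map[string]any {
	out := map[string]any{}
	for key, val := range flat {
		parts := strings.Split(key, ".")
		cur := out
		for _, p := range parts[:len(parts)-1] {
			next, ok := cur[p].(map[string]any)
			if !ok {
				next = map[string]any{}
				cur[p] = next
			}
			cur = next
		}
		cur[parts[len(parts)-1]] = val
	}
	return out
}

func main() {
	fmt.Println(dedot(map[string]any{"data_stream.type": "logs"}))
}
```

With the passthrough mode, none of this runs: the body map goes to Elasticsearch byte-for-byte as serialized.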

carsonip (Contributor) commented Oct 2, 2024

Sounds good, I imagine that's not too hard to accomplish.

felixbarny (Contributor) commented:

I guess the remaining question is on having a new mapping mode specific to the ES exporter or to somehow integrate with the encoder extensions.

Do you have thoughts on that, @carsonip?

carsonip (Contributor) commented Oct 2, 2024

I briefly looked at the encoder extensions and their current usages in exporters, e.g. the fileexporter:

  • jsonlogencodingextension would definitely not work out of the box as mentioned, due to the limitation of only processing the first log record.
  • imo the encoder extensions' interface, which converts a plog.Logs that may contain multiple log records into a single []byte, does not match our abstraction. We will need one JSON document per log record, not to mention they may need to be routed dynamically to different indices.
  • The workaround to split every log record into its own plog.Logs would need to sit in the es exporter, and will need to be written carefully to avoid copying, e.g. by shifting the log records and scope logs instead of creating copies.
  • In theory adopting encoding extension would enable users to encode them in whatever way they like, with a custom encoding extension.
  • Currently, data stream routing is done by injecting data_stream.* fields. This means that if we use data stream routing, the body will be translated to the resulting JSON verbatim except for the data stream fields. Otherwise, data stream routing will need to be handled before the es exporter.
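A sketch of the data_stream.* routing convention mentioned above, under the type-dataset-namespace naming scheme. The field names follow that convention, but the defaults and the function itself are illustrative assumptions, not the exporter's exact behavior:

```go
package main

import "fmt"

// indexFromBody sketches attribute-based data stream routing: the target
// index name is derived from data_stream.* fields in the body, falling
// back to assumed defaults when a field is absent.
func indexFromBody(body map[string]any) string {
	get := func(key, def string) string {
		if v, ok := body[key].(string); ok && v != "" {
			return v
		}
		return def
	}
	return fmt.Sprintf("%s-%s-%s",
		get("data_stream.type", "logs"),
		get("data_stream.dataset", "generic"),
		get("data_stream.namespace", "default"))
}

func main() {
	body := map[string]any{"data_stream.dataset": "nginx.access"}
	fmt.Println(indexFromBody(body)) // logs-nginx.access-default
}
```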

mauri870 (Contributor, Author) commented Oct 4, 2024

IMHO we don't have enough info on how to support the encoding extension properly, specifically because of the differences between the jsonlogencodingextension and the other extensions, which would require additional logic to be implemented. We are blocked from making progress with EDOT until we can figure out how to support the use case described in this issue.

Wdyt of going with a more conservative approach with a new mapping mode, and in the meantime we can look into how to support the encoding extension properly? I have a PoC here that can serve as an initial implementation.

carsonip (Contributor) commented Oct 4, 2024

sgtm. A new mapping mode will be fairly straightforward to implement. Marking it as experimental will be fine.

mauri870 (Contributor, Author) commented Oct 7, 2024

Thanks. I have submitted a pull request with the implementation.

andrzej-stencel pushed a commit that referenced this issue Oct 17, 2024
#### Description

This PR implements a new mapping mode `bodymap` that works by
serializing each LogRecord body as-is into a separate document for
ingestion.
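For reference, enabling the new mode would look roughly like this in the collector config (the endpoint is a placeholder, and the exact option layout should be checked against the exporter's README):

```yaml
exporters:
  elasticsearch:
    endpoint: https://localhost:9200
    mapping:
      mode: bodymap
```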

Fixes #35444

Co-authored-by: Carson Ip <carsonip@users.noreply.github.com>