Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[connector/otlpjson] Do not emit empty batches #35738

Closed
andrzej-stencel opened this issue Oct 11, 2024 · 8 comments · Fixed by #35827
Closed

[connector/otlpjson] Do not emit empty batches #35738

andrzej-stencel opened this issue Oct 11, 2024 · 8 comments · Fixed by #35827
Assignees
Labels
connector/otlpjson enhancement New feature or request

Comments

@andrzej-stencel
Copy link
Member

andrzej-stencel commented Oct 11, 2024

Component(s)

connector/otlpjson

Is your feature request related to a problem? Please describe.

Version: v0.111.0

Steps to reproduce:

  1. Prepare a JSON file (or multiple different files) with OTLP JSON logs, metrics and traces. (For example start a collector with OTLP receiver and File exporter and run telemetrygen to send logs, metrics, spans to it.) See otlp-all.json file below for a working example.
  2. Start the OTel Collector v0.111.0 with the below configuration that uses Filelog receiver to read the file(s) and passes the log lines to the OTLP/JSON connector and then to the Debug exporter for all signal types (logs, metrics, traces).

config.yaml:

connectors:
  otlpjson:
exporters:
  debug:
    verbosity: basic
receivers:
  filelog:
    include:
      - otlp-all.json
    start_at: beginning
service:
  pipelines:
    logs/input:
      exporters: [otlpjson]
      receivers: [filelog]
    logs:
      exporters: [debug]
      receivers: [otlpjson]
    metrics:
      exporters: [debug]
      receivers: [otlpjson]
    traces:
      exporters: [debug]
      receivers: [otlpjson]

otlp-all.json:

{"resourceLogs":[{"resource":{},"scopeLogs":[{"scope":{},"logRecords":[{"timeUnixNano":"1728633131034642343","severityNumber":9,"severityText":"Info","body":{"stringValue":"the message"},"attributes":[{"key":"app","value":{"stringValue":"server"}}],"droppedAttributesCount":1,"traceId":"","spanId":""}]}]}]}
{"resourceMetrics":[{"resource":{},"scopeMetrics":[{"scope":{},"metrics":[{"name":"gen","gauge":{"dataPoints":[{"timeUnixNano":"1728633123818516058","asInt":"0"}]}}]}],"schemaUrl":"https://opentelemetry.io/schemas/1.13.0"}]}
{"resourceSpans":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"telemetrygen"}}]},"scopeSpans":[{"scope":{"name":"telemetrygen"},"spans":[{"traceId":"554fe3d717447686b5427188a768eb16","spanId":"ef398864cd536fa8","parentSpanId":"d252c1d6ce0f4452","flags":256,"name":"okey-dokey-0","kind":2,"startTimeUnixNano":"1728633116345405444","endTimeUnixNano":"1728633116345528444","attributes":[{"key":"net.peer.ip","value":{"stringValue":"1.2.3.4"}},{"key":"peer.service","value":{"stringValue":"telemetrygen-client"}}],"status":{}},{"traceId":"554fe3d717447686b5427188a768eb16","spanId":"d252c1d6ce0f4452","parentSpanId":"","flags":256,"name":"lets-go","kind":3,"startTimeUnixNano":"1728633116345405444","endTimeUnixNano":"1728633116345528444","attributes":[{"key":"net.peer.ip","value":{"stringValue":"1.2.3.4"}},{"key":"peer.service","value":{"stringValue":"telemetrygen-server"}}],"status":{}}]}],"schemaUrl":"https://opentelemetry.io/schemas/1.4.0"}]}

Actual behavior:

The OTLP/JSON connector emits empty logs, metrics, traces resources. Here is the output from the Debug exporter:

2024-10-11T12:25:01.074+0200    info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 1}
2024-10-11T12:25:01.074+0200    info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 0, "log records": 0}
2024-10-11T12:25:01.075+0200    info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 0, "log records": 0}
2024-10-11T12:25:01.075+0200    info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 0, "metrics": 0, "data points": 0}
2024-10-11T12:25:01.075+0200    info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 1, "data points": 1}
2024-10-11T12:25:01.075+0200    info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 0, "metrics": 0, "data points": 0}
2024-10-11T12:25:01.075+0200    info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 0, "spans": 0}
2024-10-11T12:25:01.075+0200    info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 0, "spans": 0}
2024-10-11T12:25:01.075+0200    info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 2}

Describe the solution you'd like

Expected behavior:

I would expect the OTLP/JSON connector to not emit empty resources. Here's the output from Debug exporter I would expect:

2024-10-11T12:25:01.074+0200    info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 1}
2024-10-11T12:25:01.075+0200    info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 1, "data points": 1}
2024-10-11T12:25:01.075+0200    info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 2}

Describe alternatives you've considered

No response

Additional context

The Unmarshal(Logs|Metrics|Traces) methods do not return an error for valid JSON that is not a valid OTLP payload. They just return an empty object of logs/metrics/traces data. See https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.111.0/connector/otlpjsonconnector/traces.go#L55 for example. The code should be updated to not only check for error, but also check if the payload has any data in it.

@andrzej-stencel andrzej-stencel added enhancement New feature or request needs triage New item requiring triage labels Oct 11, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@andrzej-stencel andrzej-stencel changed the title [connector/otlpjson] Do not emit empty resources [connector/otlpjson] Do not emit empty batches Oct 11, 2024
@ChrsMark
Copy link
Member

ChrsMark commented Oct 11, 2024

I think it makes sense to skip empty records. Let's take into account also #35739, since it looks quite related.

@khushijain21
Copy link
Contributor

Hello, is this open for contribution, can you assign this to me?

@djaglowski
Copy link
Member

djaglowski commented Oct 16, 2024

I think this issue requires further explanation because there is more going on here than just emitting empty objects, and I think we may want additional changes.

The example given here highlights the way in which connectors are instanced. For a fuller explanation, see open-telemetry/opentelemetry-collector#10534, but in short, they are instanced per ordered pair of data types.

The example depends on using a single configuration for all three signals, so we end up with three instances of the connector (logs->logs, logs->metrics, logs->traces). When data is passed to the connector, we are essentially calling three different functions simultaneously, one for each instance. This means that each instance will try to marshal every line into the corresponding output data type. In this case, each instance will successfully unmarshal one of the three lines, and fail to unmarshal the other two and log an error for each of the failures.

I agree that we should not send empty data when unmarshaling fails, but the overall behavior here is not great. Should we really log an error when the logs->metrics instance fails to unmarshal json that is clearly a ptrace.Traces, etc? More fundamentally, do we really want every line to be handled by every instance?

Finally, although I don't think it's terribly important, we should note that it's possible to have a valid otlpjson line that does not contain any resource. Does it really make sense for this component to be responsible for filtering those out? There could be some obscure use cases, but really I'm just pointing this out because it seems like we're hardcoding in the behavior of a processor which already can be opted into. Generally I think it's better for components to do the minimum and then allow users to compose components as necessary.


My proposal:

We should unify the instances into a singleton (via sharedcomponent same idea as otlp receiver and several other components). Then, we can process each line once, and then parse it once as the appropriate data type. If it fails to parse as the expected data type, then we should log an error. The trick here is efficiently determining which data type the json appears to be before trying to fully parse it. I think there are a few options here but just about any would be more efficient than what we're currently doing. The otlpjson receiver previously used a regex to look for "resourceLogs", etc.

There might be a clever way to define a struct such that any valid otlpjson line is very shallowly unmarshaled, to the point where we can assess which deep unmarshaler to use. I'm not an expert on this but maybe something like the following is possible:

type AnyType struct {
    ResourceLogs []byte `json:"resourceLogs"` // or some other type that doesn't fully unmarshal the content.
    ResourceMetrics []byte `json:"resourceMetrics"`
    ResourceSpans []byte `json:"resourceSpans"`
}

@ChrsMark
Copy link
Member

Thank's for these details @djaglowski! I wasn't really familiar with the sharedcomponent idea but what you propose sounds like a solid improvement. A couple of questions:

  1. I guess this would also cover [connector/otlpjson] Invalid OTLP payload is silently ignored #35739, since if none of [resourceLogs, resourceMetrics, resourceSpans] is detected the entry will be just skipped? However you also mention that it's possible to have a valid otlpjson line that does not contain any resource. I'm not sure what we mean by resource here 🤔.

  2. Mostly for my understanding: does otlpjsonfilereceiver hit a similar issue where a metrics' receiver instance would try to handle and eventually skip a ptrace? (I could not find this

@djaglowski
Copy link
Member

  1. I guess this would also cover [connector/otlpjson] Invalid OTLP payload is silently ignored #35739, since if none of [resourceLogs, resourceMetrics, resourceSpans] is detected the entry will be just skipped? However you also mention that it's possible to have a valid otlpjson line that does not contain any resource. I'm not sure what we mean by resource here 🤔.

#35739 basically says that invalid JSON is already handled, but invalid otlpjson is ignored. What I'm saying is that there is such a thing as valid otlpjson which is essentially empty. e.g. {"resourceLogs":[]} The implementation of #35827 would exclude this because is decided based on if l.ResourceLogs().Len() != 0. My point is, if l.ResourceLogs().Len() == 0 is not the same as invalid otlpjson.

  1. Mostly for my understanding: does otlpjsonfilereceiver hit a similar issue where a metrics' receiver instance would try to handle and eventually skip a ptrace? (I could not find this

I think it was probably also not handling this well. I recall similar discussion occurring when that was first implemented and no one thought of a good solution so we just accepted it may be noisy. Really only because of seeing these issues again that it has prompted me to think through again how it could be handled better.

@khushijain21
Copy link
Contributor

khushijain21 commented Oct 25, 2024

On creating a sharedComponent similar to what we have in otelReceiver - we see that every line still would still be processed by each connector. Say we have logs and traces connector - and we choose to parse a line based on a regex match - the line would match to its appropriate data type twice (once in logs and once in traces connector)

image

You can check factory.go file I implemented here - https://github.com/khushijain21/opentelemetry-collector-contrib/blob/otlpjsonconn/connector/otlpjsonconnector/factory.go

A simple solution I can think of is (say when you have logs->logs connector)

  • parse a line if it starts with ResourceLogs
  • Ignore if it starts with ResourceMetrics and ResourceSpan
  • Throw an error for everything else
    Let me know what you think

@djaglowski
Copy link
Member

Hmm, I think what is happening is that although the component itself is a singleton, the collector graph still views it as three separate instances, so thinks it needs to send one to each. Apologies for overlooking this. (I think it's a good example of why components should be able to inform the collector graph of whether or not they are a singleton via Capabilities().Singleton or something to that effect.)

A simple solution I can think of is (say when you have logs->logs connector)

  • parse a line if it starts with ResourceLogs
  • Ignore if it starts with ResourceMetrics and ResourceSpan
  • Throw an error for everything else
  • Let me know what you think

I think this might be the best we can do in this component. Most of the time a given file is going to contain telemetry of a uniform type, but when it doesn't, we could recommend to users that they use a routing connector first. I've actually just finished adding the ability to route on a per-log basis, so perhaps this is worth a try.

andrzej-stencel pushed a commit to andrzej-stencel/opentelemetry-collector-contrib that referenced this issue Nov 8, 2024
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
The connector now does not emit empty batches for invalid otlp payload
and throws an error instead. Approach discussed here
open-telemetry#35738 (comment)

<!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. -->
#### Link to tracking issue
Fixes open-telemetry#35738 and open-telemetry#35739

<!--Describe what testing was performed and which tests were added.-->
#### Testing
Manual Testing 

<!--Describe the documentation added.-->
#### Documentation

<!--Please delete paragraphs that you did not use before submitting.-->

---------

Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
pull bot pushed a commit to abaguas/opentelemetry-collector-contrib that referenced this issue Nov 8, 2024
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
The connector now does not emit empty batches for invalid otlp payload
and throws an error instead. Approach discussed here
open-telemetry#35738 (comment)

<!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. -->
#### Link to tracking issue
Fixes open-telemetry#35738 and open-telemetry#35739

<!--Describe what testing was performed and which tests were added.-->
#### Testing
Manual Testing 

<!--Describe the documentation added.-->
#### Documentation

<!--Please delete paragraphs that you did not use before submitting.-->

---------

Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
sbylica-splunk pushed a commit to sbylica-splunk/opentelemetry-collector-contrib that referenced this issue Dec 17, 2024
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
The connector now does not emit empty batches for invalid otlp payload
and throws an error instead. Approach discussed here
open-telemetry#35738 (comment)

<!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. -->
#### Link to tracking issue
Fixes open-telemetry#35738 and open-telemetry#35739

<!--Describe what testing was performed and which tests were added.-->
#### Testing
Manual Testing 

<!--Describe the documentation added.-->
#### Documentation

<!--Please delete paragraphs that you did not use before submitting.-->

---------

Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
connector/otlpjson enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants