Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Routing statements are not parsed properly if statement strings match #30663

Closed
kristofgyuracz opened this issue Jan 18, 2024 · 16 comments
Closed
Labels
bug Something isn't working connector/routing Stale

Comments

@kristofgyuracz
Copy link
Contributor

kristofgyuracz commented Jan 18, 2024

Component(s)

connector/routing

What happened?

Description

If two routing table entries have the exact same routing strings(, but different target pipelines), messages will be only forwarded to one of the target pipelines.
The reason is that it the route map's key is the routing statement string, and during parsing it is overwritten by the second entry.
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.92.0/connector/routingconnector/router.go#L133

Steps to Reproduce

    connectors:
        routing/subscriptions:
            default_pipelines: []
            table:
                - statement: route()
                  pipelines: [logs/another-subscription-sample]
                - statement: route()
                  pipelines: [logs/subscription-sample]

Expected Result

Messages are sent to both pipelines.

Actual Result

Messages are sent to only one of the pipelines.

Collector version

0.92.0

Environment information

Environment

Kubernetes distro: KinD v1.27.3
Opentelemetry-operator

OpenTelemetry Collector configuration

receivers:
        filelog/kubernetes:
            exclude:
                - /var/log/pods/*/otc-container/*.log
            include:
                - /var/log/pods/*/*/*.log
            include_file_name: false
            include_file_path: true
            operators:
                - id: get-format
                  routes:
                    - expr: body matches "^\\{"
                      output: parser-docker
                    - expr: body matches "^[^ Z]+ "
                      output: parser-crio
                    - expr: body matches "^[^ Z]+Z"
                      output: parser-containerd
                  type: router
                - id: parser-crio
                  output: extract_metadata_from_filepath
                  regex: '''^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'''
                  timestamp:
                    layout: 2006-01-02T15:04:05.999999999Z07:00
                    layout_type: gotime
                    parse_from: attributes.time
                  type: regex_parser
                - id: parser-containerd
                  output: extract_metadata_from_filepath
                  regex: '''^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'''
                  timestamp:
                    layout: '%Y-%m-%dT%H:%M:%S.%LZ'
                    parse_from: attributes.time
                  type: regex_parser
                - id: parser-docker
                  output: extract_metadata_from_filepath
                  timestamp:
                    layout: '%Y-%m-%dT%H:%M:%S.%LZ'
                    parse_from: attributes.time
                  type: json_parser
                - cache:
                    size: 128
                  id: extract_metadata_from_filepath
                  parse_from: attributes["log.file.path"]
                  regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9-]+)\/(?P<container_name>[^\/]+)\/(?P<restart_count>\d+)\.log$
                  type: regex_parser
                - from: attributes.log
                  to: body
                  type: move
                - from: attributes.stream
                  to: attributes["log.iostream"]
                  type: move
                - from: attributes.container_name
                  to: resource["k8s.container.name"]
                  type: move
                - from: attributes.namespace
                  to: resource["k8s.namespace.name"]
                  type: move
                - from: attributes.pod_name
                  to: resource["k8s.pod.name"]
                  type: move
                - from: attributes.restart_count
                  to: resource["k8s.container.restart_count"]
                  type: move
                - from: attributes.uid
                  to: resource["k8s.pod.uid"]
                  type: move
            start_at: end
    exporters:
        otlp/collector_otlp-test-output:
            endpoint: receiver-collector.example-tenant-ns.svc.cluster.local:4317
            tls:
                insecure: true
        otlp/collector_otlp-test-output-2:
            endpoint: receiver-collector.example-tenant-ns.svc.cluster.local:4317
            tls:
                insecure: true
    processors:
        attributes/subscription_another-subscription-sample:
            actions:
                - action: insert
                  key: subscription_name
                  value: another-subscription-sample
        attributes/subscription_subscription-sample:
            actions:
                - action: insert
                  key: subscription_name
                  value: subscription-sample
        attributes/tenant_example-tenant:
            actions:
                - action: insert
                  key: tenant_name
                  value: example-tenant
        k8sattributes:
            auth_type: serviceAccount
            extract:
                metadata:
                    - k8s.pod.name
                    - k8s.pod.uid
                    - k8s.deployment.name
                    - k8s.namespace.name
                    - k8s.node.name
                    - k8s.pod.start_time
            passthrough: false
            pod_association:
                - sources:
                    - name: k8s.namespace.name
                      from: resource_attribute
                    - name: k8s.pod.name
                      from: resource_attribute
    connectors:
        routing/tenant_example-tenant_subscriptions:
            default_pipelines: []
            table:
                - statement: route()
                  pipelines: [logs/tenant_example-tenant_subscription_another-subscription-sample]
                - statement: route()
                  pipelines: [logs/tenant_example-tenant_subscription_subscription-sample]
        routing/tenants:
            default_pipelines: []
            table:
                - statement: route() where IsMatch(attributes["k8s.namespace.name"], "example-tenant-ns")
                  pipelines: [logs/tenant_example-tenant]
    service:
        pipelines:
            logs/all:
                receivers: [filelog/kubernetes]
                processors: [k8sattributes]
                exporters: [routing/tenants]
            logs/tenant_example-tenant:
                receivers: [routing/tenants]
                processors: [attributes/tenant_example-tenant]
                exporters: [routing/tenant_example-tenant_subscriptions]
            logs/tenant_example-tenant_subscription_another-subscription-sample:
                receivers: [routing/tenant_example-tenant_subscriptions]
                processors: [attributes/subscription_another-subscription-sample]
                exporters: [otlp/collector_otlp-test-output-2]
            logs/tenant_example-tenant_subscription_subscription-sample:
                receivers: [routing/tenant_example-tenant_subscriptions]
                processors: [attributes/subscription_subscription-sample]
                exporters: [otlp/collector_otlp-test-output]

Log output

No response

Additional context

No response

@kristofgyuracz kristofgyuracz added bug Something isn't working needs triage New item requiring triage labels Jan 18, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@kristofgyuracz
Copy link
Contributor Author

kristofgyuracz commented Jan 18, 2024

Added a draft pull request that would fix this issue (according to unit tests). In the meantime I'll build a collector image from this state and test it manually.

@crobert-1
Copy link
Member

Is there any reason the configuration you're sharing wouldn't be the following?

    connectors:
        routing/subscriptions:
            default_pipelines: []
            table:
                - statement: route()
                  pipelines: [logs/subscription-sample, logs/another-subscription-sample]

The connector very clearly supports routing the same statement to multiple pipelines. In the case of the config you've shared I'd actually prefer failing the collector on startup with a descriptive message telling the user that multiple identical routing statements should be consolidated into one. This may break existing configurations that currently run, but the existing configurations are not doing what users want, so I don't believe this is actually a negative impact.

The alternative would be internally resolving the mismatched logic in the collector as you've shared, but I worry about long term complexity of this going forward. In general, having users supply bad configs and then the collector fixing it without letting the user know can lead to ambiguous behavior and complex code. I'd rather just keep it simple and clearly communicate to users when something like this is happening.

I'll defer to code owners though, I might be missing something here.

@kristofgyuracz
Copy link
Contributor Author

My assumption here was that the intended behaviour is what is in the submitted draft. At the current state the config parsing only "detects" these kinds equivalent statements if they are completely matching strings, but in case one is route() while the other is route() , they are still parsed as two different statements. If we want to work towards failing in that case, it would be reasonable to do a deeper parsing of the statements, and return errors if a match is detected. Also the logic is similar in any other kind of equivalent statements (eg. route() where 1==1, or changing up commutative logical expressions).

@crobert-1
Copy link
Member

crobert-1 commented Jan 19, 2024

That's fair, but in the case of whitespace differences (or any other string mismatches) the existing solution handles it the way you're requesting, right? The statements will still get routed where they're supposed to go and there wouldn't be conflicts.

It just makes more sense to me to add more validation to input rather than adding internal logic to handle config mistakes programmatically. The collector's design makes it so that every defined config can go through it's own validation process, resulting the collector failing to start on invalid configs. It seems like the more idiomatic approach for the collector's design.

I'll defer to code owners here though, I might be missing something. 👍

@pepov
Copy link

pepov commented Jan 24, 2024

My two cents: I beleive the point in @kristofgyuracz 's reasoning is that adding meaningful validation of statements (to make sure they are non-identical) would be non-trivial to implement and a simple equality check doesn't really add that much protection.

@crobert-1
Copy link
Member

adding meaningful validation of statements (to make sure they are non-identical) would be non-trivial to implement

Agreed! I don't see much value in that at this point, given the amount of work required and the relatively little impact.

simple equality check doesn't really add that much protection

Also agreed, however, the only existing issue that's being discussed is a 100% string match between two different routes. This is the bug that we're facing that causes one route to be over-written by another. If there's any trivial whitespace difference they will be separate entries in the map, and their routes will be preserved. Am I still following? Did I miss something?

The more I think about this the less strong of a preference I have, whether we fail config validation or simply handle it the way the PR has proposed. I'll defer to others, I'm fine either way.

@kristofgyuracz
Copy link
Contributor Author

kristofgyuracz commented Jan 26, 2024

Also agreed, however, the only existing issue that's being discussed is a 100% string match between two different routes. This is the bug that we're facing that causes one route to be over-written by another. If there's any trivial whitespace difference they will be separate entries in the map, and their routes will be preserved. Am I still following? Did I miss something?

Yes, that's right. We were able to do a workaround for this issue, and I agree with you that either direction could be fine, but it would be nice to have a clearly deterministic behaviour.

@crobert-1 crobert-1 removed the needs triage New item requiring triage label Jan 26, 2024
@jpkrohling
Copy link
Member

I don't know what to take from this issue: the title doesn't seem to match what was discussed so far. Here's what I understand the routing connector does, and what might be done to improve it:

✔ same string, multiple pipelines
✔ different strings, each on their own conditions, sending data to their own pipelines
✔ same telemetry matching multiple conditions, get matched to different routes and can reach multiple pipelines
✘ exact same condition, multiple routes (not a bug, should use the first case)
✘ seemingly equal strings, sending data to their own pipelines (same as the second case)

Other than doing a basic trim on the conditions, I wouldn't do anything else to address the last one. Yes, people can do a condition being "1==1" and "2==2", which are logically the same, but I don't think the connector should care about that at all.

Perhaps @mwear has a different view?

@kristofgyuracz
Copy link
Contributor Author

kristofgyuracz commented Feb 5, 2024

✘ exact same condition, multiple routes (not a bug, should use the first case)

We encountered this specific scenario, and it's a completely valid point that it shouldn't work. The current behaviour is that the config is parsed without any error message, and collector will route messages differently, than what is in the parsed config. I think in this case, logging an error message, or failing to start the collector (with an error message pinpointing the issue) would be more user friendly.

@mwear
Copy link
Member

mwear commented Feb 7, 2024

I would strongly prefer that we handle this through better validation rather than introducing unnecessary complexity into the router itself. We can check for conflicting routes during config validate, and log a warning, or return an error. Routers in the wild vary in how they handle conflicting routes. http.ServeMux will panic, whereas Chi will allow them, so a case can be made for returning an error, doing nothing, or logging a warning. I have a slight preference for logging a warning.

@kristofgyuracz
Copy link
Contributor Author

Thanks for the feedback! If that's okay, i'll submit a PR for logging a warning and continuing.

@mwear
Copy link
Member

mwear commented Feb 9, 2024

Thanks for the feedback! If that's okay, i'll submit a PR for logging a warning and continuing.

That'd be great. Thanks!

jpkrohling pushed a commit that referenced this issue Feb 28, 2024
)

**Description:** <Describe what has changed.>
Based on discussion in #30663, a warning is logged if there are two or
more routing items with the same routing statement.

**Link to tracking Issue:** #30663
XinRanZhAWS pushed a commit to XinRanZhAWS/opentelemetry-collector-contrib that referenced this issue Mar 13, 2024
…n-telemetry#31297)

**Description:** <Describe what has changed.>
Based on discussion in open-telemetry#30663, a warning is logged if there are two or
more routing items with the same routing statement.

**Link to tracking Issue:** open-telemetry#30663
Copy link
Contributor

github-actions bot commented Apr 9, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Apr 9, 2024
@OverOrion
Copy link
Contributor

I believe this can be closed as #31297 has been merged.

@crobert-1
Copy link
Member

Resolved by #31297

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working connector/routing Stale
Projects
None yet
Development

No branches or pull requests

6 participants