Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(EKS Fargate): Add multiline support to EKS Fargate #3059

Merged
merged 2 commits into from
May 22, 2023

Conversation

rnishtala-sumo
Copy link
Contributor

@rnishtala-sumo rnishtala-sumo commented May 18, 2023

Add multiline support to EKS Fargate

https://stagdata.long.sumologic.net/ui/#/search/create?id=HS2q7MasCkcxQkWvkMdVFUEfYJ3Ku7DYbByPc4XL

  • Changelog updated or skip changelog label added
  • Documentation updated
  • Template tests added for new features

@github-actions github-actions bot added the documentation documentation label May 18, 2023
@rnishtala-sumo rnishtala-sumo force-pushed the fargate-multiline-logs branch from abe0a0c to 21726d3 Compare May 18, 2023 20:50
@rnishtala-sumo rnishtala-sumo marked this pull request as ready for review May 18, 2023 21:05
@rnishtala-sumo rnishtala-sumo requested a review from a team as a code owner May 18, 2023 21:05
groupbyattrs/stream:
keys:
- cloudwatch.log.stream
## need to reset the source identifier after grouping
Copy link
Contributor

@sumo-drosiek sumo-drosiek May 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to reset it? We can use resource attribute in logstransform's recombine operator. Now I see we set it to use in logstransform and then we remove it, which seems like unnecessary overhead

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree. Initially we didn't think about that, but that makes a lot of sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to set the cloudwatch.log.stream as a record attribute so that we could use it to group all the log records under a resource. If we don't set this as a record attribute, then what would I group by?

this is from the doc for groupby attr

If the log record and metric data point has at least one of the specified attributes key, it will be moved to a Resource with the same value for these attributes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess he's referring to setting this after grouping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that makes sense, made the change.

Copy link
Contributor Author

@rnishtala-sumo rnishtala-sumo May 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sumo-drosiek @aboguszewski-sumo actually I get the following warning when I try to use the log stream resource attribute as the source identifier for the recombine operator

2023-05-19T17:00:15.352Z    warn    recombine/recombine.go:235    entry does not contain the source_identifier, so it may be pooled with other sources    {"kind": "processor", "name": "logstransform/cloudwatch", "pipeline │
│ ": "logs/collector/otelcloudwatch", "operator_id": "merge-cri-lines", "operator_type": "recombine"}

I'll revert to copying the log stream attribute to record level to prevent unexpected issues.

Copy link
Contributor Author

@rnishtala-sumo rnishtala-sumo May 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confirmed that using a log stream record attr as the source id clears the above warning. In the interest of avoiding a scenario where we combine logs from multiple sources (as the above warning indicates), I'd like to keep the original changes

Copy link
Contributor

@sumo-drosiek sumo-drosiek May 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following is working for me:

configuration:

receivers:
  filelog:
    include:
      - ./tmp/logs/multiline.json  
    start_at: beginning
exporters:
  logging:
    verbosity: detailed
processors:
  logstransform/containers_parse_json:
    operators:
      - if: body matches "^{[\\s\\S]+"
        parse_from: body
        parse_to: body
        type: json_parser
      - type: add
        field: resource["cloudwatch.log.stream"]
        value: resource_attribute
  logstransform:
    operators:
      - id: merge-multiline-logs
        combine_field: body.log
        combine_with: "\n"
        is_first_entry: body.log matches "^a"
        source_identifier: resource["cloudwatch.log.stream"]
        # source_identifier: attributes["log.file.name"]
        type: recombine
service:
  pipelines:
    logs:
      receivers:
        - filelog
      processors:
        - logstransform/containers_parse_json
        - logstransform
      exporters:
        - logging

file (./tmp/logs/multiline.json ):

{"log": "abc"}
{"log": "def"}
{"log": "ghi"}
{"log": "asdc"}

Copy link
Contributor Author

@rnishtala-sumo rnishtala-sumo May 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed the source id to resource["cloudwatch.log.stream"] and that seems to work, no warnings. thanks!
https://stagdata.long.sumologic.net/ui/#/search/create?id=RDmXCQ1yhB5eBla9BPzgBCvMFYprpbG7EY0DAT7M

Copy link
Contributor

@sumo-drosiek sumo-drosiek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, no need to move cloudwatch.log.stream to record level atributes. We whoulc be able to use resource level attribute

@rnishtala-sumo rnishtala-sumo force-pushed the fargate-multiline-logs branch from 3a17cb8 to 0fe793a Compare May 19, 2023 13:36
Comment on lines +92 to +102
- transform/set_source_identifier
- groupbyattrs/stream
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks hacky 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we're grouping all the logs by the cloudwatch log stream. we'll also plan to raise an upstream issue for the cloudwatch receiver so that the logs are placed (correctly) under a log stream resource.

sumo-drosiek
sumo-drosiek previously approved these changes May 19, 2023
Copy link
Contributor

@sumo-drosiek sumo-drosiek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@rnishtala-sumo rnishtala-sumo force-pushed the fargate-multiline-logs branch from 0fe793a to 21726d3 Compare May 19, 2023 17:09
@sumo-drosiek sumo-drosiek dismissed their stale review May 22, 2023 06:12

Need to investigate why resoure attributes are not working with recombine operator

@rnishtala-sumo rnishtala-sumo merged commit 69cf342 into main May 22, 2023
@rnishtala-sumo rnishtala-sumo deleted the fargate-multiline-logs branch May 22, 2023 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants