[filebeat][httpjson] filebeat_input.httpjson_interval_pages_total field causing parsing exception #35933

Closed
ebeahan opened this issue Jun 27, 2023 · 10 comments · Fixed by elastic/integrations#7179

@ebeahan
Member

ebeahan commented Jun 27, 2023

Version: 8.9.0-SNAPSHOT

httpjson input metric events are throwing a parsing exception:

{"type":"document_parsing_exception","reason":"[1:2454] failed to parse field [filebeat_input.httpjson_interval_pages_total] of type [long] in document with id '55pe_ogB6FfuXHBPI22x'. Preview of field's value: '{histogram={p99=0, min=0, median=0, max=0, mean=0, count=0, p999=0, stddev=0, p95=0, p75=0}}'","caused_by":{"type":"illegal_argument_exception","reason":"Cannot parse object as number"}}, dropping event!
Full error

Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Date(2023, time.June, 27, 19, 39, 21, 390546293, time.Local), Meta:{"input_id":"metrics-monitoring-agent","raw_index":"metrics-elastic_agent.filebeat_input-ep","stream_id":"metrics-monitoring-filebeat-1"}, Fields:{"agent":{"ephemeral_id":"003acab9-0c32-4e39-bca0-34cd1596ac80","id":"071b3cb0-beb0-4fff-842a-a604ebc6842d","name":"docker-fleet-agent","type":"metricbeat","version":"8.9.0"},"data_stream":{"dataset":"elastic_agent.filebeat_input","namespace":"ep","type":"metrics"},"ecs":{"version":"8.0.0"},"elastic_agent":{"id":"071b3cb0-beb0-4fff-842a-a604ebc6842d","process":"filebeat","snapshot":true,"version":"8.9.0"},"event":{"dataset":"elastic_agent.filebeat_input","duration":9495209,"module":"http"},"filebeat_input":{"http_request_body_bytes":{"histogram":{"count":2,"max":41,"mean":20.5,"median":20.5,"min":0,"p75":41,"p95":41,"p99":41,"p999":41,"stddev":20.5}},"http_request_body_bytes_total":41,"http_request_delete_total":0,"http_request_errors_total":0,"http_request_get_total":1,"http_request_head_total":0,"http_request_options_total":0,"http_request_patch_total":0,"http_request_post_total":1,"http_request_put_total":0,"http_request_total":2,"http_response_1xx_total":0,"http_response_2xx_total":2,"http_response_3xx_total":0,"http_response_4xx_total":0,"http_response_5xx_total":0,"http_response_body_bytes":{"histogram":{"count":2,"max":1141,"mean":698.5,"median":698.5,"min":256,"p75":1141,"p95":1141,"p99":1141,"p999":1141,"stddev":442.5}},"http_response_body_bytes_total":1397,"http_response_errors_total":0,"http_response_total":2,"http_round_trip_time":{"histogram":{"count":2,"max":32373667,"mean":17947291.5,"median":17947291.5,"min":3520916,"p75":32373667,"p95":32373667,"p99":32373667,"p999":32373667,"stddev":14426375.5}},"httpjson_interval_errors_total":0,"httpjson_interval_execution_time":{"histogram":{"count":1,"max":1970751751,"mean":1970751751,"median":1970751751,"min":1970751751,"p75":1970751751,"p95":1970751751,"p99":1970751751,"p999":1970751751,"stddev":0}},"httpjson_interval_pages_execution_time":{"histogram":{"count":1,"max":83083,"mean":83083,"median":83083,"min":83083,"p75":83083,"p95":83083,"p99":83083,"p999":83083,"stddev":0}},"httpjson_interval_pages_total":{"histogram":{"count":0,"max":0,"mean":0,"median":0,"min":0,"p75":0,"p95":0,"p99":0,"p999":0,"stddev":0}},"httpjson_interval_total":1,"id":"httpjson-sophos_central.event-2b542396-0dd9-4234-9ba9-e61ee585ed74::http://elastic-package-service-sophos_central-1:8080/siem/v1/events","input":"httpjson"},"host":{"architecture":"aarch64","containerized":false,"hostname":"docker-fleet-agent","id":"6899bf16759142d49b8b9dd550788e4c","ip":["172.19.0.7"],"mac":["02-42-AC-13-00-07"],"name":"docker-fleet-agent","os":{"codename":"focal","family":"debian","kernel":"5.15.49-linuxkit-pr","name":"Ubuntu","platform":"ubuntu","type":"linux","version":"20.04.6 LTS (Focal Fossa)"}},"metricset":{"name":"json","period":10000},"service":{"address":"http://unix/inputs","type":"http"}}, Private:interface {}(nil), TimeSeries:true}, Flags:0x0, Cache:publisher.EventCache{m:mapstr.M(nil)}} (status=400): {"type":"document_parsing_exception","reason":"[1:2454] failed to parse field [filebeat_input.httpjson_interval_pages_total] of type [long] in document with id '55pe_ogB6FfuXHBPI22x'. Preview of field's value: '{histogram={p99=0, min=0, median=0, max=0, mean=0, count=0, p999=0, stddev=0, p95=0, p75=0}}'","caused_by":{"type":"illegal_argument_exception","reason":"Cannot parse object as number"}}, dropping event!

Looking at the mappings, it looks like the field is matching the *_total pattern:

      {
        "filebeat_input.*_total": {
          "path_match": "filebeat_input.*_total",
          "mapping": {
            "type": "long"
          }
        }
      },

I'm unsure whether this is best fixed by updating the metric name in the input so that it no longer matches the existing rule, or by updating the existing dynamic template in the mappings, so I created this issue in Beats as a starting point.
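
To make the match concrete, here is a minimal Python sketch that approximates `path_match` with shell-style globbing from the standard library. Elasticsearch's own wildcard matching is not `fnmatch`, so this is an illustration under that assumption, not its implementation; the helper name `path_matches` is hypothetical.

```python
from fnmatch import fnmatchcase


def path_matches(pattern: str, field_path: str) -> bool:
    """Approximate a dynamic template path_match with shell-style globbing."""
    return fnmatchcase(field_path, pattern)


pattern = "filebeat_input.*_total"
counter = "filebeat_input.http_request_total"                  # holds a number
histogram_object = "filebeat_input.httpjson_interval_pages_total"  # holds an object

# Both names end in _total, so the same rule picks up both of them.
print(path_matches(pattern, counter))
print(path_matches(pattern, histogram_object))
```

In this model both calls report a match, which is consistent with the histogram object being mapped as `long` by the `*_total` rule.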

@elasticmachine
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@ebeahan
Member Author

ebeahan commented Jun 27, 2023

@marc-gr would you take a look since you worked on #35392?

@efd6
Contributor

efd6 commented Jul 3, 2023

@ebeahan ISTM that the histogram is not a total, so a reasonable approach would be to just drop the _total, or s/_total/_count/.
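
As a rough check of the rename idea, the sketch below applies both suggested substitutions to the metric name and tests the result against the `*_total` rule. The resulting names are hypothetical (the thread does not state the final metric name), and `fnmatchcase` only approximates Elasticsearch's `path_match`.

```python
import re
from fnmatch import fnmatchcase

TOTAL_RULE = "filebeat_input.*_total"


def drop_total(name: str) -> str:
    """Remove a trailing _total suffix."""
    return re.sub(r"_total$", "", name)


def total_to_count(name: str) -> str:
    """Equivalent of s/_total/_count/ anchored at the end of the name."""
    return re.sub(r"_total$", "_count", name)


old = "filebeat_input.httpjson_interval_pages_total"
for new in (drop_total(old), total_to_count(old)):
    # Prints the candidate name and whether the *_total rule would still match it.
    print(new, fnmatchcase(new, TOTAL_RULE))
```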

@andrewkroh
Member

andrewkroh commented Jul 3, 2023

I agree with changing the name.

I am also wondering if there is some issue with the generated index template mapping order. ISTM that the *.histogram.* path match rule should have taken precedence over the *_total rule. I thought the first matching template wins. We should look at the Fleet generated component template instead of just this definition from the package fields.yml.

https://github.com/elastic/integrations/blob/557ecfd1ed6474e1cec653e1911008821df230ac/packages/elastic_agent/data_stream/filebeat_input_metrics/fields/fields.yml#L19-L28

@ebeahan
Member Author

ebeahan commented Jul 5, 2023

> I am also wondering if there is some issue with the generated index template mapping order. ISTM that the *.histogram.* path match rule should have taken precedence over the *_total rule. I thought the first matching template wins. We should look at the Fleet generated component template instead of just this definition from the package fields.yml.

Looking at the generated mapping, it looks like the rules rely on naming conventions and path_match parameters:

      "dynamic_templates": [
...
        {
          "filebeat_input.*.histogram.count": {
            "path_match": "filebeat_input.*.histogram.count",
            "mapping": {
              "type": "long"
            }
          }
        },
        {
          "filebeat_input.*.histogram.*": {
            "path_match": "filebeat_input.*.histogram.*",
            "mapping": {
              "type": "double"
            }
          }
        },
        {
          "filebeat_input.*_total": {
            "path_match": "filebeat_input.*_total",
            "mapping": {
              "type": "long"
            }
          }
        },
...
      ]
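
A small first-match-wins simulation may help explain why rule order alone does not protect the field. It is a sketch that assumes glob-style matching via `fnmatchcase` (an approximation of `path_match`) and ignores `match_mapping_type`; `first_match` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Rules in the order shown in the generated mapping above.
RULES = [
    ("filebeat_input.*.histogram.count", "long"),
    ("filebeat_input.*.histogram.*", "double"),
    ("filebeat_input.*_total", "long"),
]


def first_match(field_path: str):
    """Return the first (pattern, type) whose pattern matches, else None."""
    for pattern, mapped_type in RULES:
        if fnmatchcase(field_path, pattern):
            return pattern, mapped_type
    return None


base = "filebeat_input.httpjson_interval_pages_total"
for path in (base, base + ".histogram.count", base + ".histogram.p99"):
    print(path, "->", first_match(path))
```

In this model the leaf paths hit the histogram rules as intended, but the parent object path contains no `.histogram.` segment, so the only rule it matches is `filebeat_input.*_total`.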

marc-gr self-assigned this Jul 6, 2023
@andrewkroh
Member

andrewkroh commented Jul 27, 2023

Maybe we could add an ingest node pipeline to the elastic-agent package to fix this for users that are already on 8.9.0? Just an idea.

@andrewkroh
Member

andrewkroh commented Jul 28, 2023

Another solution would be to insert path_unmatch into the mapping (not sure if this is possible in Fleet, let's research this).

        {
          "filebeat_input.*_total": {
            "path_match": "filebeat_input.*_total",
            "path_unmatch": "*.histogram.*",
            "mapping": {
              "type": "long"
            }
          }
        }

PR: elastic/integrations#7178 (abandoned)
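
The effect of `path_unmatch` can be explored with the same kind of sketch. It assumes glob-style matching through `fnmatchcase`, which only approximates Elasticsearch, and `rule_applies` is a hypothetical helper; the real behaviour should be verified against a cluster.

```python
from fnmatch import fnmatchcase


def rule_applies(path: str, path_match: str, path_unmatch=None) -> bool:
    """A rule applies when path_match matches and path_unmatch (if set) does not."""
    if not fnmatchcase(path, path_match):
        return False
    return path_unmatch is None or not fnmatchcase(path, path_unmatch)


base = "filebeat_input.httpjson_interval_pages_total"

# The object path itself: no ".histogram." segment, so the exclusion never fires.
print(rule_applies(base, "filebeat_input.*_total", "*.histogram.*"))

# A leaf under the object: already rejected by path_match because of its suffix.
print(rule_applies(base + ".histogram.count", "filebeat_input.*_total", "*.histogram.*"))
```

Under these assumptions the exclusion leaves the parent object matched by the `*_total` rule, because the pattern `*.histogram.*` only describes the leaf paths.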

andrewkroh added a commit to andrewkroh/integrations that referenced this issue Jul 28, 2023
Explicitly specify not to match `*.histogram.*` paths.

Fixes elastic/beats#35933
@andrewkroh
Member

andrewkroh commented Jul 28, 2023

One more solution would be to narrow the matching type on filebeat_input.*_total. Where match_mapping_type: "*" is used, we could change it to match_mapping_type: "long". The rule would then never match the histogram value, because that value is an object.

PR: elastic/integrations#7179
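
The type-narrowing idea can be sketched as follows. This is a simplified model, not Elasticsearch code: `detected_type` is a hypothetical stand-in for JSON type detection, `rule_applies` is a hypothetical helper, and `fnmatchcase` approximates `path_match`.

```python
from fnmatch import fnmatchcase


def detected_type(value) -> str:
    """Rough stand-in for the JSON type detected for a field value."""
    if isinstance(value, dict):
        return "object"
    if isinstance(value, bool):  # checked before int, since bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"


def rule_applies(path, value, path_match, match_mapping_type="*") -> bool:
    """Apply a rule only if both the detected type and the path match."""
    type_ok = match_mapping_type in ("*", detected_type(value))
    return type_ok and fnmatchcase(path, path_match)


rule = "filebeat_input.*_total"
histogram = {"histogram": {"count": 0, "max": 0}}
field = "filebeat_input.httpjson_interval_pages_total"

print(rule_applies(field, histogram, rule, "*"))     # wildcard type: object is matched
print(rule_applies(field, histogram, rule, "long"))  # narrowed type: object is skipped
print(rule_applies("filebeat_input.http_request_total", 2, rule, "long"))  # counters still match
```

Narrowing the type keeps the rule for integer counters while letting the histogram object fall through to the rules for its leaf fields.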

andrewkroh added a commit to andrewkroh/integrations that referenced this issue Jul 28, 2023
Avoid wildcard value for `object_type_mapping_type` so that objects are not
matched. The problem was that `filebeat_input.httpjson_interval_pages_total`,
which holds an object, was being matched by the dynamic mapping specified for
`filebeat_input.*_total`. This made it a long which led to a mapping exception.

Fixes elastic/beats#35933
@andrewkroh
Member

andrewkroh commented Jul 28, 2023

The solution here is two pronged:

Upgrading the elastic_agent integration version is sufficient to fix the problem.

andrewkroh added a commit to elastic/integrations that referenced this issue Jul 28, 2023
Avoid wildcard value for `object_type_mapping_type` so that objects are not
matched. The problem was that `filebeat_input.httpjson_interval_pages_total`,
which holds an object, was being matched by the dynamic mapping specified for
`filebeat_input.*_total`. This made it a long which led to a mapping exception.

Fixes elastic/beats#35933
@lucabelluccini
Contributor

Hello @andrewkroh do we need to create a known issue for this?
