Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate fingerprinting in Prometheus TSDB input for Elastic 8.13 and above #9400

Closed
StephanErb opened this issue Mar 20, 2024 · 16 comments · Fixed by #9785
Closed

Eliminate fingerprinting in Prometheus TSDB input for Elastic 8.13 and above #9400

StephanErb opened this issue Mar 20, 2024 · 16 comments · Fixed by #9785
Assignees

Comments

@StephanErb
Copy link

The current Prometheus integration relies on a fingerprint dimension:

    - name: labels_fingerprint
      type: keyword
      dimension: true
      description: Autogenerated ID representing the fingerprint of labels object and includes query name

Now that elastic/elasticsearch#93564 is resolved in 8.13, this is no longer needed. Instead, labels.* can be marked as dimensions directly. Due to the locality sensitive hashing used in elastic/elasticsearch#98023 we can expect that elastic/elasticsearch#99747 will deliver storage savings over the current approach.

cc @felixbarny @tetianakravchenko

@felixbarny
Copy link
Member

I agree this is unblocked now und would lead to better performance and storage reduction. @lalit-satapathy do you think we can prioritize this?

@lalit-satapathy
Copy link
Collaborator

Hi @felixbarny,

Do we know if now objects can be marked dimension now ? Dont see them in the dimension doc.

@felixbarny
Copy link
Member

We don't even need that because we can just create a dynamic template for prometheus.labels.* that defines labels as a keyword field type with time_series_dimension set to true.

Something like this:

{
  "mappings": {
    "dynamic_templates": [
      {
        "labels": {
          "path_match": "prometheus.labels.*",
          "mapping": {
            "type": "keyword",
            "time_series_dimension": true
          }
        }
      }
    ]
  }
}

@tetianakravchenko
Copy link
Contributor

For the remote_write datastream: as I understand it will be still required to keep the fingerprint processor for the list of metric_names - https://github.com/elastic/integrations/blob/main/packages/prometheus/data_stream/remote_write/elasticsearch/ingest_pipeline/default.yml#L9-L24

but for collector - all ingest pipeline can be removed
and for query - fingerprint processor can be removed

@tetianakravchenko
Copy link
Contributor

We don't even need that because we can just create a dynamic template for prometheus.labels.* that defines labels as a keyword field type with time_series_dimension set to true.

hey @felixbarny can you please explain why we don't need it?

this is how mapping looks like now:
Screenshot 2024-03-26 at 11 40 18
for field definition:

    - name: labels.*
      type: object
      object_type: keyword
      description: |
        Prometheus metric labels

adding dimension: true to the definition above should produce the expected dynamic template, but it is not supported now. When using:

    - name: labels.*
      type: object
      object_type: keyword
      dimension: true
      description: |
        Prometheus metric labels

I am getting an error when building package:

elastic-package-0.98.2 build -v
...
   1. file ".../go/src/github.com/elastic/integrations/build/packages/prometheus-1.16.0.zip/data_stream/collector/fields/fields.yml" is invalid: field "prometheus.labels.*" of type object can't be a dimension, allowed types for dimensions: constant_keyword, keyword, long, integer, short, byte, double, float, half_float, scaled_float, unsigned_long, ip

@felixbarny
Copy link
Member

For the remote_write datastream: as I understand it will be still required to keep the fingerprint processor for the list of metric_names

That's right. But we could remove the fingerprinting of labels and only do a fingerprint of the metric names.

I am getting an error when building package:

Seems like there's a gap in the support in elastic-package and Fleet to support the dimension parameter for dynamic templates. But Elasticsearch and TSDB does support that. Could you create an issue for this? It seems like it should be an easy fix.

@lalit-satapathy
Copy link
Collaborator

Seems like there's a gap in the support in elastic-package and Fleet to support the dimension parameter for dynamic templates.

Yes, we need to mostly would need to go through the fleet enhancements for this. We did exactly the same for dynamic templates for metric types here.

@tetianakravchenko
Copy link
Contributor

fleet issue - elastic/kibana#179649

@felixbarny @lalit-satapathy can you please forward it to the correct team?

@jsoriano
Copy link
Member

jsoriano commented Apr 4, 2024

adding dimension: true to the definition above should produce the expected dynamic template, but it is not supported now. When using:

    - name: labels.*
      type: object
      object_type: keyword
      dimension: true
      description: |
        Prometheus metric labels

I am getting an error when building package:

elastic-package-0.98.2 build -v
...
   1. file ".../go/src/github.com/elastic/integrations/build/packages/prometheus-1.16.0.zip/data_stream/collect

@tetianakravchenko please try with the "compact" syntax:

    - name: labels.*
      type: keyword
      dimension: true
      description: |
        Prometheus metric labels

@felixbarny
Copy link
Member

Looks like all prerequisites for this issue are in place. Can we move forward with this now?

@lalit-satapathy
Copy link
Collaborator

Yes, we can take this up as the fleet issue is resolved. CC: @gpop63

@lalit-satapathy
Copy link
Collaborator

Since both testing by @gpop63 here and @harnish-elastic here has confirmed that the new capability only works 8.14 onwards, we should enable the same in packages stack dependancy.

FYI: The relevant Fleet change has gone in 8.14.0.

@lalit-satapathy
Copy link
Collaborator

@agithomas will be creating a separate meta issue to track other packages need such migration, which can be taken in the future. Please link the issue here.

@agithomas
Copy link
Contributor

@agithomas will be creating a separate meta issue to track other packages need such migration, which can be taken in the future. Please link the issue here.

@lalit-satapathy , please find the link to the meta issue

@gpop63
Copy link
Contributor

gpop63 commented Jun 20, 2024

@felixbarny we are fine with removing the labels_fingerprint fields from these data streams even if they are GA?

@felixbarny
Copy link
Member

As this is just an internal field that doesn't really provide value by itself, other than aiding the enablement of TSDB, I think that's fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants