Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Temporarily import hard-coded list of common fields #1018

Closed
3 tasks done
jsoriano opened this issue Oct 26, 2022 · 14 comments · Fixed by #1090
Closed
3 tasks done

Temporarily import hard-coded list of common fields #1018

jsoriano opened this issue Oct 26, 2022 · 14 comments · Fixed by #1090
Assignees
Labels
Team:Ecosystem Label for the Packages Ecosystem team

Comments

@jsoriano
Copy link
Member

jsoriano commented Oct 26, 2022

This issue is a temporary workaround for elastic/package-spec#441 and #1017, discard it if the other issues are prioritized first.

For packages importing fields from ECS, import a hard-coded list of fields that are commonly used and can be a source of conflicts if they don't have any mapping.

Update: instead of adding specific fields, include dynamic mappings as described in #1018 (comment). We have to try how to add the dynamic mappings:

Fields imported this way should be imported in a new well-known fields file that can be discarded later by Fleet if elastic/kibana#144048 is implemented. For example they could be included in a file called fields/imported_elastic_package.yml.

Fixes elastic/integrations#4236.

@P1llus
Copy link
Member

P1llus commented Oct 27, 2022

Don't think I forgot anything here, but feel free to double check @andrewkroh @jsoriano .
We also have a few processors adding non-ecs fields, a bit unsure what we want to do with those, but added them to the list.

Agent:

- external: ecs
  name: agent.name
- external: ecs
  name: agent.type
- external: ecs
  name: agent.id
- external: ecs
  name: agent.ephemeral_id
- external: ecs
  name: agent.version

Host:

- external: ecs
  name: host.architecture
- external: ecs
  name: host.domain
- external: ecs
  name: host.hostname
- external: ecs
  name: host.id
- external: ecs
  name: host.ip
- external: ecs
  name: host.mac
- external: ecs
  name: host.name
- external: ecs
  name: host.os.family
- external: ecs
  name: host.os.kernel
- external: ecs
  name: host.os.name
- external: ecs
  name: host.os.platform
- external: ecs
  name: host.os.version
- external: ecs
  name: host.os.build
- external: ecs
  name: host.os.codename
- external: ecs
  name: host.os.type
- external: ecs
  name: host.text
- external: ecs
  name: host.type
- external: ecs
  name: host.containerized

Geo:

- external: ecs
  name: geo.continent_name
- external: ecs
  name: geo.country_iso_code
- external: ecs
  name: geo.region_name
- external: ecs
  name: geo.region_iso_code
- external: ecs
  name: geo.city_name
- external: ecs
  name: geo.name
- external: ecs
  name: geo.location

Cloud:

- external: ecs
  name: cloud.account.id
- external: ecs
  name: cloud.availability_zone
- external: ecs
  name: cloud.instance.id
- external: ecs
  name: cloud.instance.name
- external: ecs
  name: cloud.machine.type
- external: ecs
  name: cloud.provider
- external: ecs
  name: cloud.region
- external: ecs
  name: cloud.project.id
- external: ecs
  name: cloud.image.id

Docker (NON ECS FIELDS):

    - name: docker
      default_field: true
      type: group
      fields:
        - name: container.id
          type: alias
          path: container.id
          migration: true

        - name: container.image
          type: alias
          path: container.image.name
          migration: true

        - name: container.name
          type: alias
          path: container.name
          migration: true

Kubernetes (NON ECS FIELDS):

    - name: kubernetes
      default_field: true
      type: group
      fields:
        - name: pod.name
          type: keyword
          description: >
            Kubernetes pod name

        - name: pod.uid
          type: keyword
          description: >
            Kubernetes Pod UID

        - name: pod.ip
          type: ip
          description: >
            Kubernetes Pod IP

        - name: namespace
          type: keyword
          description: >
            Kubernetes namespace

        - name: node.name
          type: keyword
          description: >
            Kubernetes node name

        - name: node.hostname
          type: keyword
          description: >
            Kubernetes hostname as reported by the node’s kernel

        - name: labels.*
          type: object
          object_type: keyword
          object_type_mapping_type: "*"
          description: >
            Kubernetes labels map

        - name: annotations.*
          type: object
          object_type: keyword
          object_type_mapping_type: "*"
          description: >
            Kubernetes annotations map

        - name: selectors.*
          type: object
          object_type: keyword
          object_type_mapping_type: "*"
          description: >
            Kubernetes selectors map

        - name: replicaset.name
          type: keyword
          description: >
            Kubernetes replicaset name

        - name: deployment.name
          type: keyword
          description: >
            Kubernetes deployment name

        - name: statefulset.name
          type: keyword
          description: >
            Kubernetes statefulset name

        - name: container.name
          type: keyword
          description: >
            Kubernetes container name (different than the name from the runtime)

Process:

    - name: process
      default_field: true
      type: group
      fields:
        - name: exe
          type: alias
          path: process.executable
          migration: true
        - name: owner
          type: group
          description: Process owner information.
          fields:
            - name: id
              type: keyword
              ignore_above: 1024
              description: Unique identifier of the user.
            - name: name
              type: keyword
              ignore_above: 1024
              multi_fields:
              - name: text
                type: text
                norms: false
              description: Short name or login of the user.
              example: albert

@andrewkroh
Copy link
Member

What produces those "Geo" fields? ^geo.* are not valid ECS fields as per https://www.elastic.co/guide/en/ecs/current/ecs-geo.html#_field_reuse_10.

In general I think it would be good to call out specifically what produces each set of fields. For example, say they are produced by add_kubernetes_metadata and link to its documentation.

IIRC there are also some elastic_agent.* fields that are injected into each input's config by the Agent.

@jsoriano
Copy link
Member Author

jsoriano commented Nov 7, 2022

@P1llus thanks for collecting the list of fields, I think this will be useful for elastic/package-spec#441.

For the matter of this workaround I was thinking on a reduced list of fields, whose lack of mappings is currently causing known issues. I wouldn't like to hard-code many fields here. If we cannot benefit of a small list of hard-coded fields, I would wait to have elastic/package-spec#441 and #1017.

Would you have a list of fields currently causing issues?

@P1llus
Copy link
Member

P1llus commented Nov 8, 2022

@jsoriano the issue is mostly regarding any ECS field that processors add, because it will conflict with fields installed by other integrations.
The best possible workaround right now would simply be to update the custom input packages field definition with these ECS fields, that way we wont have to wait for the other more permanent fixes.

WDYT @jsoriano @andrewkroh ?

@jsoriano
Copy link
Member Author

jsoriano commented Nov 8, 2022

the issue is mostly regarding any ECS field that processors add, because it will conflict with fields installed by other integrations.

Yes, but I wouldn't like to hard-code it in packages or in elastic-package, this is information that is not actually related to integrations. As mentioned, I would accept to hard-code a reduced set of fields if this is a low-hanging fruit to solve actual fixes that we have now.

The best possible workaround right now would simply be to update the custom input packages field definition with these ECS fields, that way we wont have to wait for the other more permanent fixes.

Why input packages and not all packages? Documents of any input or integration can be enriched with processors.

Hard-coding many fields has some problems:

  • We have to decide what to do with all the duplicated fields that will appear now (Mario already faced this in Import hard-coded list of common fields #1025).
  • When we have a more maintainable long-term solution, we will have to revert all these hard-codings and it may be difficult to track all the implications if many fields were added.

@jsoriano
Copy link
Member Author

jsoriano commented Dec 7, 2022

We have been discussing about this, and we are going to include a set of hard-coded mappings that will include all ECS definitions. Longer term these mappings will be included by Fleet or Elasticsearch (see elastic/elasticsearch#85692), but for packages supporting current and old versions of the stack, we will continue using the method implemented here.

We have to create a POC implementation in elastic-package, and decide if we need a change in the package-spec for this. We want this to be opt-in per package, and these dynamic mappings would be included in build time, so we probably need to add a setting under _dev/build.

Dynamic template to include can be found in https://gist.github.com/P1llus/e0de7b3a7824a41a29660e253c6cce6b

Thanks @P1llus!

@mrodm
Copy link
Contributor

mrodm commented Dec 16, 2022

@P1llus we've been checking this issue in depth and it looks like that for input packages is not going to work. This issue is pending for input packages elastic/kibana#145529

Would that be a problem ? Is it ok to just make it working for integration packages ?

@P1llus
Copy link
Member

P1llus commented Dec 16, 2022

@P1llus we've been checking this issue in depth and it looks like that for input packages is not going to work. This issue is pending for input packages elastic/kibana#145529

Would that be a problem ? Is it ok to just make it working for integration packages ?

@mrodm are you talking about dynamic templates or? The input packages we already have seems to not have any problems installing things like index mappings etc?

@mrodm
Copy link
Contributor

mrodm commented Dec 16, 2022

@P1llus we've been checking this issue in depth and it looks like that for input packages is not going to work. This issue is pending for input packages elastic/kibana#145529
Would that be a problem ? Is it ok to just make it working for integration packages ?

@mrodm are you talking about dynamic templates or? The input packages we already have seems to not have any problems installing things like index mappings etc?

@P1llus Yes, I was referring to using dynamic templates. Our first approach was to follow this (#1018 (comment)) to be able to cover more Elastic stack versions:

If we add the dynamic mappings in the manifest, we will cover more versions of the stack, but we have to be careful of merging correctly with other existing configurations, and it may be difficult to check for documented fields in tests.

However, currently input packages do not support that. So, it would be available for integration packages first, and input packages would depend on elastic/kibana#145529.

@jsoriano
Copy link
Member Author

The input packages we already have seems to not have any problems installing things like index mappings etc?

@P1llus I think Mario is referring to packages with type input and you are probably referring to the current integration packages that we consider "input packages" by convention, such as the current log package. Did you have the chance of trying to create a package with type input?

@P1llus
Copy link
Member

P1llus commented Dec 19, 2022

@mrodm @jsoriano
Ah that is a shame, I was really hoping we could use the dynamic templates for older versions as well :(

At least we will be able to configure this for current integrations and have it work for older versions, when it comes to input type packages I don't really have any preference, I am unaware how we are going to get users to move from the current implementation as normal packages, and over to these new input specific ones.

Would it be okay if we add dynamic templates to the current "custom input" packages, that is not of type input?

Of course we won't be actually adding any dynamic templates yet, we still need to test more after the feature has been added in.

@jsoriano
Copy link
Member Author

Would it be okay if we add dynamic templates to the current "custom input" packages, that is not of type input?

Yes, this would work.

@mrodm
Copy link
Contributor

mrodm commented Jan 9, 2023

Related to this bullet:

Add imported templates as definitions to be used in tests

When the dynamic templates are added as part of the field definitions, there are some issues.

For instance, there is a dynamic template for match: location between the ones included in the PR.

      {
        "location_to_geo_point": {
          "match": "location",
          "mapping": {
            "type": "geo_point"
          }
        }
      },

This template should match any full path containing location on it, but when trying to add a field definition in elastic-package this is not possible. In elastic-package , field definitions (name field), requires to have the full path for the field or using
wildcards between dots.

To start testing this feature, this package "elastic-package/test/packages/other/fields_tests" has been updated to include/import dynamic templates and it has been removed the geo-fields in the data_stream named first.
When doing so, there are some failures when running elastic-package test -v:

FAILURE DETAILS:
fields_tests/first Verify sample_event.json:
[0] field "source.geo.location.lat" is undefined
[1] field "source.geo.location.lon" is undefined
[2] field "destination.geo.location.lat" is undefined
[3] field "destination.geo.location.lon" is undefined

For those fields, this would be the field definition created in elastic-package during tests from the dynamic templates:

- name: location
  description: ""
  type: geo_point
  object_type: ""
  value: ""
  allowed_values: []
  expected_values: []
  pattern: ""
  unit: ""
  metric_type: ""
  external: ecs
  index: null
  doc_values: null

According to how elastic-package checks for fields, field definitions should be using full paths or be using wildcards for each word between dots something like: *.geo.location. Something like would be an option:

      {
        "location_to_geo_point": {
          "match": "location",
          "mapping": {
            "type": "geo_point"
          }
        }
      },

With these current dynamic template definitions, I think fields should be kept in the packages too, @jsoriano WDYT ?

@jsoriano
Copy link
Member Author

jsoriano commented Jan 9, 2023

With these current dynamic template definitions, I think fields should be kept in the packages too, @jsoriano WDYT ?

As discussed over zoom, let's try to import the whole ECS for fields validation as we import external fields.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Ecosystem Label for the Packages Ecosystem team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants