Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggest fields as keywords #10518

Closed
wants to merge 3 commits into from
Closed

Conversation

ruflin
Copy link
Contributor

@ruflin ruflin commented Feb 4, 2019

Most field in ECS are keyword and made optional .text if needed. Search for type: text revealed a few fields which we should potentially convert.

Most field in ECS are keyword and made optional .text if needed. Search for type: text revealed a few fields which we should potentially convert.
@ruflin ruflin added in progress Pull request is currently in progress. discuss Issue needs further discussion. ecs labels Feb 4, 2019
@ruflin ruflin requested review from webmat, kvch and andrewkroh February 4, 2019 12:00
@ruflin ruflin requested review from a team as code owners February 4, 2019 12:00
@@ -175,6 +175,7 @@
something like `GET /users/_search?name=test`. For MySQL, it is
something like `SELECT id from users where name=test`.

#keyword(s)?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewkroh Could you take a look at the fields below. Potentially these were discussed in previous PR's.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. These should both be set to keyword, to enable aggregations and fast indexing.

If full text search is deemed necessary by us or customers, text should be added as a multi-field.

@ruflin ruflin requested a review from cwurm February 4, 2019 12:02
@@ -24,6 +24,7 @@
type: keyword
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cwurm you might know more here.

Copy link
Contributor

@cwurm cwurm Feb 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mean user.user_information? I don't have a strong opinion I think. It's filled with strings like systemd Network Management,,,, Gnats Bug-Reporting System (admin). I don't expect too many users searching directly on it rather than something more unique like user.name.

Copy link
Contributor Author

@ruflin ruflin Feb 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, not sure how the comment added up here :-(

Is the number of different strings limited? Do you expect users to group/aggregate/filter on these?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When in doubt, use keyword. text should be added as a multi-field if deemed necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

text should be added as a multi-field if deemed necessary.

Can Beats generate multi-fields?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes:

    - name: thread
      type: keyword
      description: >
        Information about the running thread where the log originate.
      multi_fields:
        - name: text
          type: text

@@ -19,6 +19,7 @@
description: Set if the file has the `setgid` bit set. Omitted otherwise.

- name: origin
# Should this be keyword?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@webmat @cwurm Thoughts? Also check below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file.origin is only filled on macOS. It can be a URL or a computer name, maybe more. We treat URLs as keywords in other places, so I think keyword would make sense here, too.

/cc @adriansr who wrote the code for this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a keyword

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adriansr Could you open a PR for this change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strings should always be keyword. text should be added as a multi-field if deemed necessary.

When in doubt, keyword

@ruflin ruflin mentioned this pull request Feb 4, 2019
@@ -829,6 +829,7 @@
type: keyword
description: This is the path associated with a unix socket.
- name: messages
# As it's the raw message should it be keyword. Should it even be indexed?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's rarely used (it requires a config change to enable), but when it is used it's really useful to have as text so that you can search it.

Now that we have event.original, I'm thinking this would make sense to put there (even if it is a keyword). Likewise the warnings field below could go in as error.message.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field should be keyword. If we want to enable full text search, text should be added as a multi-field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@webmat If it's called message it should be text by default. So far this convention holds in ECS I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewkroh I assume your team will tackle these changes?

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost all entries should move to keyword, IMO. Only exception was DHCP error message.

@@ -19,6 +19,7 @@
description: Set if the file has the `setgid` bit set. Omitted otherwise.

- name: origin
# Should this be keyword?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strings should always be keyword. text should be added as a multi-field if deemed necessary.

When in doubt, keyword

@@ -829,6 +829,7 @@
type: keyword
description: This is the path associated with a unix socket.
- name: messages
# As it's the raw message should it be keyword. Should it even be indexed?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field should be keyword. If we want to enable full text search, text should be added as a multi-field.

filebeat/module/haproxy/_meta/fields.yml Outdated Show resolved Hide resolved
@@ -175,6 +175,7 @@
something like `GET /users/_search?name=test`. For MySQL, it is
something like `SELECT id from users where name=test`.

#keyword(s)?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. These should both be set to keyword, to enable aggregations and fast indexing.

If full text search is deemed necessary by us or customers, text should be added as a multi-field.

@@ -186,6 +186,7 @@
IP address. In a server reply (DHCPOFFER), a DHCP server uses
this option to specify the lease time it is willing to offer.

# error.message? or just message?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I vote for error.message, although if there's no time, not the end of the world.

@@ -24,6 +24,7 @@
type: keyword
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When in doubt, use keyword. text should be added as a multi-field if deemed necessary.

@ruflin
Copy link
Contributor Author

ruflin commented Feb 5, 2019

Closing this PR as this will not be merged. @cwurm @andrewkroh I assume your team will follow up with these if needed. Even if it's closed, we can keep the discussion going.

@ruflin ruflin closed this Feb 5, 2019
@andrewkroh
Copy link
Member

I created a PR for these changes in #10577.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Issue needs further discussion. ecs in progress Pull request is currently in progress.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants