Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change text fields to keyword for Metricbeat #10318

Merged
merged 6 commits into from
Jan 29, 2019

Conversation

ruflin
Copy link
Contributor

@ruflin ruflin commented Jan 24, 2019

Searching for type: text revealed a few fields which were text but I think should be keyword.

@ruflin ruflin added module review Metricbeat Metricbeat Team:Integrations Label for the Integrations team labels Jan 24, 2019
@ruflin ruflin self-assigned this Jan 24, 2019
@ruflin ruflin requested a review from webmat January 24, 2019 14:03
@ruflin ruflin requested review from a team as code owners January 24, 2019 14:03
@@ -40,7 +40,7 @@
description: consumer offset into partition being read

- name: meta
type: text
type: keyword
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@urso objections?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If text indexing is needed, it should be added as a multi-field at .meta.text

@@ -9,7 +9,7 @@
description: >
Kibana instance UUID
- name: name
type: text
type: keyword
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ycombinator Ok? Don't see why this should be text.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, makes sense as keyword to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

Copy link
Contributor

@ycombinator ycombinator Jan 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I screwed up. This should remain text IMHO. See my more-enlightened comment here: https://github.com/elastic/beats/pull/10318/files#r250673640

@@ -34,7 +34,7 @@
description: >
The request method (GET, POST, etc) (of the current request)
- name: request_uri
type: text
type: keyword
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@webmat should we map this to url.original?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, absolutely

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if Ceph's children is made into a keyword as well.

Is this search limited to Metricbeat? Or did you search everywhere, and only Metricbeat had text field left?

@@ -9,7 +9,7 @@
description: >
Kibana instance UUID
- name: name
type: text
type: keyword
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

@@ -34,7 +34,7 @@
description: >
The request method (GET, POST, etc) (of the current request)
- name: request_uri
type: text
type: keyword
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, absolutely

@@ -40,7 +40,7 @@
description: consumer offset into partition being read

- name: meta
type: text
type: keyword
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If text indexing is needed, it should be added as a multi-field at .meta.text

metricbeat/module/ceph/osd_tree/_meta/fields.yml Outdated Show resolved Hide resolved
@webmat
Copy link
Contributor

webmat commented Jan 24, 2019

Got an answer to my question via the tag. We should do the same for all beats. Adding a note to #8655, in the section "Fields changes"

@ruflin ruflin mentioned this pull request Jan 24, 2019
@webmat webmat changed the title Change text fields to keyword Change text fields to keyword for Metricbeat Jan 24, 2019
@@ -10640,7 +10640,7 @@ Kibana instance UUID
*`kibana.stats.name`*::
+
--
type: text
type: keyword
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be text. This value comes from the server.name setting in kibana.yml. The setting is meant for display purposes, according to it's documentation:

# The Kibana server's name.  This is used for display purposes.
#server.name: "your-hostname"

So it could be a string like "Marketing's Fantastic Analytics UI". In that case I can see the benefit of letting ES analyze the string. Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If searching for specific words (e.g. "marketing", to follow your example), then we simply need server.name.text for these searches.

I agree it's really useful in situations where these resources are not always well tagged & so on.

But the canonical field -- server.name -- should really be keyword. Defaulting to keyword all the time, then only adding text as a multi-field when needed ensures this progression over time (adding text) is never a breaking change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me but I have a tangential question:

IIRC if the server.name field is mapped as keyword in ES, it wouldn't also get a server.name.text multi-field automatically. Wouldn't we need to explicitly specify that somehow in the fields.yml so ES knows to create the multi-field? Or do all keyword fields in fields.yml automatically get a multi-field mapping added to them in the template via dynamic templates or something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another point: having server.name as keyword will enable users to aggregate per kibana server. It's the reason in ECS, we've decided to push for keyword across the board, by default

Copy link
Contributor

@webmat webmat Jan 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha I missed your answer when posting my previous message.

The template will have to create the multi-field explicitly for the .text.

Going for keyword across the board is also a small performance gain, so the default is kw only. Then we add text strategically, where it makes sense.

I'm not sure the exact incantation to make the multi-field in Beats fields.yml.

Copy link
Contributor

@ycombinator ycombinator Jan 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, I understand now. Thanks. I'm good with changing this to keyword for now and adding a text multi-field later if we need.

@ruflin
Copy link
Contributor Author

ruflin commented Jan 25, 2019

@webmat For the text fields I focused on Metricbeat, but we should probably to a broader search.

@@ -18573,7 +18573,7 @@ The request method (GET, POST, etc) (of the current request)
*`php_fpm.process.request_uri`*::
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@webmat Sounds like a field that should be mapped to ECS?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agreed. url.original?

I noticed a few more things in this field defs:

  • user => user.name
  • script => file.path
  • content_length => http.request.body.size
  • request_method => http.request.method
  • request_duration => event.duration, with an adjustment to nanos
  • pid => process.pid

No longer sure if any of those were discussed in #10218

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we should review php_fpm in detail again. Can you comment the above on #10218 so we can track it there?

@ruflin
Copy link
Contributor Author

ruflin commented Jan 25, 2019

This should be ready for review.

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ceph children field still looks like it's text. Other than that, looking good :-)

Note that I trust your grepping skill. I haven't double checked whether you had missed any.

metricbeat/module/ceph/osd_tree/_meta/fields.yml Outdated Show resolved Hide resolved
metricbeat/module/ceph/osd_tree/_meta/fields.yml Outdated Show resolved Hide resolved
@@ -86,7 +86,7 @@ func eventsMapping(content []byte) ([]common.MapStr, error) {
nodeInfo := common.MapStr{}
if node.ID < 0 {
//bucket node
nodeInfo["children"] = childrenMap[node.Name]
nodeInfo["children"] = strings.Split(childrenMap[node.Name], ",")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Searching for `type: text` revealed a few fields which were text but I think should be keyword.
@ruflin ruflin merged commit 0f3ea0d into elastic:master Jan 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Metricbeat Metricbeat module review Team:Integrations Label for the Integrations team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants