Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Wildcard - stage 3 proposal #1015

Merged
merged 22 commits into from
Nov 11, 2020
Merged

Conversation

ebeahan
Copy link
Member

@ebeahan ebeahan commented Oct 6, 2020

Summary

Revisions to the stage two wildcard RFC for consideration of advancement for stage three.

Opening this PR with minimal changes. Feedback and concerns as the wildcard changes are adopted can be captured in this PR and incorporated into the proposal as appropriate.

Criteria for consideration

  • Opened pull request for this candidate revising the existing draft
  • Completed field definitions
  • Included multiple real world example source documents
  • Existing or newly raised questions and concerns are addressed

Markdown preview of this RFC

@ebeahan ebeahan added the RFC label Oct 6, 2020
@ebeahan ebeahan self-assigned this Oct 6, 2020
@rw-access
Copy link
Contributor

I saw a few typos. These say "leading wildcard" but the example is a trailing wildcard: foo*

| Leading wildcard queries on low-cardinality fields (foo*) | Fast | Slower (see *3) |
| Leading wildcard queries on high-cardinality fields (foo* ) | Terrible | Much faster |

Inconsistent escaping, should do \\ for all escaped backslashes.

* Match under registry path: `registry.path:\\HKLM\\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\*\Debugger`

Similarly, I think these should be escaped with \ not /. I don't think this is valid syntax:

process.command_line:*f/ foo* and process.command_line:*/b bar*

Worth double checking if this is valid syntax. I'm not 100% sure if \ can escape whitespace.
process.command_line:*f\ foo* and process.command_line:*/b\ bar*

@webmat
Copy link
Contributor

webmat commented Oct 8, 2020

We should remove user.domain, in line with not including host.domain

@webmat
Copy link
Contributor

webmat commented Oct 14, 2020

We could add a link to the wildcard docs in the references https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html#wildcard-field-type

@webmat
Copy link
Contributor

webmat commented Oct 21, 2020

Whoa, I just came across this process.thread.name:

"process.thread.name":"TC: 14:05:53 Stage: Initializing Startup subsystem; TeamCity Server (version 2020.1.4 build 78906 started 2020-10-21 14:05:53.422), node id: MAIN_SERVER"

And here I thought people were as imaginative as me, calling their threads "thread1", "thread2", "worker1" and so on.

I think we should make this field wildcard as well.

@ebeahan
Copy link
Member Author

ebeahan commented Oct 21, 2020

@webmat - I, as well, must not be very imaginative 😂

I've captured process.thread.name in the list.

@webmat
Copy link
Contributor

webmat commented Oct 22, 2020

Just like I mentioned in today's meeting, I think we should make obvious that this RFC is solely about the migration of pre-existing keyword fields to wildcard.

Using wildcard right away on net new fields is totally fine, IMO. WDYT?

@ebeahan
Copy link
Member Author

ebeahan commented Oct 22, 2020

Yes agreed! I've actually made the suggestion to consider wildcard for net new fields in other proposals, so I'm glad we're aligned 😉 😂

I will adjust the RFC's messaging to make it clear this RFC is documenting the initial proposal focused on existing fields, but any net new fields should consider using wildcard whenever appropriate.

@ebeahan
Copy link
Member Author

ebeahan commented Oct 22, 2020

I made some adjustments to the title and intro trying to better focus on the migration over wildcard's general presence in ECS: 3f2eaaa (#1015)

I'm happy to flesh this out with more detail if that feels appropriate.

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted a few things around the recent changes 👍

rfcs/text/0001-wildcard-data-type.md Outdated Show resolved Hide resolved
rfcs/text/0001-wildcard-data-type.md Show resolved Hide resolved
rfcs/text/0001-wildcard-data-type.md Outdated Show resolved Hide resolved
rfcs/text/0001-wildcard-data-type.md Outdated Show resolved Hide resolved
rfcs/text/0001-wildcard-data-type.md Outdated Show resolved Hide resolved
@ebeahan
Copy link
Member Author

ebeahan commented Nov 6, 2020

PR to apply wildcard migration changes in progress #1098

@ebeahan
Copy link
Member Author

ebeahan commented Nov 7, 2020

Including event.original as a candidate field for wildcard may need reconsideration.

The event.original field in ECS is currently keyword with indexing disabled. Valid concerns were raised (#970 (comment)) to continue disabling indexing on event.original to avoid any significant impact on indexing. Some data sources will produce original events that are very, very large.

Wildcard is part of the keyword family. Unlike other keyword data types, it doesn't support disabling either indexing or doc values by design. Trying to set either option to false will be rejected by the API:

PUT wildcard-index
{
  "mappings": {
    "properties": {
      "wildcard": {
        "type": "wildcard",
        "index": false
      }
    }
  }
}

{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "The field [wildcard] cannot have index = false"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "Failed to parse mapping [_doc]: The field [wildcard] cannot have index = false",
    "caused_by" : {
      "type" : "mapper_parsing_exception",
      "reason" : "The field [wildcard] cannot have index = false"
    }
  },
  "status" : 400
}

A simple path may be to omit event.original as a candidate for wildcard and have it remain keyword with index: false.

@webmat @leehinman Any thoughts?

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're good on the current list of fields to migrate. This LGTM other than a few small adjustments.

Nitpick: we're missing organization.name in the table :-)

I'm good with reverting event.original to keyword with index: false for now, as the way forward. We could adjust the field definition in #1098 to say it's currently not indexed, and that if users want to override this and make it indexed, they should consider wildcard.

rfcs/text/0001-wildcard-data-type.md Outdated Show resolved Hide resolved
rfcs/text/0001-wildcard-data-type.md Outdated Show resolved Hide resolved
rfcs/text/0001-wildcard-data-type.md Outdated Show resolved Hide resolved
ebeahan and others added 6 commits November 10, 2020 09:20
Co-authored-by: Mathieu Martin <webmat@gmail.com>
Co-authored-by: Mathieu Martin <webmat@gmail.com>
Co-authored-by: Mathieu Martin <webmat@gmail.com>
@ebeahan
Copy link
Member Author

ebeahan commented Nov 10, 2020

Latest round of changes:

  • Added missing organization.name to the fields table
  • Removed event.original as a wildcard candidate field
  • Corrected formatting and incorporated peer suggestions.

webmat
webmat previously approved these changes Nov 10, 2020
Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, let's change that date and merge 👍

RFC 0001 is also the first to reach stage 3 🎉 😃

@ebeahan ebeahan merged commit 124dbfb into elastic:master Nov 11, 2020
@ebeahan ebeahan deleted the wildcard-rfc-stage-3 branch November 11, 2020 02:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants