[RFC] Wildcard - stage 2 proposal #970
@@ -0,0 +1,13 @@
---
- name: process
Do these apply to process.parent as well?
Yes, now that process.parent is managed by the field reuse mechanism, this will indeed apply to it :-)
Thanks for the new table. I have a few nitpicks as comments below, but the meat of the review will be here.
- I agree with the source, destination, file, os, registry and user_agent fields you've selected
- I agree with the url fields you've selected so far
- I have a few challenges on some of the other fields selected, noted below. But mostly good selection there as well, overall 👍
Here are a few more fields I think we should migrate to wildcard now, and have reflected in the RFC's YAML files, for experimental release in 1.7.0:
- Changes to source & destination should be mirrored to client & server as well. So client.domain, client.registered_domain and same under server.*
- agent.build.original
- error.type
- event.original
  - add index: true in the RFC YAML file as well. Right now it's index false in ECS.
- http.request.referrer
- log.file.path, log.logger
- organization.name
- as.organization.name (this is not a reuse of the organization field set, it's defined explicitly)
- tls.client.issuer, tls.client.subject, tls.server.issuer, tls.server.subject
- x509.issuer.distinguished_name, x509.subject.distinguished_name
- url.path
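As a rough illustration of the request above, the corresponding entries in the RFC's YAML files might look something like the sketch below. The file names and layout are assumptions (a field set name with a nested list of fields, modeled on the `- name: process` fragment under review); only the field names, the wildcard type, and the index: true suggestion come from this comment.

```yaml
---
# Hypothetical sketch of an event field set file under rfcs/text/0001/.
# Layout is assumed, not copied from the RFC.
- name: event
  fields:
    - name: original
      type: wildcard
      # Suggested in this review; the comment notes ECS currently has index false.
      index: true
---
# Hypothetical sketch of a client field set file, mirroring the source and
# destination changes. A server file would carry the same two fields.
- name: client
  fields:
    - name: domain
      type: wildcard
    - name: registered_domain
      type: wildcard
```

Each document only lists the fields whose type changes, on the assumption that the generator merges these files over the existing schema when run with --include.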
Here are a few I would like to consider to migrate in the future. I'd say capture those just in markdown for now:
- geo.name
- registry.data.strings
- pe.product
- dns.question.name, dns.answers.data
Here are two fields I think we should migrate at 8.0, because they represent a breaking change. Therefore let's capture them in the RFC text, but not in the YAML files.
- message
- error.message
Questions / challenges
- Why migrate agent.name? I doubt this field is widely used. When it is used, I doubt the cardinality is very high.
- I would remove host.name from the list for now. It's the main identifier of a host for Elastic Security, so lots of aggregations and filtering around host.name would become a bit slower if we migrate it. However I think it's fine to leave host.hostname as wildcard.
- I would also not migrate host.domain as in this case, it's rarely going to be an FQDN, but rather an AD domain name. Moreover, this is not going to be suspicious data where users need to do wildcard searches, IMO.
- Why process.name? It's a short executable name in POSIX envs. Is it pretty long under Windows? (The long process title should be captured at process.title)
- I would remove user.domain for the same reason as host.domain
Maybe we can simply capture the "considered" and the "8.0" suggestions I'm making via an additional column in the table for now.
@rw-access Do you think |
@dainperkins Do you think |
pe.product - probably not. |
@rw-access Yeah .path and .key are already part of the plan. We'll add |
Agree with all the above 👍
Is the thinking to let these fields and their usage mature more and revisit
A separate conversation, but would we add a
I'm fine with removing it since it's not widely used.
👍
Fair. Even in fairly large AD environments, it's probably unlikely to have more than 100s or perhaps low 1000s of unique domains.
Not extremely long. From some I think I keyed off this one due to the
++ |
under consideration
No it's purely because I wasn't 100% sure myself. So I wanted to put them on the table for consideration without slowing things down. With Ross' feedback above, we can remove So this leaves only text to wildcard
For process.name Ok this could make sense, but I'm still hesitant. Perhaps we add it to the list of fields under consideration for now? |
I'm onboard with that approach. |
Thanks for grabbing my commits @ebeahan. Here's a few more notes:
- When building artifacts with build/ve/bin/python scripts/generator.py --include rfcs/text/0001/, I see the field definition for dns.answers disappearing from the csv, beats/fields.ecs.yml, ecs_flat.yml and ecs_nested.yml. A problem we should tackle separately from this RFC.
- Note that I didn't end up adding anything as "considered", for now. I've directly added both DNS fields based on another discussion out of band. Since this only left geo.name to be considered, I've directly migrated it instead. Stage 2 is experimental, so we can walk this back if this is a problem.
- I see you've removed process.name since I cast doubt on it. But looking back at further .text and keyword discussion #570 I see Craig was pushing for more flexibility for that field. Let's ignore my doubts and go with your initial gut feeling, and migrate it to wildcard 😬
- In line with the point above, and with another request from Craig on further .text and keyword discussion #570, let's also move pe.original_file_name to wildcard
I think we'll be good after this.
Overall LGTM One small question/worry. |
Good point @leehinman, thanks. Right now ECS doesn't deal very well with non-indexed fields. Currently I think we could leave it non-indexed but still "migrate" it to wildcard. Separately from this RFC I think we could improve on these non-indexed fields, and say something to the effect of "this field is not indexed, but if users want to index it anyway, we suggest type: wildcard" |
4cdb03b to 7a4a3e9
I wasn't seeing the same behavior in a quick test to reproduce. I'll try some more, and if I can reliably reproduce it I'll open an issue for later.
Sounds good. 👍 I addressed the two items @webmat listed and also added One outstanding question: From @leehinman's observation, should we edit: https://github.com/ebeahan/ecs/blob/wildcard-rfc-stage-2/rfcs/text/0001/event.yml#L5 and remove the |
@ebeahan Yes for now, let's remove However we will address the rendering of this subtlety in the docs separately. |
Good 👁️ on adding |
Stumbled on a typo. Line 158 "Luence" => "Lucene" |
Co-authored-by: Mathieu Martin <webmat@gmail.com>
LGTM 🚢
Thanks everyone for the input!
Summary
Revisions to the stage 1 wildcard adoption for consideration to be accepted as a stage 2 proposal.
The most significant addition is the list of fields which are the current candidates for wildcard. Discussion on any fields that should be added or removed is welcomed.
Criteria for consideration
Markdown preview of this RFC