Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Import ECS 1.0.0 Beta 1 field definitions #9014

Closed
wants to merge 16 commits into from

Conversation

webmat
Copy link
Contributor

@webmat webmat commented Nov 9, 2018

This PR is a first attempt to re-import ECS field definitions, as of 1.0.0-beta1.

Notes

  • This PR is built on top of Rename source_ecs to source #8983
    • rebased
  • The beta 1 release of ECS came with two temporary field set removals. http and user_agent were taken out because of various concerns. This PR will add them to the common fields YML file in the meantime.

TODO

  • Copy/paste ecs fields.yml file as is
  • Introduce http to libbeat common fields (used by Fb and Pb)
  • Introduce user_agent to Filebeat common fields
  • Nest reuseable fields manually in the ecs file
    • geo
    • os
    • user
  • Fix test breakage
    • Filebeats integration tests
    • Doc generation complaints about http field set
      • Filebeat
      • Heartbeat
      • Metricbeat
    • Investigate local testsuite breakage
      • Works for PH and not on my machine. Retrying after deleting all Docker images
    • Suricata tests missing url.hostname
    • Suricata uses geo.region_iso_code, which is not in ECS yet (Add geo.region_iso_code ecs#177)
    • Failure in Metricbeat/Couchbase on Ubuntu
    • Failure in Filebeat on Darwin
    • ...

Discussion points

  • @ruflin two common fields were defined in the previous ECS fields.yml that are not in ECS right now: agent.hostname and host.name
    • Currently re-added them to libbeat/_meta/fields.common.yml, just to get integration tests to pass and see if there's other failures.
    • We can discuss later if we add to ECS or leave as Beats common fields.

@webmat webmat force-pushed the ecs_beta1_field_update branch from 19f099c to 32ce077 Compare November 9, 2018 17:46
@webmat webmat self-assigned this Nov 9, 2018
@webmat webmat added in progress Pull request is currently in progress. ecs labels Nov 9, 2018
@webmat webmat requested review from ruflin and andrewkroh November 9, 2018 20:07
@ruflin
Copy link
Contributor

ruflin commented Nov 11, 2018

Overall LGTM. It breaks the docs build. Seems to be related to the http fields? Perhaps they are now defined twice on the top level? (Just a guess)

@webmat webmat force-pushed the ecs_beta1_field_update branch from f7cdf08 to e303ae5 Compare November 12, 2018 20:10
@webmat
Copy link
Contributor Author

webmat commented Nov 12, 2018

@ruflin The doc generation no longer breaks. I don't like the fixes I had to do to get there, however. I'll finish writing a more cogent analysis about it tomorrow.

If you could take a look at the fixes I had to do in each of Hb, Mb and Pb, and tell me what you think, I'd love that. If you don't have time, no worries, I'll bring up the issue with people who are around tomorrow, and also formulate my analysis here.

packetbeat/protos/http/_meta/fields.yml Outdated Show resolved Hide resolved
description: >
City name.

- name: region_iso_code
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it's not currently in ECS. I've created an issue for this: elastic/ecs#177

@webmat
Copy link
Contributor Author

webmat commented Nov 13, 2018

Most recent push breaks in a Metricbeat's test for Couchbase on Ubuntu:

14:21:02 ======================================================================
14:21:02 FAIL: couchbase metricsets tests [with metricset='bucket']
14:21:02 ----------------------------------------------------------------------
14:21:02 Traceback (most recent call last):
14:21:02   File "/go/src/github.com/elastic/beats/metricbeat/build/python-env/local/lib/python2.7/site-packages/parameterized/parameterized.py", line 392, in standalone_func
14:21:02     return func(*(a + p.args), **p.kwargs)
14:21:02   File "/go/src/github.com/elastic/beats/metricbeat/tests/system/test_couchbase.py", line 22, in test_couchbase
14:21:02     self.check_metricset("couchbase", metricset, self.get_hosts(), self.FIELDS)
14:21:02   File "/go/src/github.com/elastic/beats/metricbeat/tests/system/metricbeat.py", line 93, in check_metricset
14:21:02     self.assertItemsEqual(self.de_dot(fields), evt.keys())
14:21:02 AssertionError: Element counts were not equal:
14:21:02 First has 1, Second has 0:  'couchbase'
14:21:02 First has 0, Second has 1:  u'error'
14:21:02 -------------------- >> begin captured stdout << ---------------------
14:21:02 {u'@timestamp': u'2018-11-13T19:18:50.846Z', u'agent': {u'hostname': u'fd4fedbe0975', u'type': u'metricbeat', u'version': u'7.0.0-alpha1'}, u'host': {u'name': u'fd4fedbe0975'}, u'error': {u'message': u'HTTP error 401 in bucket: 401 Unauthorized'}, u'metricset': {u'rtt': 10522, u'host': u'couchbase:8091', u'name': u'bucket', u'module': u'couchbase'}}
14:21:02 
14:21:02 --------------------- >> end captured stdout << ----------------------

It also breaks in Filebeat tests on Darwin:

15:02:55 ======================================================================
15:02:55 FAIL: Checks that files which were removed, the state is removed
15:02:55 ----------------------------------------------------------------------
15:02:55 Traceback (most recent call last):
15:02:55   File "/private/var/lib/jenkins/workspace/elastic+beats+pull-request+multijob-darwin/beat/filebeat/label/macosx/src/github.com/elastic/beats/filebeat/tests/system/test_registrar.py", line 904, in test_clean_removed
15:02:55     assert len(data) == 1
15:02:55 AssertionError

@webmat
Copy link
Contributor Author

webmat commented Nov 13, 2018

Here's a copy of a recap I've written elsewhere about the progress of this issue :-)

I'm not sure the way common fields are implemented are a good fit for sharing the ECS fields. I think the system to share common fields was tailored to share a very small amount of very generic fields (like "beat.*"), but doesn't work well when sharing 100+ fields. It exposes too many conflicts that may not be necessary:

  • Heartbeat uses "http" at the top level to store HTTP check results with various execution times, and almost no common grounds with the ECS "http" field set. I think Heartbeat doesn't need to move to ECS as quickly as the others, because my understanding is that people don't consume the Heartbeat index directly anyway. But the current field sharing doesn't let us hold off on sharing the fields with Heartbeat, as far as I can tell.
  • Currently the ECS field sharing is growing the documentation of each Beat tremendously by adding over 100 common fields to each. I think that's counter-productive. I think the field documentation should separate the shared fields from each Beat's specific field.
  • I've had conflicts where my "http" group of fields caused doc generation issues vs some Beat's "http" group of fields as well, because this was re-defining an existing field. I can work around it by having only one "http" group of fields in the common fields, and re-defining the each Beat's additional http fields without nesting them in a group (e.g. "- name: http.request.params"). I've done so in Packetbeat, Metricbeat and Heartbeat in this PR.
    • In each case I also had to rename the "- key: http" section to something like "- key: http-metrics" to avoid some problems in the doc generation. I'm not familiar enough with the process to fully understand this, but it seems to generate nonsensical artifacts in the asciidoc files.
  • The state of the PR as I'm writing this is that there's isolated problems mentioned in my previous comment, and I'm not sure if they're flaky tests. It seems like they are, because field renames should fail equally everywhere, not one test failure on Darwin and a different test on Ubuntu :-)

In conclusion, I think we could modify the mechanism for sharing common fields to make this much smoother:

  • Make the sharing of the ECS fields optional for some beats, like Heartbeat. My understanding is that moving to ECS is not as pressing there (even if we may eventually want do it).
  • Separate each Beat's specific fields in the doc, from the common fields it inherits
  • Figure out a good way for a Beat to augment a field group (e.g. "http") without raising an exception or without forcing us to artificially rename the "key" in the yml files.

Another thing we could do, to move things along in the shorter term is to take "http" out of this PR with the cleanup this entails. The "http" field set has been taken out of ECS for Beta 1, because we're debating whether we should nest protocol breakdowns under "network." or not. If we decide to do so, most of the problems above are moot.

@ruflin ruflin mentioned this pull request Nov 14, 2018
@webmat
Copy link
Contributor Author

webmat commented Jan 10, 2019

No longer relevant. Was done in small parts.

@webmat webmat closed this Jan 10, 2019
@webmat webmat deleted the ecs_beta1_field_update branch January 10, 2019 19:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ecs in progress Pull request is currently in progress.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants