Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add zeek package #245

Merged
merged 5 commits into from
Sep 18, 2020
Merged

Add zeek package #245

merged 5 commits into from
Sep 18, 2020

Conversation

leehinman
Copy link
Contributor

What does this PR do?

Add package for the filebeat Zeek module. Includes capture_loss,
connection, dce_rpc, dhcp, dnp3, dns, dpd, files, ftp, http, intel
irc, kerberos, modbus, mysql, notice, ntlm, ocsp, pe, radius, rdp
rfb, sip, smb_cmd, smb_files, smb_mapping, smtp, snmp, socks
ssh, ssl, stats, syslog, traceroute, tunnel, weird, x509

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all datasets collect metrics or logs.

Related issues

Screenshots

image
image
image
image
image

@leehinman leehinman added enhancement New feature or request Team:Integrations Label for the Integrations team Team:SIEM (Deprecated) labels Aug 12, 2020
@elasticmachine
Copy link

Pinging @elastic/integrations (Team:Integrations)

@elasticmachine
Copy link

Pinging @elastic/siem (Team:SIEM)

@elasticmachine
Copy link

elasticmachine commented Aug 12, 2020

💚 Build Succeeded

Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: [Pull request #245 updated]

  • Start Time: 2020-09-18T16:52:46.692+0000

  • Duration: 4 min 35 sec

@andrewstucki
Copy link

I'll go through this... slowly... reviews will be in chunks

title: 'Collect Zeek capture_loss, connection, dce_rpc, dhcp, dnp3, dns, dpd, files, ftp, http, intel, irc, kerberos, modbus, mysql, notice, ntlm, ocsp, pe, radius, rdp, rfb, sip, smb_cmd, smb_files, smb_mapping, smtp, snmp, socks, ssh, ssl, stats, syslog, traceroute, tunnel, weird and x509 logs'
description: 'Collecting capture_loss, connection, dce_rpc, dhcp, dnp3, dns, dpd, files, ftp, http, intel, irc, kerberos, modbus, mysql, notice, ntlm, ocsp, pe, radius, rdp, rfb, sip, smb_cmd, smb_files, smb_mapping, smtp, snmp, socks, ssh, ssl, stats, syslog, traceroute, tunnel, weird and x509 logs from Zeek instances'
owner:
github: elastic/siem
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
github: elastic/siem
github: elastic/security-ingest

description: Collect logs from Zeek instances
inputs:
- type: logfile
title: 'Collect Zeek capture_loss, connection, dce_rpc, dhcp, dnp3, dns, dpd, files, ftp, http, intel, irc, kerberos, modbus, mysql, notice, ntlm, ocsp, pe, radius, rdp, rfb, sip, smb_cmd, smb_files, smb_mapping, smtp, snmp, socks, ssh, ssl, stats, syslog, traceroute, tunnel, weird and x509 logs'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like rather long title. Maybe we should drop the names of the individual logs.

"indexRefName": "kibanaSavedObjectMeta.searchSourceJSON.index",
"query": {
"language": "kuery",
"query": ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these visualizations have a query to only select Zeek data? If so we can take advantage of the constant_keyword data_stream fields for this.

show_user: true
default:
- zeek.dhcp
- name: community_id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a lot of work, but I'd like to have this config option removed from everywhere. The processor should always be included. I can't think of any reason to make it configurable.

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🎉

@mtojek mtojek self-requested a review September 17, 2020 12:14
Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately CI reported issues for this PR: https://beats-ci.elastic.co/blue/organizations/jenkins/Beats%2Fintegrations/detail/PR-245/13/pipeline

EDIT:

After a short investigation with @ycombinator , it looks that we have a requirement for dataset names to be at least 3 characters: https://github.com/elastic/package-spec/blob/master/versions/1/dataset/spec.yml#L6 . I'm wondering whether it's a good requirement to prevent a non-descriptive, mysterious names like in this case "pe".

It's not a big deal to change it, but maybe it's better to keep it. WDYT @ruflin @ycombinator ?

@leehinman
Copy link
Contributor Author

Unfortunately CI reported issues for this PR: https://beats-ci.elastic.co/blue/organizations/jenkins/Beats%2Fintegrations/detail/PR-245/13/pipeline

EDIT:

After a short investigation with @ycombinator , it looks that we have a requirement for dataset names to be at least 3 characters: https://github.com/elastic/package-spec/blob/master/versions/1/dataset/spec.yml#L6 . I'm wondering whether it's a good requirement to prevent a non-descriptive, mysterious names like in this case "pe".

It's not a big deal to change it, but maybe it's better to keep it. WDYT @ruflin @ycombinator ?

Just a bit of context. The datasets are named after the ~40 log files that zeek produces. And in this case the log file is called pe.log (abbreviation for Portable Executable).

It might be a little weird for all the others to be direct mappings (mysql.log -> mysql) but this one to be pe.log -> portable_executable. I'd vote for an exception in this case, most of the time a 2 letter name would be mysterious but in this context making it not like all the others could be more mysterious. Luckily this is the only 2 letter log name for zeek. https://docs.zeek.org/en/current/script-reference/log-files.html

@ycombinator
Copy link
Contributor

I'm in favor of keeping it as pe (which would mean relaxing the constraint in the spec). pe makes sense in the context of this integration so I would expect it would make sense to developers of this integration as well.

However, if we decide to change it to be more descriptive, then I'd suggest doing that consistently for other datasets of this integration as well, e.g. dnp3, dpd, ocsp, rfb. But then you'd again run into an issue of subjectivity: who decides whether an abbreviation need to expanded to be more descriptive vs. leaving it as-is?

So I think the rule-of-thumb should be that we use dataset names that makes sense in the context of an integration.

@mtojek
Copy link
Contributor

mtojek commented Sep 18, 2020

Ok, I'm going to open PR against package-spec to adjust the schema.

@ruflin
Copy link
Contributor

ruflin commented Sep 18, 2020

I'm good with changing the constraint. Keep in mind that in most cases this also effects the index name. For short vs long: We probably should have a recommendation but in the end, it is up to the person writing the package to make the call.

@mtojek
Copy link
Contributor

mtojek commented Sep 18, 2020

@leehinman I updated the dependency on package-spec on master branch. Please rebase this PR against master and mage clean && mage check.

- capture_loss
- connection
- dce_rpc
- dhcp
- dnp3
- dns
- dpd
- files
- ftp
- http
- intel
- irc
- kerberos
- modbus
- mysql
- notice
- ntlm
- ocsp
- pe
- radius
- rdp
- rfb
- sip
- smb_cmd
- smb_files
- smb_mapping
- smtp
- snmp
- socks
- ssh
- ssl
- stats
- syslog
- traceroute
- tunnel
- weird
- x509

Closes elastic#225
- quoting of =
- indentation in a painless script

makes it easier to sync with Filebeat
- removed config option for community_id
- changed owner to security-ingest
- Shortened Title and reworded description
- limit visualizations to zeek data
- kerberos, ssl & x509 ECS 1.6.0 changes
- fix community_id processor
- mage check changes
@leehinman
Copy link
Contributor Author

@leehinman I updated the dependency on package-spec on master branch. Please rebase this PR against master and mage clean && mage check.

Done. Thanks for all the help.

@mtojek mtojek self-requested a review September 18, 2020 19:12
@leehinman leehinman merged commit 86425ca into elastic:master Sep 18, 2020
@leehinman leehinman deleted the 225_zeek branch December 1, 2020 16:32
eyalkraft pushed a commit to build-security/integrations that referenced this pull request Mar 30, 2022
* Add zeek package

- capture_loss
- connection
- dce_rpc
- dhcp
- dnp3
- dns
- dpd
- files
- ftp
- http
- intel
- irc
- kerberos
- modbus
- mysql
- notice
- ntlm
- ocsp
- pe
- radius
- rdp
- rfb
- sip
- smb_cmd
- smb_files
- smb_mapping
- smtp
- snmp
- socks
- ssh
- ssl
- stats
- syslog
- traceroute
- tunnel
- weird
- x509
- limit visualizations to zeek data
- removed config option for communit_id processor
- synced with filebeat zeek module for ECS 1.6.0 changes

Closes elastic#225
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Team:Integrations Label for the Integrations team Team:SIEM (Deprecated)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate zeek
7 participants