[RFC] E-Mail #999

Merged 7 commits on Nov 30, 2020
55 changes: 29 additions & 26 deletions rfcs/text/0008-email.md
@@ -1,10 +1,10 @@
# 0008: Email
<!-- Leave this ID at 0000. The ECS team will assign a unique, contiguous RFC number upon merging the initial stage of this RFC. -->

- Stage: **1** <!-- Update to reflect target stage. See https://elastic.github.io/ecs/stages.html -->
- Stage: **1 (proposal)** <!-- Update to reflect target stage. See https://elastic.github.io/ecs/stages.html -->
ebeahan marked this conversation as resolved.
- Date: **Oct 5th 2020** <!-- The ECS team sets this date at merge time. This is the date of the latest stage advancement. -->
webmat marked this conversation as resolved.

This RFC proposes a new top-level field to facilitate email use cases.
This RFC proposes a new top-level field to facilitate email use cases.

<!--
As you work on your RFC, use the "Stage N" comments to guide you in what you should focus on, for the stage you're targeting.
@@ -21,32 +21,35 @@ Stage 0: Provide a high level summary of the premise of these changes. Briefly d
Stage 1: Describe at a high level how this change affects fields. Which fieldsets will be impacted? How many fields overall? Are we primarily adding fields, removing fields, or changing existing fields? The goal here is to understand the fundamental technical implications and likely extent of these changes. ~2-5 sentences.
-->

Email specific fields:

| field | type | description |
| --- | --- | --- |
| `email.action` | keyword | Action taken by the source device, e.g. delivered, blocked, quarantined, deleted |
| `email.bcc.address` | keyword | Addresses of Bcc's |
| `email.bcc.domain` | keyword | Domains of the Bcc's |
| `email.cc.address` | keyword | Addresses of Cc's |
| `email.cc.domain` | keyword | Domains of Cc addresses |
| `email.cipher` | keyword | Cipher used e.g. TLS |
| `email.file.count` | value | Number of attachments included in the message |
| `email.file.extension` | keyword | Extensions of attachment, e.g. .zip, .docx |
| `email.file.hash` | keyword | Hash of attachments |
| `email.file.name` | keyword | File name of attachments |
| `email.file.size` | keyword | Total size of all attachments in bytes |
| `email.bcc.addresses` | wildcard | Addresses of Bcc's |
P1llus marked this conversation as resolved.
| `email.cc.addresses` | wildcard | Addresses of Cc's |
| `email.attachments_count` | long | A field outside the flattened structure to control how many attachments are included in the email |
| `email.attachments` | flattened | A flattened field for anything related to attachments. This allows objects being stored with all information for each file when you have multiple attachments |
Contributor

The .attachments.* fields should follow the file.* fields. We can state this approach in the description for now.

We can see later about the implementation, whether it's full reuse, or explicitly defining the fields that make sense for attachments.
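
To make the "follow `file.*`" idea concrete, here is a minimal sketch of how one event's attachment data might be assembled. The nesting under a `file` key and the helper names `describe_attachment` and `build_email_fields` are assumptions for illustration only; the RFC has not settled on the final layout.

```python
import hashlib
from pathlib import PurePosixPath


def describe_attachment(name: str, content: bytes) -> dict:
    """Describe one attachment using ECS file.*-style keys (assumed layout)."""
    return {
        "file": {
            "name": name,
            # ECS file.extension excludes the leading dot.
            "extension": PurePosixPath(name).suffix.lstrip("."),
            "size": len(content),  # size in bytes
            "hash": {"sha256": hashlib.sha256(content).hexdigest()},
        }
    }


def build_email_fields(files: list[tuple[str, bytes]]) -> dict:
    """Build email.attachments plus the count kept outside the flattened field."""
    attachments = [describe_attachment(name, content) for name, content in files]
    return {
        "email": {
            "attachments": attachments,
            "attachments_count": len(attachments),
        }
    }
```

Keeping `attachments_count` next to the flattened object means the count stays usable as a numeric field even though values inside a flattened field are indexed as keywords.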

| `email.direction` | keyword | Direction of the message based on the sending and receiving domains |
Contributor

What do we foresee as values for this, and which address fields (to, cc, bcc) would the categorization be based on?
It seems to me like something that could be difficult to implement, and I'm not sure of its value for visualizations (but I could easily be missing something obvious, it's been one of those days...)

Contributor

Good question, @dainperkins.

I assume the allowed values in there should be "inbound" and "outbound". Perhaps also "unknown" in the case of relays? Actually just like network.direction, "internal" is another class of emails that has a different threat profile. I wonder if there's a need for the value "external" (as in, I'm just an exchange, relaying between Yahoo and Gmail)?

I agree populating this consistently may not be obvious in all scenarios.

I don't think that, as a third party, our solutions can distinguish between "inbound", "outbound" and "internal" without specific configuration that says what "my domains" are.

But once we know that, I assume the heuristic is pretty straightforward:

  • direction = inbound when from is not one of "my domains"
  • direction = outbound when from = "my domains" and at least one receiver (to, cc, bcc) contains addresses not in "my domains"
  • direction = internal when from = "my domains" and all receivers are "my domains"
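
As a rough sketch of that heuristic (not something any data source is confirmed to implement), assuming bare addresses and a configured set of "my domains". The names `domain_of` and `classify_direction` are hypothetical, and the all-internal case is labelled `internal`, following the `network.direction` analogy above:

```python
def domain_of(address: str) -> str:
    """Return the lowercased domain part of a bare email address."""
    return address.rsplit("@", 1)[-1].strip().lower()


def classify_direction(sender: str, receivers: list[str], my_domains: set[str]) -> str:
    """Classify a message as inbound, outbound or internal.

    `receivers` is the combined list of to, cc and bcc addresses.
    """
    mine = {d.lower() for d in my_domains}
    if domain_of(sender) not in mine:
        return "inbound"
    if any(domain_of(r) not in mine for r in receivers):
        return "outbound"
    return "internal"
```

This deliberately leaves out "unknown" and "external" (pure relaying); whether those values are needed is still an open question for the RFC.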

So I'm +1 on adding the field. I think it makes sense. And unless I'm missing something, I think the heuristics are reasonable; and actually, perhaps some of the email-related event sources already provide such values? It's certainly useful for a spam filter to know which emails to filter. Not sure if it shows up in their logs though.

Action item for the RFC, though: let's start listing expected values for this field. I'm providing ideas above as a strawperson, based on what we have in network.direction. But if email data sources have other values for this, let's bring them to the table as well.

| `email.from.address` | keyword | Sender's email address |
| `email.from.domain` | keyword | Sender's domain |
| `email.latency` | keyword | The time, in milliseconds, the delivery attempt took |
| `email.sender.address` | wildcard | Sender's email address |
Contributor

If I remember correctly, address will contain the full Person Name <person@example.com>.

We're defining the domain breakdown fields here because the sender is potentially a threat, and this is where we'll be looking for known bad domains/TLDs and so on.
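
For illustration, splitting such a value into its parts could look like the sketch below. It uses Python's standard `email.utils.parseaddr`; the function name `split_sender` and the returned keys are hypothetical and not proposed field names.

```python
from email.utils import parseaddr


def split_sender(raw: str) -> dict:
    """Split 'Person Name <person@example.com>' into name, address and domain."""
    display_name, address = parseaddr(raw)
    # Everything after the last "@" is treated as the domain.
    domain = address.rpartition("@")[2].lower() if "@" in address else ""
    return {"display_name": display_name, "address": address, "domain": domain}
```

Deriving a top-level domain from `domain` is left out on purpose: doing it correctly generally needs a public suffix list rather than a simple split on the last dot.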

But looking at the fields, I wonder if we should do the same with email.reply_to.address and email.return_path.address? They're also relevant to the sender.

We can hold off on adding them for now, but I'm floating the idea to get feedback on whether there's a need for them.

Member

I think it's definitely worth exploring in the upcoming stage whether that is appropriate.

| `email.sender.top_level_domain` | keyword | Top-level domain of the sender's email address |
webmat marked this conversation as resolved.
| `email.message_id` | keyword | Internet message ID of the message |
Contributor

Message IDs can be pretty creative. For example one of the message IDs for this PR's email notifications was <elastic/ecs/pull/999/review/503143839@github.com>.

So I would make this one wildcard.


Nevertheless, the message_id captures the uniqueness of a mail.
Different mail servers have specific ways of building this Message ID, and it could be interesting (for identification purposes) to capture such behaviour and spot anomalies. With that said, a multi-field mapping would make sense here:

 | `email.message_id` | keyword | Internet message ID of the message |
| `email.message_id.text` | text | Internet message ID of the message for full text search |
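
In Elasticsearch mapping terms, the two rows above would correspond to a `keyword` field with a `text` sub-field declared under `fields`. A minimal sketch of such a mapping body, written as a Python dict; the variable name `message_id_mapping` is illustrative and nothing here is an agreed ECS definition:

```python
import json

# Mapping fragment: email.message_id as keyword, with a .text multi-field
# for full-text search on the same value.
message_id_mapping = {
    "properties": {
        "email": {
            "properties": {
                "message_id": {
                    "type": "keyword",
                    "fields": {
                        "text": {"type": "text"},
                    },
                }
            }
        }
    }
}

print(json.dumps(message_id_mapping, indent=2))
```

Exact-match lookups and aggregations would then use `email.message_id`, while analyzed queries would target `email.message_id.text`.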

| `email.process` | keyword | Name of the executable that carried out the transaction, e.g. outlook, sendmail |
| `email.protocol` | keyword | The email protocol used, e.g. SMTP, IMAP |
| `email.reply_to.address` | keyword | Reply-to address |
| `object.return.address` | keyword | The return address for the message |
| `email.reply_to.address` | wildcard | Reply-to address |
| `email.return.address` | wildcard | The return address for the message |
Contributor

What about renaming this to return_path -- it's a bit more descriptive of what I think you're actually going for here.

Member

I agree on this point as well, will update it in the next commit.

| `email.size` | keyword | Total size of the message, in bytes, including attachments |
webmat marked this conversation as resolved.
| `email.subject` | keyword | Subject of the message |
| `email.to` | keyword | Recipient address |
| `email.to.domain` | keyword | Recipient domain |
| `email.subject` | wildcard | Subject of the message |
| `email.recipients.addresses` | keyword | Recipient addresses |
| `email.domains` | keyword | domains related to the email |

This field really feels like it should be part of the related fields. Something like related.domains (though it currently doesn't exist, so it might be worth keeping here)

Member

Yeah, this field is the outcome of an ongoing discussion. Instead of having separate domain fields for bcc, cc, recipients etc., we have decided for now to store them all as an array under one field. This might change in the upcoming stages. Thanks for the pointer, always happy to get feedback.


Yeah, my concern mainly with a related field is that you lose the directionality of the value. Which might be useful for some use-cases.

Member

We do have the email.direction though, would that be sufficient? We would calculate the direction before moving the different domains into email.domains for example.
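
A small sketch of what populating the combined array could look like, assuming bare addresses in each list. The helper name `collect_domains` is hypothetical; direction would be computed from the separate address fields before this step, since the merged array no longer says which field a domain came from.

```python
def collect_domains(*address_lists: list[str]) -> list[str]:
    """Collect unique, lowercased domains from several address lists, in first-seen order."""
    seen: list[str] = []
    for addresses in address_lists:
        for address in addresses:
            if "@" not in address:
                continue  # skip values that are not addresses
            domain = address.rpartition("@")[2].lower()
            if domain and domain not in seen:
                seen.append(domain)
    return seen
```

For example, passing the sender, recipients, cc and bcc lists yields one de-duplicated list suitable for an array field such as `email.domains`.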



Other ECS fields used together with email use cases:
| field | description |
| --- | --- |
| `event.duration` | The duration related to the email event. Could be the total duration in quarantine, how long the email took to send from source to destination etc. |
ebeahan marked this conversation as resolved.
| `process.name` | When the event is related to a server or client. Does not take MTA into account, which is part of an ongoing discussion |
| `network.protocol` | Type of email protocol used |
| `tls.*` | Used for TLS-related information for the connection, for example to an SMTP server over TLS |
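
ECS defines `event.duration` in nanoseconds, so a source that reports delivery time in milliseconds would need a conversion before populating it. A minimal sketch, where `latency_ms` and the function name are hypothetical:

```python
NANOS_PER_MILLI = 1_000_000


def latency_ms_to_event_duration(latency_ms: float) -> int:
    """Convert a delivery latency in milliseconds to nanoseconds for event.duration."""
    return round(latency_ms * NANOS_PER_MILLI)
```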



## Usage

@@ -112,8 +115,8 @@ People

The following are the people that consulted on the contents of this RFC.

Jamie Hynds | author
TBD | Sponsor
Marius Iversen | Author
ebeahan marked this conversation as resolved.
Jamie Hynds | Sponsor

<!--
Who will be or has been consulted on the contents of this RFC? Identify authorship and sponsorship, and optionally identify the nature of involvement of others. Link to GitHub aliases where possible. This list will likely change or grow stage after stage.
@@ -136,7 +139,7 @@ e.g.:

<!-- An RFC should link to the PRs for each of its stage advancements. -->

* Stage 0: https://github.com/elastic/ecs/pull/NNN
* Stage 0: https://github.com/elastic/ecs/pull/999
P1llus marked this conversation as resolved.

<!--
* Stage 1: https://github.com/elastic/ecs/pull/NNN