[RFC] E-Mail #999

Merged
merged 7 commits into from
Nov 30, 2020
158 changes: 158 additions & 0 deletions rfcs/text/0008-email.md
@@ -0,0 +1,158 @@
# 0008: Email
<!-- Leave this ID at 0000. The ECS team will assign a unique, contiguous RFC number upon merging the initial stage of this RFC. -->

- Stage: **1 (proposal)** <!-- Update to reflect target stage. See https://elastic.github.io/ecs/stages.html -->
- Date: **Oct 5th 2020** <!-- The ECS team sets this date at merge time. This is the date of the latest stage advancement. -->

This RFC proposes a new top-level field to facilitate email use cases.

<!--
As you work on your RFC, use the "Stage N" comments to guide you in what you should focus on, for the stage you're targeting.
Feel free to remove these comments as you go along.
-->

<!--
Stage 0: Provide a high level summary of the premise of these changes. Briefly describe the nature, purpose, and impact of the changes. ~2-5 sentences.
-->

## Fields
Member

Another item that came to mind, and I think makes sense to capture for further discussion later:

Would email be worth considering as an additional allowed value for event.category?

Member

I would say +1 on this.

@ebeahan if we would create this new event.category we should also update which event.type we can combine with it and maybe see if we need new ones?

Contributor

Let's mention this in the RFC "Fields" section.


<!--
Stage 1: Describe at a high level how this change affects fields. Which fieldsets will be impacted? How many fields overall? Are we primarily adding fields, removing fields, or changing existing fields? The goal here is to understand the fundamental technical implications and likely extent of these changes. ~2-5 sentences.
-->

Email specific fields:

| field | type | description |
| --- | --- | --- |
| `email.bcc.addresses` | wildcard | Addresses of the Bcc recipients |
| `email.cc.addresses` | wildcard | Addresses of the Cc recipients |
| `email.attachments_count` | long | Number of attachments included in the email, kept as a field outside the flattened structure |
| `email.attachments` | flattened | A flattened field for anything related to attachments. This allows an object with all the information for each file to be stored when there are multiple attachments |
Contributor

The .attachments.* fields should follow the file.* fields. We can state this approach in the description for now.

We can see later about the implementation, whether it's full reuse, or explicitly defining the fields that make sense for attachments.

| `email.direction` | keyword | Direction of the message based on the sending and receiving domains |
Contributor

What do we foresee as values for this, and which address fields (to, cc, bcc) would it be categorized on?
It seems to me like something that could potentially be difficult to implement, and I'm not sure of the value for visualizations (but I could easily be missing something obvious, it's been one of those days...)

Contributor

Good question, @dainperkins.

I assume the allowed values in there should be "inbound" and "outbound". Perhaps also "unknown" in the case of relays? Actually just like network.direction, "internal" is another class of emails that has a different threat profile. I wonder if there's a need for the value "external" (as in, I'm just an exchange, relaying between Yahoo and Gmail)?

I agree populating this consistently may not be obvious in all scenarios.

I don't think as a third party, our solutions can determine between "inbound", "outbound" and "internal" without specific configuration that says what are "my domains".

But once we know that, I assume the heuristic is pretty straightforward:

  • direction = inbound when from is not one of "my domains"
  • direction = outbound when from = "my domains" and at least one receiver (to, cc, bcc) contains addresses not in "my domains"
  • direction = internal when from = "my domains" and all receivers are "my domains"

So I'm +1 on adding the field. I think it makes sense. And unless I'm missing something, I think the heuristics are reasonable; and actually, perhaps some of the email-related event sources already provide such values? It's certainly useful for a spam filter to know which emails to filter. Not sure if it shows up in their logs though.

Action item for the RFC, though: let's start listing expected values for this field. I'm providing ideas above as a strawperson, based on what we have in network.direction. But if email data sources have other values for this, let's bring them to the table as well.
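The heuristic discussed in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `my_domains` configuration, and the `"unknown"` fallback are hypothetical and not part of the RFC, and the all-internal case is labelled `"internal"` in line with the values floated above.

```python
from email.utils import parseaddr


def domain_of(address: str) -> str:
    """Return the lower-cased domain of a 'Name <user@host>' style address, or ''."""
    _, addr = parseaddr(address)
    _, sep, domain = addr.rpartition("@")
    return domain.lower() if sep else ""


def classify_direction(sender, recipients, my_domains):
    """Classify an email as inbound, outbound, internal, or unknown.

    `my_domains` is the operator-supplied list of "my domains" (an assumption:
    the RFC does not define where this configuration would come from).
    `recipients` is the combined list of to, cc, and bcc addresses.
    """
    own = {d.lower() for d in my_domains}
    sender_domain = domain_of(sender)
    recipient_domains = {domain_of(r) for r in recipients}
    if not sender_domain or not recipient_domains:
        return "unknown"  # hypothetical fallback when addresses are missing
    if sender_domain not in own:
        return "inbound"
    if recipient_domains <= own:
        return "internal"  # every receiver is in "my domains"
    return "outbound"  # at least one receiver is outside "my domains"
```

For example, `classify_direction("bob@example.com", ["alice@example.com", "x@other.test"], ["example.com"])` yields `"outbound"` because one receiver sits outside the configured domains.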

| `email.sender.address` | wildcard | Sender's email address |
Contributor

If I remember correctly, address will contain the full Person Name <person@example.com>.

We're defining the domain breakdown fields here because the sender is potentially a threat, and this is where we'll be looking for known bad domains/TLDs and so on.

But looking at the fields, I wonder if we should do the same with email.reply_to.address and email.return_path.address? They're also relevant to the sender.

We can hold off on adding them for now, but I'm floating the idea to get feedback on whether there's a need for them.

Member

I think it's worth exploring in the upcoming stage, for sure, whether that is appropriate.

| `email.sender.domain` | wildcard | Domain of the sender |
| `email.sender.top_level_domain` | keyword | Top level domain of the sender |
| `email.sender.registered_domain` | wildcard | Registered domain of the sender |
| `email.sender.subdomain` | keyword | Subdomain of the sender |
| `email.message_id` | keyword | Internet message ID of the message |
Contributor

Message IDs can be pretty creative. For example one of the message IDs for this PR's email notifications was <elastic/ecs/pull/999/review/503143839@github.com>.

So I would make this one wildcard.

Nevertheless, the message_id captures the uniqueness of an email.
I can see that different mail servers have specific ways of building this Message ID, and it could be interesting (for identification purposes) to capture such behaviour (and spot the anomalies). With this said, a multi-field mapping would make sense here:

| `email.message_id` | keyword | Internet message ID of the message |
| `email.message_id.text` | text | Internet message ID of the message for full text search |
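
The multi-field suggestion above could translate into an index mapping along these lines. This is a minimal sketch written as a Python dict so it can be serialized to JSON; the constant name is hypothetical, and the field types simply follow the reviewer's suggestion rather than any decision made in the RFC.

```python
import json

# Assumption: mapping fragment for the reviewer's proposal, with a keyword
# field for exact matching and a `text` sub-field for full text search.
MESSAGE_ID_MAPPING = {
    "properties": {
        "email": {
            "properties": {
                "message_id": {
                    "type": "keyword",
                    "fields": {"text": {"type": "text"}},
                }
            }
        }
    }
}

# Serialize the fragment as it would appear in a mapping request body.
print(json.dumps(MESSAGE_ID_MAPPING, indent=2))
```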

| `email.reply_to.address` | wildcard | Reply-to address |
| `email.return_path.address` | wildcard | The return address for the message |
| `email.size` | long | Total size of the message, in bytes, including attachments |
| `email.subject` | wildcard | Subject of the message |
| `email.recipients.addresses` | keyword | Recipient addresses |
| `email.domains` | keyword | Domains related to the email |

This field really feels like it should be part of the related fields. Something like related.domains (though it currently doesn't exist, so it might be worth keeping here)

Member

Yeah this field is the outcome of a current discussion we have. Instead of having domain fields for bcc, cc, recipients etc, we decided currently to have them all as an array under one field. This might change in the upcoming stages. Thanks for the pointer, always happy to get feedback

Yeah, my concern mainly with a related field is that you lose the directionality of the value. Which might be useful for some use-cases.

Member

We do have the email.direction though, would that be sufficient? We would calculate the direction before moving the different domains into email.domains for example.
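
To make the "all domains as an array under one field" idea concrete, here is a rough sketch of how an ingest step might populate `email.sender.domain` and `email.domains` from message headers. The function names are hypothetical, and the sketch deliberately leaves out `registered_domain`, `top_level_domain`, and `subdomain`, since splitting those correctly would need a public suffix list.

```python
from email.utils import getaddresses, parseaddr


def domain_part(address: str) -> str:
    """Return the lower-cased domain of a bare address, or '' if there is none."""
    _, sep, domain = address.rpartition("@")
    return domain.lower() if sep else ""


def email_domain_fields(sender, to=(), cc=(), bcc=()):
    """Build the sender domain and the combined, de-duplicated domains array."""
    _, sender_addr = parseaddr(sender)
    # getaddresses accepts header values such as 'Bob <bob@example.org>'.
    recipient_addrs = [addr for _, addr in getaddresses([*to, *cc, *bcc]) if addr]
    domains = {domain_part(a) for a in [sender_addr, *recipient_addrs]}
    domains.discard("")
    return {
        "email": {
            "sender": {"domain": domain_part(sender_addr)},
            "domains": sorted(domains),
        }
    }
```

As the thread notes, the combined array drops the directionality of each domain, so a step like this would run after `email.direction` has been calculated.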



Other ECS fields used together with email use cases:

| field | description |
| --- | --- |
| `event.duration` | The duration related to the email event. Could be the total duration in quarantine, how long the email took to send from source to destination, etc. |
| `event.start` | When the email event started |
| `event.end` | When the email event ended |
| `process.name` | When the event is related to a server or client. Does not take MTA into account, which is part of an ongoing discussion |
| `network.protocol` | Type of email protocol used |
| `tls.*` | Used for TLS-related information about the connection, for example to an SMTP server over TLS |
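
As a small illustration of the timing fields in the table above, the sketch below derives `event.duration` from `event.start` and `event.end`. ECS defines `event.duration` in nanoseconds; the function name and the sample timestamps are hypothetical.

```python
from datetime import datetime


def event_timing(start_iso: str, end_iso: str) -> dict:
    """Return event.start, event.end, and event.duration (in nanoseconds)."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    delta = end - start
    # Integer arithmetic avoids float rounding when converting to nanoseconds.
    duration_ns = (
        (delta.days * 86_400 + delta.seconds) * 1_000_000_000
        + delta.microseconds * 1_000
    )
    return {"event": {"start": start_iso, "end": end_iso, "duration": duration_ns}}


# Hypothetical example: an email held in quarantine for 2.5 seconds.
timing = event_timing(
    "2020-10-05T12:00:00+00:00",
    "2020-10-05T12:00:02.500000+00:00",
)
```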



## Usage

<!--
Stage 1: Describe at a high-level how these field changes will be used in practice. Real world examples are encouraged. The goal here is to understand how people would leverage these fields to gain insights or solve problems. ~1-3 paragraphs.
-->

Email use cases stretch across all three Elastic solutions: Search, Observe, and Protect. Whether it's searching for content within email, ensuring email infrastructure is operational, or detecting email-based attacks, there are many possibilities for email fields within ECS.

## Source data

<!--
Stage 1: Provide a high-level description of example sources of data. This does not yet need to be a concrete example of a source document, but instead can simply describe a potential source (e.g. nginx access log). This will ultimately be fleshed out to include literal source examples in a future stage. The goal here is to identify practical sources for these fields in the real world. ~1-3 sentences or unordered list.
-->

- **Email Analytics**: [Hubspot](https://legacydocs.hubspot.com/docs/methods/email/email_events_overview), Marketo, Salesforce Pardot
- **Email Server**: [O365 Message Tracing](https://docs.microsoft.com/en-us/exchange/monitoring/trace-an-email-message/run-a-message-trace-and-view-results), [Postfix](https://nxlog.co/documentation/nxlog-user-guide/postfix.html)
- **Email Security**: [Barracuda](https://campus.barracuda.com/product/emailsecuritygateway/doc/12193950/syslog-and-the-barracuda-email-security-gateway/), [Forcepoint](https://www.websense.com/content/support/library/email/v85/email_siem/siem_log_map.pdf), [Mimecast](https://www.mimecast.com/tech-connect/documentation/tutorials/understanding-siem-logs/), [Proofpoint](https://help.proofpoint.com/Threat_Insight_Dashboard/API_Documentation/SIEM_API)

<!--
Stage 2: Included a real world example source document. Ideally this example comes from the source(s) identified in stage 1. If not, it should replace them. The goal here is to validate the utility of these field changes in the context of a real world example. Format with the source name as a ### header and the example document in a GitHub code block with json formatting.
-->

<!--
Stage 3: Add more real world example source documents so we have at least 2 total, but ideally 3. Format as described in stage 2.
-->

## Scope of impact

<!--
Stage 2: Identifies scope of impact of changes. Are breaking changes required? Should deprecation strategies be adopted? Will significant refactoring be involved? Break the impact down into:
* Ingestion mechanisms (e.g. beats/logstash)
* Usage mechanisms (e.g. Kibana applications, detections)
* ECS project (e.g. docs, tooling)
The goal here is to research and understand the impact of these changes on users in the community and development teams across Elastic. 2-5 sentences each.
-->

## Concerns
@BenB196, Oct 7, 2020

When looking at the current fields provided, one of my concerns is it appears that they don't fit well with the rest of ECS. I think this can be partially fixed with the use of aliases, though, I don't believe aliases are standard/common in ECS.

Examples:

email.from -> source
email.to|cc|bcc -> destination
email.latency -> event.duration

Contributor

latency might be redundant depending on the specific action being recorded, but I wouldn't equate email.to|from with source and destination (or client/server) network entities

Contributor

Agreed, I think email has enough subtleties in both the "senders" (sender, reply_to, return_path) and the receivers (to, cc, bcc) that I don't think it makes sense to put them in [source|destination].user.email. Also, these aren't array fields anyway, so we couldn't capture everything from the get go.

If an email server logs the source.ip from the MX that sent them an email though, that's totally appropriate to capture in that field.


<!--
Stage 1: Identify potential concerns, implementation challenges, or complexity. Spend some time on this. Play devil's advocate. Try to identify the sort of non-obvious challenges that tend to surface later. The goal here is to surface risks early, allow everyone the time to work through them, and ultimately document resolution for posterity's sake.
-->
Current concerns or topics still being discussed from stage 1:

- Whether we want to add specific fields for email protocols (SMTP, IMAP, POP, etc.), either as a root field or nested under email.*.
- Need to make sure that the ECS fieldset for email covers all common use cases, for example spam, metrics, deliverables, and logging.
- Whether we want to add a new allowed value (email) for event.category, and which event.type values it should be combined with.
- The email RFC will be the first ECS fieldset that uses the flattened datatype (for attachments), so we need to ensure that there will be no major issues related to this.
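
One way to reason about the flattened concern is to look at what happens to leaf values. Elasticsearch documents that a flattened field indexes its leaf values as keywords, so numeric values such as a file size are compared as strings. The sketch below only imitates that behaviour in plain Python; it is not Elasticsearch code, and the attachment layout (following `file.*` naming) is an assumption.

```python
def flatten_leaves(obj, prefix=""):
    """Yield (dotted_key, string_value) pairs for every leaf of a nested structure."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten_leaves(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(obj, list):
        for item in obj:
            yield from flatten_leaves(item, prefix)
    else:
        yield prefix, str(obj)  # leaf values are treated as keyword strings


# Hypothetical attachments, using file.* style names for illustration only.
attachments = [
    {"file": {"name": "invoice.pdf", "size": 48213}},
    {"file": {"name": "logo.png", "size": 9000}},
]

leaves = list(flatten_leaves(attachments))
sizes = [value for key, value in leaves if key == "file.size"]
attachments_count = len(attachments)
```

Here the sizes become the strings `"48213"` and `"9000"`, and string comparison ranks `"9000"` above `"48213"`. The link between each name and its size is also lost once the leaves are pooled, which is part of why `email.attachments_count` is kept as a separate `long` field.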

<!--
Stage 2: Document new concerns or resolutions to previously listed concerns. It's not critical that all concerns have resolutions at this point, but it would be helpful if resolutions were taking shape for the most significant concerns.
-->

<!--
Stage 3: Document resolutions for all existing concerns. Any new concerns should be documented along with their resolution. The goal here is to eliminate the risk of churn and instability by resolving outstanding concerns.
-->

<!--
Stage 4: Document any new concerns and their resolution. The goal here is to eliminate risk of churn and instability by ensuring all concerns have been addressed.
-->

## Real-world implementations

<!--
Stage 4: Identify at least one real-world, production-ready implementation that uses these updated field definitions. An example of this might be a GA feature in an Elastic application in Kibana.
-->

## People

The following are the people that consulted on the contents of this RFC.

* @p1llus | Author
* @jamiehynds | Sponsor

<!--
Who will be or has been consulted on the contents of this RFC? Identify authorship and sponsorship, and optionally identify the nature of involvement of others. Link to GitHub aliases where possible. This list will likely change or grow stage after stage.

e.g.:

* @Yasmina | author
* @Monique | sponsor
* @EunJung | subject matter expert
* @JaneDoe | grammar, spelling, prose
* @Mariana
-->


## References

<!-- Insert any links appropriate to this RFC in this section. -->

### RFC Pull Requests

<!-- An RFC should link to the PRs for each of its stage advancements. -->

* Stage 0: https://github.com/elastic/ecs/pull/999

<!--
* Stage 1: https://github.com/elastic/ecs/pull/NNN
...
-->