diff --git a/rfcs/text/0010-email.md b/rfcs/text/0010-email.md index 1ebd111af2..98e1efc575 100644 --- a/rfcs/text/0010-email.md +++ b/rfcs/text/0010-email.md @@ -1,19 +1,10 @@ # 0010: Email -- Stage: **1 (proposal)** -- Date: **2020-11-30** +- Stage: **1 (draft)** +- Date: **2021-08-16** -This RFC proposes a new top-level field to facilitate email use cases. - - - - +This RFC proposes a new top-level field set to facilitate email use cases, `email.*`. The `email.*` field set adds fields for the sender, recipient, message header fields, and other attributes of an email message typically seen in logs produced by mail transfer agent (MTA) and email gateway applications. ## Fields @@ -21,40 +12,35 @@ Stage 0: Provide a high level summary of the premise of these changes. Briefly d Stage 1: Describe at a high level how this change affects fields. Which fieldsets will be impacted? How many fields overall? Are we primarily adding fields, removing fields, or changing existing fields? The goal here is to understand the fundamental technical implications and likely extent of these changes. ~2-5 sentences. --> -Email specific fields: +### Email specific fields | field | type | description | | --- | --- | --- | -| `email.bcc.addresses` | `wildcard[]` | Addresses of Bcc's | -| `email.cc.addresses` | `wildcard[]` | Addresses of Cc's | -| `email.attachments_count` | long | A field outside the flattened structure to control how many attachments are included in the email | -| `email.attachments` | flattened | A flattened field for anything related to attachments. 
This allows objects being stored with all information for each file when you have multiple attachments | -| `email.direction` | keyword | Direction of the message based on the sending and receving domains | -| `email.sender.address` | wildcard | Senders email address | -| `email.sender.domain` | wildcard | Domain of the sender | -| `email.sender.top_level_domain` | keyword | Top level domain of the sender | -| `email.sender.registered_domain` | wildcard | Registered domain of the sender | -| `email.sender.subdomain` | keyword | Subdomain of the sender | -| `email.message_id` | keyword | Internet message ID of the message | -| `email.reply_to.address` | wildcard | Reply-to address | -| `email.return_path.address` | wildcard | The return address for the message | -| `email.size` | long | Total size of the message, in bytes, including attachments | -| `email.subject` | wildcard | Subject of the message | -| `email.recipients.addresses` | `keyword[]` | Recipient addresses | -| `email.domains` | `keyword[]` | domains related to the email | - - -Other ECS fields used together with email usecases: -| field | description | -| --- | --- | -| `event.duration` | The duration related to the email event. Could be the total duration in Quarantine, how long the email took to send from source to destination etc | -| `event.start` | When the email event started -| `event.end` | When the email event ended -| `process.name` | When the event is related to a server or client. Does not take MTA into account which is part of a ongoing discussion | -| `network.protocol` | Type of email protocol used | -| `tls.*` | Used for TLS related information for the connection to for example a SMTP server over TLS | - - +| `email.from` | keyword | Stores the `from` email address from the RFC5322 `From:` header field. | +| `email.origination_timestamp` | date | The date and time the email message was composed. Many email clients will fill this in automatically when the message is sent by a user. 
| +| `email.delivery_timestamp` | date | The date and time the email message was received by the service or client. | +| `email.to` | keyword (array) | The email address(es) of the message recipient(s) | +| `email.subject` | keyword; `.text` text multi-field | A brief summary of the topic of the message | +| `email.cc` | keyword (array) | The email address(es) of the carbon copy (CC) recipient(s) | +| `email.bcc` | keyword (array) | The email address(es) of the blind carbon copy (BCC) recipient(s) | +| `email.content_type` | keyword | Information about how the message is to be displayed. Typically a MIME type | +| `email.message_id` | wildcard | Identifier from the RFC 5322 `Message-ID:` header field that refers to a particular version of a particular message. | +| `email.local_id` | keyword | Unique identifier given to the email by the source (MTA, gateway, etc.) that created the event and is not persistent across hops (for example, the `X-MS-Exchange-Organization-Network-Message-Id` id). | +| `email.reply_to` | keyword | The address that replies should be delivered to, taken from the RFC 5322 `Reply-To:` header field. | +| `email.direction` | keyword | Direction of the message based on the sending and receiving domains | +| `email.x_mailer` | keyword | What application was used to draft and send the original email. | +| `email.attachments` | nested | Nested object of attachments on the email. | +| `email.attachments.file.mime_type` | keyword | MIME type of the attachment file. | +| `email.attachments.file.name` | keyword | Name of the attachment file including the extension. | +| `email.attachments.file.extension` | keyword | Attachment file extension, excluding the leading dot. | +| `email.attachments.file.size` | long | Attachment file size in bytes. | +| `email.attachments.hash.md5` | keyword | MD5 hash of the file attachment. | +| `email.attachments.hash.sha1` | keyword | SHA-1 hash of the file attachment. 
| +| `email.attachments.hash.sha256` | keyword | SHA-256 hash of the file attachment. | + +### Additional event categorization allowed values + +Email events may benefit from an additional ECS allowed event categorization value: `event.category: email`. ## Usage @@ -62,7 +48,7 @@ Other ECS fields used together with email usecases: Stage 1: Describe at a high-level how these field changes will be used in practice. Real world examples are encouraged. The goal here is to understand how people would leverage these fields to gain insights or solve problems. ~1-3 paragraphs. --> -Email use cases stretch across all three Elastic solutions - Search, Observe, Protect. Whether it's searching for content within email, ensuring email infrastrucure is operational or detecting email based attacks, there are many possibilities for email fields within ECS. +Email use cases stretch across all three Elastic solutions - Search, Observe, Protect. Whether it's searching for content within email, ensuring email infrastructure is operational or detecting email based attacks, there are many possibilities for email fields within ECS. ## Source data @@ -78,6 +64,199 @@ Stage 1: Provide a high-level description of example sources of data. This does Stage 2: Included a real world example source document. Ideally this example comes from the source(s) identified in stage 1. If not, it should replace them. The goal here is to validate the utility of these field changes in the context of a real world example. Format with the source name as a ### header and the example document in a GitHub code block with json formatting. 
--> +### Office365 - Successful Delivery + +#### Original log + +```json +{ + "EndDate": "2020-11-10T22:12:34.8196921Z", + "FromIP": "8.8.8.8", + "Index": 25, + "MessageId": "\\u003c95689d8d5e7f429390a4e3646eef75e8-JFBVALKQOJXWILKBK4YVA7APGM3DKTLFONZWCZ3FINSW45DFOJ6EAQ2ENFTWK43UL4YTCMBYGIYHYU3NORYA====@microsoft.com\\u003e", + "MessageTraceId": "ff1a64a3-cafb-41b7-1efb-08d8848aedc3", + "Organization": "testdomain.onmicrosoft.com", + "Received": "2020-11-09T04:50:06.3312635", + "RecipientAddress": "john@testdomain.onmicrosoft.com", + "SenderAddress": "o365mc@microsoft.com", + "Size": 64329, + "StartDate": "2020-11-08T22:12:34.8196921Z", + "Status": "Delivered", + "Subject": "Weekly digest: Microsoft service updates", + "ToIP": null +} +``` + +#### Mapped event + +```json +{ + "@timestamp": 1626984241830, + "email": { + "timestamp": "2020-11-08T22:12:34.8196921Z", + "from": [ + "o365mc@microsoft.com" + ], + "to": [ + "john@testdomain.onmicrosoft.com" + ], + "subject": "Weekly digest: Microsoft service updates", + "message_id": "\\u003c95689d8d5e7f429390a4e3646eef75e8-JFBVALKQOJXWILKBK4YVA7APGM3DKTLFONZWCZ3FINSW45DFOJ6EAQ2ENFTWK43UL4YTCMBYGIYHYU3NORYA====@microsoft.com\\u003e" + }, + "event": { + "action": "delivered", + "kind": "event", + "category": [ + "email", + "network" + ] + } +} +``` + +### Office365 - Undeliverable + +#### Original log + +```json +{ + "EndDate": "2020-11-10T22:12:34.8196921Z", + "FromIP": null, + "Index": 8, + "MessageId": "\\u003c72872e16-f4c2-4eef-a393-e5621748a0ff@AS8P19vMB1605.EURP191.PROD.OUTLOOK.COM\\u003e", + "MessageTraceId": "a4bd8c4c-3a4f-427f-8952-08d8850f9c20", + "Organization": "testdomain.onmicrosoft.com", + "Received": "2020-11-10T00:28:56.3306834", + "RecipientAddress": "o365mc@microsoft.com", + "SenderAddress": "postmaster@testdomain.onmicrosoft.com", + "Size": 96627, + "StartDate": "2020-11-08T22:12:34.8196921Z", + "Status": "Delivered", + "Subject": "Undeliverable: Message Center Major Change Update Notification", + "ToIP": 
"8.8.8.8" +} +``` + +#### Mapped event + +```json +{ + "@timestamp": 1626984241830, + "email": { + "timestamp": "2020-11-10T22:12:34.8196921Z", + "from": [ + "postmaster@testdomain.onmicrosoft.com" + ], + "to": [ + "o365mc@microsoft.com" + ], + "subject": "Undeliverable: Message Center Major Change Update Notification", + "message_id": "\\u003c72872e16-f4c2-4eef-a393-e5621748a0ff@AS8P19vMB1605.EURP191.PROD.OUTLOOK.COM\\u003e" + }, + "event": { + "action": "delivered", + "kind": "event", + "category": [ + "email", + "network" + ] + } +} +``` + +### Proofpoint Tap + +#### Original log + +``` +<38>1 2016-06-24T21:00:08Z - ProofpointTAP - MSGBLK [tapmsg@21139 messageTime="2016-06-24T21:18:38.000Z" messageID="20160624211145.62086.mail@evil.zz" recipient="clark.kent@pharmtech.zz, diana.prince@pharmtech.zz" sender="e99d7ed5580193f36a51f597bc2c0210@evil.zz" senderIP="192.0.2.255" phishScore="46" spamScore="4" QID="r2FNwRHF004109" GUID="c26dbea0-80d5-463b-b93c-4e8b708219ce" subject="Please find a totally safe invoice attached." quarantineRule="module.sandbox.threat" quarantineFolder="Attachment Defense" policyRoutes="default_inbound,executives" modulesRun="sandbox,urldefense,spam,pdr" headerFrom="\"A. 
Badguy\" " headerTo="\"Clark Kent\" ; \"Diana Prince\" " headerCC="\"Bruce Wayne\" " headerReplyTo="null" toAddresses="clark.kent@pharmtech.zz,diana.prince@pharmtech.zz" ccAddresses="bruce.wayne@university-of-education.zz" fromAddress="badguy@evil.zz" replyToAddress="null" clusterId="pharmtech_hosted" messageParts="[{\"contentType\":\"text/plain\",\"disposition\":\"inline\",\"filename\":\"text.txt\",\"md5\":\"008c5926ca861023c1d2a36653fd88e2\",\"oContentType\":\"text/plain\",\"sandboxStatus\":\"unsupported\",\"sha256\":\"85738f8f9a7f1b04b5329c590ebcb9e425925c6d0984089c43a022de4f19c281\"},{\"contentType\":\"application/pdf\",\"disposition\":\"attached\",\"filename\":\"Invoice for Pharmtech.pdf\",\"md5\":\"5873c7d37608e0d49bcaa6f32b6c731f\",\"oContentType\":\"application/pdf\",\"sandboxStatus\":\"threat\",\"sha256\":\"2fab740f143fc1aa4c1cd0146d334c5593b1428f6d062b2c406e5efe8abe95ca\"}]" xmailer="Spambot v2.5"] +``` + +#### Mapped event + +```json +{ + "@timestamp": "2016-06-24T21:00:08Z", + "email": { + "timestamp": "2016-06-24T21:18:38.000Z", + "message_id": "20160624211145.62086.mail@evil.zz", + "local_id": "c26dbea0-80d5-463b-b93c-4e8b708219ce", + "to": [ + "clark.kent@pharmtech.zz", + "diana.prince@pharmtech.zz" + ], + "cc": [ + "bruce.wayne@university-of-education.zz" + ], + "from": [ + "badguy@evil.zz" + ], + "subject": "Please find a totally safe invoice attached.", + "reply_to": "null", + "x_mailer": "Spambot v2.5", + "attachments": [ + { + "file": { + "mime_type": "application/pdf", + "name": "Invoice for Pharmtech.pdf", + "extension": "pdf" + }, + "hash": { + "md5": "5873c7d37608e0d49bcaa6f32b6c731f", + "sha256": "2fab740f143fc1aa4c1cd0146d334c5593b1428f6d062b2c406e5efe8abe95ca" + } + } + ] + }, + "event": { + "id": "c26dbea0-80d5-463b-b93c-4e8b708219ce", + "kind": "event", + "category": "email", + "action": "MSGBLK" + }, + "source": { + "address": "192.0.2.255", + "ip": "192.0.2.255" + } +} +``` + +### Mimecast Receipt log + +#### Original log + +``` 
+datetime=2017-05-26T16:47:41+0100|aCode=7O7I7MvGP1mj8plHRDuHEA|acc=C0A0|SpamLimit=0|IP=123.123.123.123|Dir=Internal|MsgId=<81ce15$8r2j59@mail01.example.com>|Subject=\message subject\|headerFrom=from@mimecast.com|Sender=from@mimecast.com|Rcpt=auser@mimecast.com|SpamInfo=[]|Act=Acc|TlsVer=TLSv1|Cphr=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA|SpamProcessingDetail={"spf":{"info":"SPF_FAIL","allow":true},"dkim":{"info":"DKIM_UNKNOWN","allow":true}}|SpamScore=1 +``` + +#### Mapped event + +```json +{ + "@timestamp": "2017-05-26T16:47:41+0100", + "source": { + "address": "123.123.123.123", + "ip": "123.123.123.123" + }, + "email": { + "message_id": "<81ce15$8r2j59@mail01.example.com>", + "from": [ + "from@mimecast.com" + ], + "to": [ + "auser@mimecast.com" + ], + "subject": "message subject", + "direction": "internal" + }, + "tls": { + "cipher": "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA", + "version": "1.0", + "version_protocol": "tls" + } +} +``` + @@ -97,37 +276,46 @@ The goal here is to research and understand the impact of these changes on users -Current concerns or topics still being discussed from stage 1: -- Whether we want to add specific fields for email protocols, either as a root field or nested under email.* (SMTP, IMAP, POP etc). -- Need to make sure that the ECS fieldset for email catches all common usecases, for example spam, metrics and deliverables and logging. -- Whether we want to create a new event.category field (email) and which event.type it should be combined with. -- The email RFC will be the first ECS fieldset that uses the flattened datatype (for attachments), need to ensure that there will be major issues related to this. +### Email messages vs. protocols - +The fields proposed in this document are focused on the contents of an email message but not on specific fields for email protocols. Should protocols like SMTP, POP3, and IMAP be represented in ECS? 
- +For example, users may need to compare the email address from the SMTP (envelope) sender to the `From:` header email address. + +### Email metrics and observability use cases + +Does the initial set of `email` fields need to consider observability and email monitoring use cases, for example spam, metrics, deliverables, and logging? + +### Additional event categorization values + +Should a new `event.category` value (`email`) be created, and, if so, which `event.type` values should the `email` category be combined with? + +### Display names + +Should the display name be captured separately from the email address for senders and recipients? If so, how do we accomplish this in a document while keeping the 1:1 mapping of a display name to an email address? + +### Spam processing details + +Should fields intended to capture details around spam processing like Sender Policy Framework (SPF), DomainKeys Identified Mail (DKIM), or Domain-based Message Authentication, Reporting, and Conformance (DMARC) be in scope for this proposal as well? -## Real-world implementations - ## People The following are the people that consulted on the contents of this RFC. -* @p1llus | Author -* @jamiehynds | Sponsor +* @ebeahan | Co-author +* @P1llus | Co-author, subject matter expert +* @jamiehynds | Co-sponsor +* @devonakerr | Co-sponsor + -* Stage 1: https://github.com/elastic/ecs/pull/999 +* Stage 1 (formerly proposal stage): https://github.com/elastic/ecs/pull/999 * RFC ID correction: https://github.com/elastic/ecs/pull/1157 +* Stage 1 (draft): https://github.com/elastic/ecs/pull/1219 diff --git a/rfcs/text/0010/email.yml b/rfcs/text/0010/email.yml new file mode 100644 index 0000000000..2d98a88061 --- /dev/null +++ b/rfcs/text/0010/email.yml @@ -0,0 +1,116 @@ +--- +- name: email + title: Email + group: 2 + short: Fields describing an email message. + description: > + Email fields are used for information about an email message header and body. 
+ type: group + fields: + + - name: to + level: extended + type: keyword + short: Recipient address(es) + description: > + Stores the `to` email address(es). + example: "employee@example.com" + normalize: + - array + + - name: from + level: extended + type: keyword + short: Sender address + description: > + Stores the `from` email address. + example: "administrator@example.com" + + - name: cc + level: extended + type: keyword + short: CC recipient(s) + description: > + The email address(es) of the carbon copy (CC) recipient(s). + example: '["cc.user@example.com", "cc.user2@example.com"]' + normalize: + - array + + - name: bcc + level: extended + type: keyword + short: BCC recipient(s) + description: > + The email address(es) of the blind carbon copy (BCC) recipient(s). + example: '["bcc.user@example.com", "bcc.user2@example.com"]' + normalize: + - array + + - name: reply_to + level: extended + type: keyword + short: Address for replies. + description: > + The address that replies should be delivered to. + example: "user@example.com" + + - name: timestamp + level: extended + type: date + short: Date and time message was sent. + description: > + The date and time that the sender authorized delivery of the message. + + For example, by pressing a "Send" button in their email client. + example: "2020-11-10T22:12:34.8196921Z" + + - name: subject + level: extended + type: keyword + short: Topic of the message + description: > + A brief summary of the topic of the message. + example: "Status update: email fields progress" + multi_fields: + - type: match_only_text + name: text + + - name: content_type + level: extended + type: keyword + short: Content-Type header value + description: > + Information about how the message is to be displayed. Typically a MIME type. 
+ example: "multipart/mixed" + + - name: message_id + level: extended + type: wildcard + short: Unique identifier of the message + description: > + Unique identifier for the email message that refers to a particular version of a particular message. + + An example would be the identifier found in the `Message-ID` email header. + example: "" + + - name: direction + level: extended + type: keyword + short: Direction of the email message + description: > + The direction of the message based on the sending and receiving domains. + + Recommended values are: + * `inbound` - From external senders to internal recipients + * `outbound` - From internal senders to external recipients + * `internal` - From internal senders to internal recipients + * `external` - From external senders to external recipients + * `unknown` - Direction is unknown + + - name: x_mailer + level: extended + type: keyword + short: The value from the X-Mailer header + description: > + The value from the `X-Mailer` header. Value captures what application was used + to draft and send the original email message. diff --git a/rfcs/text/0010/event.yml b/rfcs/text/0010/event.yml new file mode 100644 index 0000000000..96aaab7ec8 --- /dev/null +++ b/rfcs/text/0010/event.yml @@ -0,0 +1,11 @@ +--- +- name: event + fields: + + - name: category + allowed_values: + - name: email + description: > + Events in this category are related to email traffic and messages. + expected_event_types: + - info
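
As an illustration of how the new `email` category and its expected `info` event type might combine with the `email.*` fields, the following is a minimal sketch, not part of the RFC. It assumes a pipe-delimited `key=value` receipt line similar to the Mimecast example, with no `|` inside values. The function names `parse_receipt_line` and `to_ecs` and the shortened `SAMPLE` line are hypothetical. It follows `email.yml` in treating `to` as an array and `from` as a single keyword.

```python
import json

# Hypothetical sample, shortened from the Mimecast receipt log format.
SAMPLE = (
    "datetime=2017-05-26T16:47:41+0100|Dir=Internal"
    "|MsgId=<81ce15$8r2j59@mail01.example.com>|Subject=message subject"
    "|headerFrom=from@mimecast.com|Rcpt=auser@mimecast.com|Act=Acc"
)


def parse_receipt_line(line):
    """Split 'key=value|key=value' text into a dict (assumes no '|' in values)."""
    fields = {}
    for part in line.strip().split("|"):
        key, sep, value = part.partition("=")
        if sep:  # skip fragments that carry no '='
            fields[key] = value
    return fields


def to_ecs(fields):
    """Build an ECS-style document with the `email` category and `info` type."""
    email = {
        "message_id": fields.get("MsgId"),
        "from": fields.get("headerFrom"),
        "to": [fields["Rcpt"]] if "Rcpt" in fields else None,
        "subject": fields.get("Subject"),
        "direction": fields["Dir"].lower() if "Dir" in fields else None,
    }
    return {
        "@timestamp": fields.get("datetime"),
        "event": {"kind": "event", "category": ["email"], "type": ["info"]},
        # Drop fields that were absent from the log line.
        "email": {k: v for k, v in email.items() if v is not None},
    }


if __name__ == "__main__":
    print(json.dumps(to_ecs(parse_receipt_line(SAMPLE)), indent=2))
```

A real integration would more likely perform this mapping in an ingest pipeline; the sketch only shows the shape of the resulting document.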