[RFC] Threat Intel - Stage 1 #1127

Merged on Feb 18, 2021 (53 commits)

Conversation

@shimonmodi shimonmodi commented Nov 16, 2020

  • Have you signed the contributor license agreement?
  • Have you followed the contributor guidelines?
  • For proposing substantial changes or additions to the schema, have you reviewed the RFC process?
  • If submitting code/script changes, have you verified all tests pass locally using make test?
  • If submitting schema/fields updates, have you generated new artifacts by running make and committed those changes?
  • Is your pull request against master? Unless there is a good reason otherwise, we prefer pull requests against master and will backport as needed.
  • Have you added an entry to the CHANGELOG.next.md?

Preview the RFC

@ebeahan ebeahan added the RFC label Nov 17, 2020
@ebeahan ebeahan changed the title Threat Intel ECS RFC - Stage 2 [RFC] Threat Intel ECS RFC - Stage 2 Nov 17, 2020
@ebeahan ebeahan changed the title [RFC] Threat Intel ECS RFC - Stage 2 [RFC] Threat Intel - Stage 2 Nov 17, 2020
* event.risk_score _risk score provided by threat intelligence source_
* event.original _raw intelligence event_

### Using existing ECS Fields nested under Threat.ioc.*

How about also adding the process fields to the possibilities of being nested?

For example a given process.args could be an IOC

Contributor

I originally dropped this in slack, but I think it got lost.

On the bigger picture of "intel words and semantics", it makes sense to me that something like W32.Trojan.22gp.1201, as labelled by an AV engine or malware analysis feed (à la VirusTotal, ReversingLabs, etc.), should be assigned as a rule.name. The rationale is that there's clearly some sort of signature, whether atomic or behavioral.

Meanwhile, saying something is a variant of a given malware family is informative and takes raw information and transforms it into intelligence and better aligns with ontologies like STIX or MAEC. I'm proposing something like:

{
  "file": { "hash": "abcdef01234567890...", "name": "totally_bad.exe", ... },
  "rule": [ { "author": "ALYac", "name": "Generic.Malware.SPV!PKprn.5A432451", "update": "20200915" } ],
  "threat": {
    "malware": { "name": "WickedBadness.A", "family": "WickedBadness" },
    "intrusion_set": { "name": "APT127", ... }
  }
}

Now, that may not align with where things are currently heading, and I'm not trying to open Pandora's box here... The above approach does a few things, IMO:

  1. It allows me to enrich data coming from various sources (i.e. my VT feed, file access data from an endpoint, or file transfer data from my proxy or IDS). The enrichment takes previous knowledge and tags this file, network, or process event with "threat" knowledge.
  2. It allows me to take a rule (IDS, AV, Elastic Signal, etc.) and say that "X" indicates threat.malware.name: "WickedBadness.A", taking normal time series events and making the detection actionable by recording why we care about said detection.
  3. If the document only included the subset of information that provides a file.hash.sha256 and threat.malware.name, the hash is immediately actionable: with the way we apply filters in Elasticsearch, it becomes a pivot field that is shared with my logging sources directly. I can click a single visualization on a Kibana dashboard and drill into documents about a given file transfer and see in an adjacent visualization that the hash I clicked on is associated with a malware family or intrusion set.
    Alternatively, if we nest the known file threat indicators under threat.file.hash, we either use lookup detection rules (which we could do in either case), or we rely on a third pivot field, related.hash, which duplicates data.

Just sharing some thinking... I would love to discuss this more in depth when the time is right. Right now may be premature, and if so, sorry for stirring the pot.

Contributor

Just to distill this a bit, the IOCs should be actionable entities: an indicator. So threat.malware.* is context and descriptors of the IOC, but it's not an IOC in itself.

So while a lot of the fields are proposed under threat.ioc.*, threat.malware.{name,family,type} makes sense to me to describe the indicator, which would likely be a hash value.

Contributor

Discussed this in depth with @shimonmodi and got some additional context. I think threat.ioc.* makes sense from an enrichment perspective, but I think that, as stored in the "threat intel" index, keeping the shared top-level ECS fields makes the data more actionable from a retro-hunt perspective. Namely, if we get intelligence about events after they pass through the enrichment pipeline, those events will miss the threat data. Keeping the indicators in the non-ioc fields allows hunting using dashboards and Discover. Copying the respective enrichment match to threat.ioc.* keeps the original document intact while also giving context on why we think this particular event is a threat.

I want to propose some hypothetical documents and how this might work. Visualizing what the data could be helps me better understand what we're trying to achieve. Here's an example endpoint file event with process metadata (I added the file hash for the sake of the example):

{
  "agent": {
    "id": "0829aba6-34db-de36-1d42-30eac745e980",
    "type": "endpoint",
    "version": "7.10.0"
  },
  "process": {
    "name": "svchost.exe",
    "pid": 1644,
    "entity_id": "MDgyOWFiYTYtMzRkYi1kZTM2LTFkNDItMzBlYWM3NDVlOTgwLTE2NDQtMTMyNDk3MTA2OTcuNDc1OTExNTAw",
    "executable": "C:\\Windows\\System32\\svchost.exe"
  },
  "message": "Endpoint file event",
  "@timestamp": "2020-11-17T19:07:46.0956672Z",
  "file": {
    "path": "C:\\Windows\\Prefetch\\SVCHOST.EXE-AE7DB802.pf",
    "extension": "pf",
    "name": "SVCHOST.EXE-AE7DB802.pf",
    "hash": {
      "sha1": "bfb7759a67daeb65410490b4d98bb9da7d1ea2ce"
    }
  },
  "ecs": {
    "version": "1.5.0"
  },
  "data_stream": {
    "namespace": "default",
    "type": "logs",
    "dataset": "endpoint.events.file"
  },
  "host": {
    "hostname": "WinDev2001Eval",
    "os": {
      "Ext": {
        "variant": "Windows 10 Enterprise Evaluation"
      },
      "kernel": "1909 (10.0.18363.1139)",
      "name": "Windows",
      "family": "windows",
      "version": "1909 (10.0.18363.1139)",
      "platform": "windows",
      "full": "Windows 10 Enterprise Evaluation 1909 (10.0.18363.1139)"
    },
    "ip": [
      "192.168.93.145",
      "10.203.18.111",
      "fe80::4402:db6b:fd75:544",
      "127.0.0.1",
      "::1"
    ],
    "name": "WinDev2001Eval",
    "id": "5baae4dd-4abe-4ba5-a0fb-704d6e7a4328",
    "mac": [
      "00:0c:29:76:2b:0a",
      "00:ff:7b:c6:4a:64"
    ],
    "architecture": "x86_64"
  },
  "event": {
    "sequence": 186465,
    "ingested": "2020-11-17T19:08:01.275480865Z",
    "created": "2020-11-17T19:07:46.0956672Z",
    "kind": "event",
    "module": "endpoint",
    "action": "creation",
    "id": "LurtOw/d18obyz+u++++/boE",
    "category": [
      "file"
    ],
    "type": [
      "creation"
    ],
    "dataset": "endpoint.events.file"
  },
  "user": {
    "domain": "NT AUTHORITY",
    "name": "SYSTEM",
    "id": "S-1-5-18"
  }
}

Meanwhile you have the following document containing threat intel:

{
  "file": {
    "hash": {
      "sha1": "bfb7759a67daeb65410490b4d98bb9da7d1ea2ce"
    }
  },
  "threat": {
    "malware": {
      "name": "CryptoMinerX.B",
      "family": "CryptoMinerX",
      "type": "cryptominer"
    },
    "threat_actor": {
      "name": "CryptoCurrency R Us",
      "type": [
        "criminal"
      ]
    }
  }
}

What I'd like to see then, is an enriched document like so (abbreviated):

{
  "process": {
    "name": "svchost.exe",
    "pid": 1644,
    "entity_id": "MDgyOWFiYTYtMzRkYi1kZTM2LTFkNDItMzBlYWM3NDVlOTgwLTE2NDQtMTMyNDk3MTA2OTcuNDc1OTExNTAw",
    "executable": "C:\\Windows\\System32\\svchost.exe"
  },
  "message": "Endpoint file event",
  "@timestamp": "2020-11-17T19:07:46.0956672Z",
  "file": {
    "path": "C:\\Windows\\Prefetch\\SVCHOST.EXE-AE7DB802.pf",
    "extension": "pf",
    "name": "SVCHOST.EXE-AE7DB802.pf",
    "hash": {
      "sha1": "bfb7759a67daeb65410490b4d98bb9da7d1ea2ce"
    }
  },
  ...
  "threat": {
    "ioc": {
      "file": {
        "hash": {
          "sha1": "bfb7759a67daeb65410490b4d98bb9da7d1ea2ce"
        }
      }
    },
    "malware": {
      "name": "CryptoMinerX.B",
      "family": "CryptoMinerX",
      "type": "cryptominer"
    },
    "threat_actor": {
      "name": "CryptoCurrency R Us",
      "type": [
        "criminal"
      ]
    }
  }
}

DISCUSSION

Keeping the indicator in what I'll call the "actionable" ECS field (e.g. file.hash.sha1), allows me to search across all documents, including my logs and threat data. I can apply a single filter on that hash field. I can then visualize them all in a common dashboard to find events that occurred prior to receiving new threat intel.

Copying the matched IOC to the threat.ioc object in the enriched document answers the analyst question of "what specific information from this event indicates that it is a cryptominer?" It could have just as easily been the process arguments (missing from this example) or the path.
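
To make the enrichment step concrete, here is a minimal Python sketch of the idea described above. It is an illustration only, not how the enrich processor or any Elastic component works: the function name `enrich_event` and the `indicator_path` parameter are hypothetical, and it assumes both documents are plain dictionaries shaped like the examples in this comment.

```python
import copy


def enrich_event(event, intel_doc, indicator_path=("file", "hash", "sha1")):
    """Return a copy of `event`, enriched with threat context if the indicator matches.

    Hypothetical helper: `indicator_path` names the shared "actionable" ECS field
    that is compared between the event and the threat intel document.
    """

    def get_path(doc, path):
        # Walk a nested dict; return None if any key along the path is missing.
        for key in path:
            if not isinstance(doc, dict) or key not in doc:
                return None
            doc = doc[key]
        return doc

    value = get_path(event, indicator_path)
    if value is None or value != get_path(intel_doc, indicator_path):
        return copy.deepcopy(event)  # no match: the event is returned unchanged

    enriched = copy.deepcopy(event)
    # Assumption: any threat.* already on the event is replaced by the intel context.
    threat = copy.deepcopy(intel_doc.get("threat", {}))

    # Copy the matched indicator under threat.ioc.* so an analyst can see what matched.
    node = threat.setdefault("ioc", {})
    for key in indicator_path[:-1]:
        node = node.setdefault(key, {})
    node[indicator_path[-1]] = value

    enriched["threat"] = threat
    return enriched
```

The original fields (here file.hash.sha1) stay untouched in the enriched copy, which is the property the retro-hunt argument relies on.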

This approach may complicate using the enrich processor on ingest. Namely, if we want to enrich the document with everything under threat.*, it's not just a single lookup. You'd have to look up, then rename some fields, which isn't awful, I think. Alternatively, the threat feed indexes could define file.hash.sha1 as an alias of threat.ioc.file.hash.sha1. This would require an alias for each indicator type field.

Digging into the specifics of using the enrich processor, how would it handle multiple IOC matches? Say we had an IOC that was {"user": {"id": "S-1-5-18"}}, which is obviously concocted, but ideally, I'd like to see the threat data merged into one. It might imply a criminal crypto mining threat actor, and also a nation-state threat actor that's leveraging the tool with some other attack pattern.
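
One possible way to merge several matches is sketched below in Python. This is purely an assumption about a merge strategy, not a description of what the enrich processor does: the helper `merge_threat_matches` is hypothetical, and it simply collects each threat.* section from every matching intel document into a list of distinct entries.

```python
def merge_threat_matches(matches):
    """Merge the `threat` context of several matching intel documents.

    Assumption: each section (malware, threat_actor, ...) becomes a list of
    distinct entries, kept in first-seen order.
    """
    merged = {}
    for match in matches:
        for section, value in match.get("threat", {}).items():
            entries = merged.setdefault(section, [])
            if value not in entries:  # skip exact duplicates
                entries.append(value)
    return merged


# Two concocted matches: one on the file hash, one on the user ID.
hash_match = {
    "threat": {
        "malware": {"name": "CryptoMinerX.B", "family": "CryptoMinerX"},
        "threat_actor": {"name": "CryptoCurrency R Us", "type": ["criminal"]},
    }
}
user_match = {
    "threat": {"threat_actor": {"name": "Example Actor", "type": ["nation-state"]}}
}
merged = merge_threat_matches([hash_match, user_match])
```

With this strategy both threat actors survive side by side, at the cost of turning every section into an array.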

If an IOC is a "bad IP" e.g. the essential information is the IP. But in the IOC index, should the value be captured in source.ip, destination.ip, client.ip or server.ip? All we need to do is capture "the IP" itself, then we look for it in the appropriate places, depending on the event source.

In the threat index this information could be stored in threat.ioc.ip

This goes for all "IOC types", IMO.
Storing them under threat.ioc in the threat index tells you the type of IOC and its value.

During enrichment you can then "add" the threat.ioc.* fields to tell the story of what was the match.
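
As a rough Python sketch of "capture the IP itself, then look for it in the appropriate places": the field list `IP_FIELDS` and the helper names are assumptions for illustration, and documents are treated as plain nested dictionaries.

```python
# Event fields to check for a bad IP (assumed list, taken from the question above).
IP_FIELDS = ("source.ip", "destination.ip", "client.ip", "server.ip")


def get_field(doc, dotted):
    """Read a dotted field name such as 'source.ip' from a nested dict."""
    for key in dotted.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def match_ip_indicator(event, indicator):
    """Return the event fields whose value equals threat.ioc.ip in `indicator`."""
    bad_ip = get_field(indicator, "threat.ioc.ip")
    if bad_ip is None:
        return []
    return [field for field in IP_FIELDS if get_field(event, field) == bad_ip]


indicator = {"threat": {"ioc": {"ip": "203.0.113.7"}}}
event = {"source": {"ip": "10.0.0.5"}, "destination": {"ip": "203.0.113.7"}}
matched_fields = match_ip_indicator(event, indicator)
```

Returning the list of matching field names also records which side of the connection matched, which could be copied into the enriched document.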

I think a more concrete list of the actual descriptors / fields we expect to need would be helpful in figuring out the way forward.

As a start, existing ECS fields to be nested under threat.ioc:

Potential other additions:

  • Registry
  • User
  • DNS
  • Process

It's an extensive list; however, I do feel like all of these could be used as IOCs.

Contributor

During enrichment you can then "add" the threat.ioc.* fields to tell the story of what was the match.

I think this makes a lot of sense; knowing if the match was an "IP" or a "URL" or a "hash" is important.

Contributor Author

@dcode @peasead - based on today's discussion with @MikePaquette, our next step is to move all actionable IOC information in threat intel documents to the top level, and not nest it under threat.ioc.* (as is currently proposed). We will be using threat.ioc.* for file, url, etc. for the enrichment use case.

Contributor

The bullet list below (file.*, file.hash.*, url.*...) is useful to give us a general idea of the landscape of the types of IOCs. I think a more concrete list of the actual descriptors / fields we expect to need would be helpful in figuring out the way forward.

threat.yml and the table should now have descriptors and fields requested.
https://github.com/elastic/ecs/blob/1347f2ba4c0ef6c00d5ffccee7fa2ad854f631d0/rfcs/text/0008/threat.yml
https://github.com/elastic/ecs/blob/1347f2ba4c0ef6c00d5ffccee7fa2ad854f631d0/rfcs/text/0008-threat-intel.md

Contributor

@ebeahan is going to look at how non-directional network artifacts should be reflected, taking ECS as a whole into account in addition to threat.* and the enriched document.

SHolzhauer added a commit to SHolzhauer/elastic-tip that referenced this pull request Nov 17, 2020
@MikePaquette
Contributor

@dcode I've been giving some thought to your proposal for how to use the threat.ioc.* fields. The thing I like most is that it gives an implicit indication of which indicator caused the match.

This might already be solved, but I think we also need to find a way to indicate which field in the source document matched the IOC when more than one could have matched. In your case it's clear that file.hash.sha1 matched the value of threat.ioc.file.hash.sha1, but is there a way we could determine whether source.ip matched threat.ioc.ip vs. destination.ip matched threat.ioc.ip?

Anyhow, to aid my own understanding of your proposal, I've created this diagram. Can you review to see if it accurately portrays your proposal? (for now, I ignored the minimally enriched case of ingestion-time matching in this diagram.)

image

Contributor

@webmat webmat left a comment

In today's review I'm responding to a few ongoing discussions, as well as pointing out a few small things that need adjustment.

One common thread across the discussions is that there will be two major usages of the fields. The fields used to describe IOCs in an "enrichment" index, and the fields that are appended to a live event that matches known IOCs. I think it may be useful to start fleshing out these two use cases in the "Usage" section. For example, are only certain fields meant to be copied to events, or all IOC fields?

* event.risk_score _risk score provided by threat intelligence source_
* event.original _raw intelligence event_

### Using existing ECS Fields nested under Threat.ioc.*
Contributor

Thanks everyone for the thought going into this 🤔 👍

I like the symmetry of having the various fields that could be used to describe an IOC be directly the ECS fields. However I wonder if this would work in practice.

If an IOC is a "bad IP" e.g. the essential information is the IP. But in the IOC index, should the value be captured in source.ip, destination.ip, client.ip or server.ip? All we need to do is capture "the IP" itself, then we look for it in the appropriate places, depending on the event source.

We may have a similar problem with TLS or x509 certs, where in ECS we currently have fields to describe TLS exchanges, potentially mutual TLS, so tls.client.* and tls.server.*, but if e.g. a cipher or hash is bad no matter on which side it is, which field should be used to capture it in the IOC descriptor?

To get back to the symmetry of being able to filter for an IOC's indicator and get both the IOC descriptor and actual events show up, I'm not actually sure this saves users anything. In Elasticsearch, unlike SQL, we have to be explicit in which values we want to match. We can't say like in SQL "give me all events and indicators that have the same value in source.ip" for example. If you have 10 fresh new bad IPs to test for, you'd have to explicitly make an include query for ["bad ip 1", "bad ip 2", ...].
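
To illustrate the point about explicit values, here is a small Python sketch that builds such an include query as a request body. It uses the standard Elasticsearch `bool` and `terms` query clauses; the function name `build_ip_hunt_query`, the default field list, and the sample IPs are assumptions for illustration.

```python
def build_ip_hunt_query(bad_ips, fields=("source.ip", "destination.ip")):
    """Build a search body that matches any of `bad_ips` in any of `fields`.

    Every indicator value has to be listed explicitly; there is no way to say
    "wherever events and indicators share the same value".
    """
    return {
        "query": {
            "bool": {
                # One terms clause per candidate field, each listing every bad IP.
                "should": [{"terms": {field: list(bad_ips)}} for field in fields],
                "minimum_should_match": 1,
            }
        }
    }


query = build_ip_hunt_query(["198.51.100.1", "198.51.100.2"])
```

The resulting dictionary could be sent as the body of a search request; the list of IPs has to be refreshed by the caller whenever new indicators arrive.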

The bullet list below (file.*, file.hash.*, url.*...) is useful to give us a general idea of the landscape of the types of IOCs. I think a more concrete list of the actual descriptors / fields we expect to need would be helpful in figuring out the way forward.

@webmat webmat added the 1.8.0 label Nov 23, 2020
@SHolzhauer SHolzhauer left a comment

Adding to existing discussions, mostly around the usage of fields.

Member

@ebeahan ebeahan left a comment

Thanks to everyone for the great discussion throughout!

After reviewing the current doc and the accompanying discussions, I've noted a few items outstanding to help continue the progress. Very possible I overlooked something 😅 .

  • Is the proposed list of threat.* fields up-to-date with the most recent discussions/decisions?

  • Including some mapping examples using the proposed fields would be very useful. For example, the classifications for the different types of IOCs that @webmat suggested ([RFC] Threat Intel - Stage 1 #1127 (comment)).

  • Can the outcome noted here be captured in the document?

  • The concern @MikePaquette raised is notable. Let's capture that as an additional concern.

peasead and others added 5 commits December 10, 2020 12:40
Co-authored-by: Eric Beahan <ebeahan@gmail.com>
Co-authored-by: Eric Beahan <ebeahan@gmail.com>
Co-authored-by: Eric Beahan <ebeahan@gmail.com>
Co-authored-by: Eric Beahan <ebeahan@gmail.com>
Co-authored-by: Eric Beahan <ebeahan@gmail.com>
Member

ebeahan commented Jan 27, 2021

Two fields I listed in #1127 (review) made their way into the example definitions as field reuses.

Are there any other fields listed from the proposal here that we need to capture the intent to nest?

@devonakerr devonakerr self-assigned this Jan 28, 2021
Contributor

peasead commented Jan 28, 2021

Two fields I listed in #1127 (review) made their way into the example definitions as field reuses.

Sorry about that @ebeahan , misunderstanding on my part. I got those cleaned up.

Are there any other fields listed from the proposal here that we need to capture the intent to nest?

@ebeahan, we're zeroing in on this.

Contributor

peasead commented Jan 28, 2021

Are there any other fields listed from the proposal here that we need to capture the intent to nest?

@ebeahan, we're zeroing in on this.

We have a meeting next week to get a final answer, but as it sits now:

For the threat intelligence and enriched signals indices, they should both be the same.

Options:

  1. Nested
     • pros: future-proof, can query multiple fields and their relationships, more query possibilities
     • cons: performance/storage considerations for frequent modifications, Kibana doesn't currently support nested aggregations
  2. Flattened
     • pros: fewer performance/storage considerations for frequent modifications, supports aggregations
     • cons: fewer query possibilities, cannot perform range queries
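
The "relationships" point in the Nested option can be illustrated with a small Python simulation. It assumes "Nested" refers to the Elasticsearch `nested` field type and that the alternative is an ordinary array of objects, which Elasticsearch indexes as flattened multi-value fields; the field names `type` and `provider` and the helper functions are hypothetical.

```python
def flatten(objects):
    """Mimic how an array of plain objects is indexed: one bag of values per field."""
    flat = {}
    for obj in objects:
        for field, value in obj.items():
            flat.setdefault(field, set()).add(value)
    return flat


def match_flat(objects, **conditions):
    """Each condition is checked against the whole bag, so objects can be mixed up."""
    flat = flatten(objects)
    return all(value in flat.get(field, set()) for field, value in conditions.items())


def match_nested(objects, **conditions):
    """All conditions must hold within one and the same object."""
    return any(
        all(obj.get(field) == value for field, value in conditions.items())
        for obj in objects
    )


indicators = [
    {"type": "file", "provider": "feed_a"},
    {"type": "ip", "provider": "feed_b"},
]

# No single indicator is a file indicator from feed_b.
flat_hit = match_flat(indicators, type="file", provider="feed_b")
nested_hit = match_nested(indicators, type="file", provider="feed_b")
```

Here `flat_hit` is a false positive because the association between `type` and `provider` is lost, while `nested_hit` keeps each indicator's fields together.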

@devonakerr devonakerr self-requested a review February 8, 2021 15:07
Contributor

peasead commented Feb 9, 2021

Where this ended up:

  • threatintel.indicator.* (Filebeat module) will be a normal field type and will be deprecated when nested field types are better supported in Kibana
  • threat.indicator.* (actual threat ECS fieldset) will be nested now and used for the enriched doc

Once there is better support for nested field types in Kibana, there will be a migration to threat.indicator.*

Member

ebeahan commented Feb 11, 2021

Once there is better support for nested field types in Kibana, there will be a migration to threat.indicator.*

Do we see this development affecting the timeline for this RFC's advancement?

I imagine many users interested in threat.indicator.* fields are looking to map their own indicator sources to threat.indicator.* and then ingest those sources for use with indicator match rules. Is this something that will still be possible until the migration to threat.indicator.* happens?

Including the threat.indicator.* fields in ECS would still document the fields as soon as they are implemented in the signals indices. Yet, until we feel confident encouraging using these fields to normalize users' data, I'm worried about the confusion and experience that would result.

We're only targeting experimental support here, so I don't see these as items we must address now, though. Let's add both the decision from #1127 (comment) and these related concerns to the Concerns section. This will ensure we revisit the topics later.

Due to a bad copy/paste, these were causing the YAML to be invalid.
Documenting additional concerns.
ebeahan previously approved these changes Feb 16, 2021
Member

@ebeahan ebeahan left a comment

LGTM 👍

We do need to capture @devonakerr as the sponsor in the People section, please.

Member

ebeahan commented Feb 17, 2021

We do need to capture @devonakerr as the sponsor in the People section, please.

I went ahead and addressed it.

@devonakerr can you give this a final look? If all looks good, I'll update the advancement date and merge!

@devonakerr

Yessir, apologies for the 11th hour updates.

devonakerr previously approved these changes Feb 17, 2021

@devonakerr devonakerr left a comment

I approve, sero sed serio.

@ebeahan ebeahan merged commit 3e0c861 into elastic:master Feb 18, 2021
@epixa epixa changed the title [RFC] Threat Intel - Stage 2 [RFC] Threat Intel - Stage 1 Feb 22, 2021