[Security Solution][Telemetry] Concurrent telemetry requests #73558

michaelolo24 · 2020-07-28T21:30:47Z

Summary

This PR handles a couple things:

[Deduplication]: Because cloned VMs share the same host id, I've updated the de-duplication to use a compound id of [hostId]-[hostName]. Before this change, all clones of a VM were treated as a single host. Clones can still be missed if the hostname is also identical across VMs, but this is the best known alternative that doesn't risk introducing false positives.
[Performance & error handling]: I don't expect us to run into performance issues at the moment, but to help guard against them, I've set up the two sides of our telemetry to run concurrently and independently. If one of our telemetry branches (endpoint or detections) fails for any reason, we will still get data back from the other branch if it succeeds.
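
For readers skimming the summary, here is a minimal sketch of the concurrent-and-independent idea. It is not the code in this PR: the `DetectionsUsage` and `EndpointUsage` shapes, the `valueOrFallback` helper, and the `fetchUsage` signature are all simplified assumptions for illustration.

```typescript
// Hypothetical, simplified shapes; the real usage types in the plugin are richer.
interface DetectionsUsage {
  rulesEnabled: number;
}
interface EndpointUsage {
  totalInstalled: number;
}
interface UsageData {
  detections: DetectionsUsage;
  endpoints: EndpointUsage | {};
}

// Return the fulfilled value, or the fallback when that branch rejected.
function valueOrFallback<T, F>(result: PromiseSettledResult<T>, fallback: F): T | F {
  return result.status === 'fulfilled' ? result.value : fallback;
}

async function fetchUsage(
  fetchDetections: () => Promise<DetectionsUsage>,
  fetchEndpoints: () => Promise<EndpointUsage>
): Promise<UsageData> {
  // Both requests start right away. Promise.allSettled never rejects,
  // so a failure in one branch cannot discard the other branch's result.
  const [detections, endpoints] = await Promise.allSettled([
    fetchDetections(),
    fetchEndpoints(),
  ]);
  return {
    detections: valueOrFallback(detections, { rulesEnabled: 0 }),
    endpoints: valueOrFallback(endpoints, {}),
  };
}
```

With `Promise.all` instead, the first rejection would reject the whole call and the successful branch's data would be lost.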

Checklist

Unit or functional tests were updated or added to match the most common scenarios

elasticmachine · 2020-07-28T21:30:49Z

Pinging @elastic/endpoint-data-visibility-team (Team:Endpoint Data Visibility)

michaelolo24 · 2020-07-28T21:31:43Z

x-pack/plugins/security_solution/server/usage/endpoints/index.ts

-            latestEndpointEvent,
-            lastCheckin,
-            dailyActiveCount
+    try {


This looks more aggressive than it is. I just moved the contents of the for loop into a try/catch.
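
As a rough illustration of why wrapping the loop body helps, here is a sketch that assumes the try/catch sits inside each iteration, as the comment above describes. The `AgentRecord` shape and `countByOs` function are hypothetical and much simpler than the real Fleet saved-object attributes.

```typescript
// Hypothetical, simplified agent record for illustration only.
interface AgentRecord {
  local_metadata?: { os?: { name?: string } };
}

function countByOs(agents: AgentRecord[]): Record<string, number> {
  const osCounts: Record<string, number> = {};
  for (const agent of agents) {
    try {
      // The non-null assertions are not checked at runtime, so a record
      // without local_metadata or os throws a TypeError on this line.
      const osName = agent.local_metadata!.os!.name ?? 'unknown';
      osCounts[osName] = (osCounts[osName] ?? 0) + 1;
    } catch {
      // Only this record is skipped; the loop moves on to the next agent.
    }
  }
  return osCounts;
}
```

A single malformed record then costs one data point rather than the whole endpoint payload.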

rylnd · 2020-07-28T22:46:04Z

x-pack/plugins/security_solution/server/usage/collector.ts

 import { EndpointUsage, getEndpointTelemetryFromFleet } from './endpoints';

 export type RegisterCollector = (deps: CollectorDependencies) => void;
 export interface UsageData {
-  detections: DetectionsUsage;
+  detections: DetectionsUsage | {};


I think this will always be DetectionsUsage, right?

Yea, forgot to remove this when I updated. It's out now. Thanks!

michaelolo24 · 2020-07-29T00:36:30Z

@elasticmachine merge upstream

jonathan-buttner

@michaelolo24 this is labeled with 7.9. Does it need to be backported to 7.9, or is 7.10 fine?

jonathan-buttner · 2020-07-29T15:51:41Z

x-pack/plugins/security_solution/server/usage/collector.ts

@@ -76,9 +76,14 @@ export const registerCollector: RegisterCollector = ({
    isReady: () => kibanaIndex.length > 0,
    fetch: async (callCluster: LegacyAPICaller): Promise<UsageData> => {
      const savedObjectsClient = await getInternalSavedObjectsClient(core);
+      const [detections, endpoints] = await Promise.allSettled([


If one of the requests fails should we log any errors? or will that already happen?

It doesn't already happen. We can add a log in here, but I don't know how likely we are to get those logs later on, since telemetry runs in the background. Do we contact a user if, for whatever reason, we don't get telemetry back?

I worry about logging any errors in case there's PII in there. I don't expect there to be, but I would rather avoid that possibility.

That makes sense. Might be worth asking some others on the telemetry team what the general guidance is.

I guess my thinking was that if we release a new stack version and notice that we're not getting any telemetry for some reason, I believe we collect the logs of our cloud deployments, so we could poke around and at least see if one of the requests is failing or something 🤷 .

Yea, as far as I can tell, it sounds like there aren't any established patterns for this. For now we'll just see an empty object if anything fails, but I think we can work with the telemetry team to put in some better logic here for 7.10.
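
One possible middle ground for the logging question, sketched here purely as an illustration and not something this PR does: record which branch failed and the error's class name, but never the error message, since the message is where host names or other PII would most plausibly appear. The `Logger` type and both function names are hypothetical.

```typescript
// Hypothetical logger signature; a real server logger would be passed in instead.
type Logger = (message: string) => void;

// Describe a failed branch using only its label and the error's class name.
// The error message is deliberately left out.
function describeFailure(
  branch: string,
  result: PromiseSettledResult<unknown>
): string | undefined {
  if (result.status === 'fulfilled') return undefined;
  const kind = result.reason instanceof Error ? result.reason.name : typeof result.reason;
  return `telemetry branch "${branch}" failed (${kind})`;
}

function logFailures(
  results: Record<string, PromiseSettledResult<unknown>>,
  log: Logger
): void {
  for (const [branch, result] of Object.entries(results)) {
    const line = describeFailure(branch, result);
    if (line !== undefined) log(line);
  }
}
```

Whether even the error class name is acceptable to log is a question for the telemetry team, as noted above.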

jonathan-buttner · 2020-07-29T15:54:50Z

x-pack/plugins/security_solution/server/usage/endpoints/index.ts

        }
      }
+    } catch (error) {


Should we log an error/warning here?

see previous comment :). I'll also check with the Telemetry team to see how they handle errors to confirm

bkimmel · 2020-07-29T15:59:16Z

x-pack/plugins/security_solution/server/usage/collector.ts

@@ -76,9 +76,14 @@ export const registerCollector: RegisterCollector = ({
    isReady: () => kibanaIndex.length > 0,
    fetch: async (callCluster: LegacyAPICaller): Promise<UsageData> => {
      const savedObjectsClient = await getInternalSavedObjectsClient(core);
+      const [detections, endpoints] = await Promise.allSettled([


ℹ️ nice use of .allSettled

bkimmel · 2020-07-29T16:00:31Z

x-pack/plugins/security_solution/server/usage/detections/detections_helpers.ts

@@ -23,7 +23,7 @@ interface DetectionsMetric {

 const isElasticRule = (tags: string[]) => tags.includes(`${INTERNAL_IMMUTABLE_KEY}:true`);

-const initialRulesUsage: DetectionRulesUsage = {
+export const initialRulesUsage: DetectionRulesUsage = {


❔ Doc comment on exports

will add, thanks

bkimmel · 2020-07-29T16:06:40Z

x-pack/plugins/security_solution/server/usage/endpoints/index.ts

+        uniqueHostIds.add(host.id);
+        const agentId = elastic?.agent?.id;
+        osTracker = updateEndpointOSTelemetry(os, osTracker);
+


❔ Could you add a comment here about where the error was being thrown? Moving the try up makes it safer, but maybe harder to understand where exceptions could be thrown from.

bkimmel · 2020-07-29T16:07:08Z

x-pack/plugins/security_solution/server/usage/detections/detections_helpers.ts

@@ -34,7 +34,7 @@ const initialRulesUsage: DetectionRulesUsage = {
  },
 };

-const initialMlJobsUsage: MlJobsUsage = {
+export const initialMlJobsUsage: MlJobsUsage = {


❔ Docs on exports

bkimmel

Left a few ❔ s

michaelolo24 · 2020-07-30T18:34:25Z

x-pack/plugins/security_solution/server/usage/endpoints/index.ts

+      const { last_checkin: lastCheckin, local_metadata: localMetadata } = metadataAttributes;
+      const { host, os, elastic } = localMetadata as AgentLocalMetadata;
+
+      // Although not perfect, the goal is to dedupe hosts to get the most recent data for a host


fyi @jonathan-buttner per our conversation

jonathan-buttner · 2020-07-30T18:47:14Z

x-pack/plugins/security_solution/server/usage/endpoints/index.ts

+      // Although not perfect, the goal is to dedupe hosts to get the most recent data for a host
+      // An agent re-installed on the same host will have all the same id, name, and kernel details
+      // A cloned VM will have the same id, but "may" have the same name and kernel, but it's really up to the user.
+      const compoundUniqueId = `${host?.id}-${host?.hostname}-${os?.kernel}`;


hmm actually thinking about this more, what would happen in the scenario where a user updates their computer to a new OS version? I think the os.kernel would probably change right? I think we'd want to treat that telemetry information as the same user right?

I wonder if we should just stick with ${host?.id}-${host?.hostname} ?

True, good point. Yea, the OS information would be tricky in update scenarios. I'll simplify it to just those two. Thanks!
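
For reference, a small sketch of de-duplication with the two-part compound id settled on above. The `AgentMetadata` shape, the `lastCheckin` string format, and the function names are assumptions for illustration, not the plugin's actual types.

```typescript
// Hypothetical, simplified metadata for illustration only.
interface AgentMetadata {
  host?: { id?: string; hostname?: string };
  lastCheckin: string; // assumed to be an ISO 8601 timestamp
}

// Build the compound id from host id and hostname only, so an OS or
// kernel upgrade on the same machine does not create a "new" host.
function compoundHostId(agent: AgentMetadata): string {
  return `${agent.host?.id}-${agent.host?.hostname}`;
}

// Keep only the most recently checked-in record per compound id.
function dedupeHosts(agents: AgentMetadata[]): Map<string, AgentMetadata> {
  const latest = new Map<string, AgentMetadata>();
  for (const agent of agents) {
    const key = compoundHostId(agent);
    const seen = latest.get(key);
    if (seen === undefined || Date.parse(agent.lastCheckin) > Date.parse(seen.lastCheckin)) {
      latest.set(key, agent);
    }
  }
  return latest;
}
```

Two clones that share a host id but have different hostnames land in separate entries, while a re-installed agent on the same host collapses into one. Clones that also share a hostname are still merged, which is the known limitation called out in the summary.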

kibanamachine · 2020-07-30T20:43:44Z

💚 Build Succeeded

continuous-integration/kibana-ci/pull-request
Commit: 5577ce4

Build metrics

✅ unchanged

History

💚 Build #65589 succeeded 537aa12
💚 Build #65308 succeeded d3f9042369807329cfd00b39b7734049e087e3a2
💚 Build #65045 succeeded 8d8c6beed94d21942b1ed97cd498a08706539ace
💔 Build #65003 failed 2422ed5e5f79cfcf472f59a2b0fa5d5a95d1412e
💔 Build #64960 failed ab764a794f58205076a99f0f447085f2481b4afa

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream


michaelolo24 added the Feature:Telemetry, v8.0.0, release_note:skip, Team:Endpoint Data Visibility, and v7.9.0 labels Jul 28, 2020

michaelolo24 requested a review from rylnd July 28, 2020 21:30

michaelolo24 requested review from a team as code owners July 28, 2020 21:30

michaelolo24 commented Jul 28, 2020

rylnd reviewed Jul 28, 2020

michaelolo24 force-pushed the telemetry-error-handling branch from ab764a7 to 2422ed5 Compare July 28, 2020 22:49

jonathan-buttner reviewed Jul 29, 2020

bkimmel reviewed Jul 29, 2020

bkimmel approved these changes Jul 29, 2020

jonathan-buttner approved these changes Jul 29, 2020

michaelolo24 force-pushed the telemetry-error-handling branch from 8d8c6be to d3f9042 Compare July 29, 2020 18:01

michaelolo24 added 5 commits July 30, 2020 12:56

gracefully handle telemetry failures

3cf7b66

run telemetry concurrently

3a82039

remove errant type

93397b1

added comments

4ac318b

improve deduping strategy

537aa12

michaelolo24 force-pushed the telemetry-error-handling branch from d3f9042 to 537aa12 Compare July 30, 2020 16:56

michaelolo24 commented Jul 30, 2020

jonathan-buttner reviewed Jul 30, 2020

not using os details in id, changes on os updates

5577ce4

michaelolo24 merged commit 14355ab into elastic:master Jul 30, 2020

michaelolo24 deleted the telemetry-error-handling branch July 30, 2020 21:44

michaelolo24 mentioned this pull request Jul 30, 2020

[7.x] [Security Solution][Telemetry] Concurrent telemetry requests (#73558) #73893

Merged

michaelolo24 added a commit to michaelolo24/kibana that referenced this pull request Jul 30, 2020

[Security Solution][Telemetry] Concurrent telemetry requests (elastic…

19de51b

…#73558)

michaelolo24 mentioned this pull request Jul 30, 2020

[7.9] [Security Solution][Telemetry] Concurrent telemetry requests (#73558) #73894

Merged

michaelolo24 added a commit to michaelolo24/kibana that referenced this pull request Jul 30, 2020

[Security Solution][Telemetry] Concurrent telemetry requests (elastic…

7caeca9

…#73558)

michaelolo24 added a commit that referenced this pull request Jul 31, 2020

[7.9] [Security Solution][Telemetry] Concurrent telemetry requests (#…

dc5fca2

…73558) (#73894)

michaelolo24 added a commit that referenced this pull request Jul 31, 2020

[7.x] [Security Solution][Telemetry] Concurrent telemetry requests (#…

6095aa1

…73558) (#73893)
