Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EP Meta Telemetry Perf #104396

Merged
merged 25 commits into from
Jul 19, 2021
Merged

EP Meta Telemetry Perf #104396

merged 25 commits into from
Jul 19, 2021

Conversation

pjhampton
Copy link
Contributor

@pjhampton pjhampton commented Jul 6, 2021

Summary

Refactoring some parts of the EP Meta telemetry + making performance improvements.

Protections + Endpoint are now happy with the payload shape. We will send this back daily to make improvements to endpoint/protections and identify UX issues or bugs.

Checklist

Delete any items that are not applicable to this PR.

For maintainers

@pjhampton pjhampton added Feature:Telemetry v8.0.0 release_note:skip Skip the PR/issue when compiling release notes Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v7.14.0 auto-backport Deprecated - use backport:version if exact versions are needed labels Jul 6, 2021
@pjhampton pjhampton self-assigned this Jul 6, 2021
@pjhampton
Copy link
Contributor Author

@elasticmachine merge upstream

@spalger spalger added the v7.15.0 label Jul 7, 2021
@pjhampton
Copy link
Contributor Author

@elasticmachine merge upstream

@pjhampton pjhampton marked this pull request as ready for review July 9, 2021 13:12
@pjhampton pjhampton requested a review from a team as a code owner July 9, 2021 13:12
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@pjhampton
Copy link
Contributor Author

@elasticmachine merge upstream

Copy link
Contributor

@donaherc donaherc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but will definitely defer to other security app devs for additional feedback

@pjhampton
Copy link
Contributor Author

@elasticmachine merge upstream

Copy link
Contributor

@jonathan-buttner jonathan-buttner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great comments! Left some ideas for avoiding the casting.

* and buckets them by Endpoint Agent id and sorts by the top hit. The EP agent will
* report its metrics once per day OR every time a policy change has occured. If
* a metric document(s) exists for an EP agent we map to fleet agent and policy
*/
const { body: endpointMetricsResponse } = (endpointData.endpointMetrics as unknown) as {
Copy link
Contributor

@jonathan-buttner jonathan-buttner Jul 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just responding to your question from slack here, I know you didn't change this but yeah if you have time to make a few changes here that'd be great.

I believe it's possible for endpointMetricsResponse to be undefined based on the return value of the this.fetchEndpointData() call. Below I think we can change the if to if (endpointMetricsResponse?.aggregation to address that

I think if we could remove the cast to unknown that'd be awesome. I think we can achieve that by adding something like this:

interface MetricsType {
  key: string;
  doc_count: number;
  latest_metrics: EndpointMetricHits;
}

const endpointAgentsAggs = endpointMetricsResponse.aggregations
      .endpoint_agents as AggregationsMultiBucketAggregate<MetricsType>;

    const endpointMetrics = endpointAgentsAggs.buckets.map((epMetrics) => {
      return {
        endpoint_agent: epMetrics.latest_metrics.hits.hits[0]._source.agent.id,
        endpoint_version: epMetrics.latest_metrics.hits.hits[0]._source.agent.version,
        endpoint_metrics: epMetrics.latest_metrics.hits.hits[0]._source,
      };

As a side note it looks like EndpointMetricsAggregation might need total I think that is already typed as part of the SearchResponse type, so we might be able to get rid of that part.

export interface EndpointMetricsAggregation {
  hits: {
    total: { value: number };
  };

@pjhampton
Copy link
Contributor Author

@elasticmachine merge upstream

@pjhampton
Copy link
Contributor Author

@elasticmachine merge upstream

const packagePolicies = agentPolicy?.package_policies as PackagePolicy[];

packagePolicies
.map((pPolicy) => pPolicy as PolicyData)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the map just to cast the values, or should this be a filter of any null values? If the former could we just remove the map like below? Also, can the pPolicy.inputs array be empty or the config null instead of undefined?

packagePolicies
  .forEach((pPolicy as PolicyData) => {
    if (pPolicy.inputs[0].config !== undefined) { // <-- Also, can this be null too?
      pPolicy.inputs.forEach((input) => {
        if (
          input.type === FLEET_ENDPOINT_PACKAGE &&
          input.config !== undefined &&
          policyInfo !== undefined
        ) {
          endpointPolicyCache.set(policyInfo, pPolicy);
        }
      });
    }
  });

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the map just to cast the values, or should this be a filter of any null values?

It is the former - I am just casting. There shouldn't be any null values (but anything is possible).

Also, can the pPolicy.inputs array be empty or the config null instead of undefined?

I haven't seen it - as these are package policies instead of agent policies I think they should always have an input. But I'm running with the assumption that it is possible.

failedPolicy?._source.Endpoint.policy.applied.artifacts.global.version,
status: failedPolicy?._source.Endpoint.policy.applied.status,
actions: failedPolicy?._source.Endpoint.policy.applied.actions
.map((action) => (action.status !== 'success' ? action : null))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a reduce combining these two be easier? So:

failedPolicy?._source.Endpoint.policy.applied.actions.reduce((memo, action) => {
	if (action.status !== 'success') memo.push(action);
	return memo;
}, []);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great suggestion, but I prefer how my implementation reads by functional chaining. Hope that is ok :-)

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @pjhampton

@pjhampton pjhampton merged commit 75c573a into master Jul 19, 2021
kibanamachine added a commit to kibanamachine/kibana that referenced this pull request Jul 19, 2021
* Add comments for other developers.

* Move OS infomation into meta key.

* Refmt endpoint metrics.

* Add helper funcs to batch sending.

* Add test to ensure opt in status.

* Add helpers test.

* Finish reshaping the document based on feedback.

* Add better type safety. Add policy package version to output.

* Fix sender implementation for aggregating EP datastreams.

* Fix type issues.

* Fix cadence inference + miss default agent id.

* Dynamically control search ranges for metrics + policy responses.

* Set back to 24h.

* Add comment for ignoring the default policy id.

* explicitly type the sub agg search query.

* Improve type safety.

* Add additional type safety + try/catch the last block.

* Remove unneeded optional chaining.

* Destructure host metrics.

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
kibanamachine added a commit to kibanamachine/kibana that referenced this pull request Jul 19, 2021
* Add comments for other developers.

* Move OS infomation into meta key.

* Refmt endpoint metrics.

* Add helper funcs to batch sending.

* Add test to ensure opt in status.

* Add helpers test.

* Finish reshaping the document based on feedback.

* Add better type safety. Add policy package version to output.

* Fix sender implementation for aggregating EP datastreams.

* Fix type issues.

* Fix cadence inference + miss default agent id.

* Dynamically control search ranges for metrics + policy responses.

* Set back to 24h.

* Add comment for ignoring the default policy id.

* explicitly type the sub agg search query.

* Improve type safety.

* Add additional type safety + try/catch the last block.

* Remove unneeded optional chaining.

* Destructure host metrics.

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
@kibanamachine
Copy link
Contributor

💚 Backport successful

Status Branch Result
7.14
7.x

The backport PRs will be merged automatically after passing CI.

kibanamachine added a commit that referenced this pull request Jul 19, 2021
* Add comments for other developers.

* Move OS infomation into meta key.

* Refmt endpoint metrics.

* Add helper funcs to batch sending.

* Add test to ensure opt in status.

* Add helpers test.

* Finish reshaping the document based on feedback.

* Add better type safety. Add policy package version to output.

* Fix sender implementation for aggregating EP datastreams.

* Fix type issues.

* Fix cadence inference + miss default agent id.

* Dynamically control search ranges for metrics + policy responses.

* Set back to 24h.

* Add comment for ignoring the default policy id.

* explicitly type the sub agg search query.

* Improve type safety.

* Add additional type safety + try/catch the last block.

* Remove unneeded optional chaining.

* Destructure host metrics.

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

Co-authored-by: Pete Hampton <pjhampton@users.noreply.github.com>
kibanamachine added a commit that referenced this pull request Jul 19, 2021
* Add comments for other developers.

* Move OS infomation into meta key.

* Refmt endpoint metrics.

* Add helper funcs to batch sending.

* Add test to ensure opt in status.

* Add helpers test.

* Finish reshaping the document based on feedback.

* Add better type safety. Add policy package version to output.

* Fix sender implementation for aggregating EP datastreams.

* Fix type issues.

* Fix cadence inference + miss default agent id.

* Dynamically control search ranges for metrics + policy responses.

* Set back to 24h.

* Add comment for ignoring the default policy id.

* explicitly type the sub agg search query.

* Improve type safety.

* Add additional type safety + try/catch the last block.

* Remove unneeded optional chaining.

* Destructure host metrics.

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

Co-authored-by: Pete Hampton <pjhampton@users.noreply.github.com>
jloleysens added a commit to jloleysens/kibana that referenced this pull request Jul 20, 2021
…y-show-migrate-to-authzd-users

* 'master' of github.com:elastic/kibana: (187 commits)
  Space management page UX improvements (elastic#100448)
  [Reporting] Unskip flaky test when downloading CSV with "no data" (elastic#105252)
  Update dependency @elastic/charts to v33 (master) (elastic#105633)
  [Observability RAC] Improve alerts table columns (elastic#105446)
  Introduce `preboot` lifecycle stage (elastic#103636)
  [Security Solution] Invalid kql query timeline refresh bug (elastic#105525)
  skip flaky suite (elastic#106121)
  [Security Solution][Endpoint] Fix UI inconsistency between isolation forms and remove display of Pending isolation statuses (elastic#106118)
  docs: APM RUM Source map API (elastic#105332)
  [CTI] Adds indicator match rule improvements (elastic#97310)
  [Security Solution] update text for Isolation action submissions (elastic#105956)
  EP Meta Telemetry Perf (elastic#104396)
  [Metrics UI] Drop partial buckets from ALL Metrics UI queries (elastic#104784)
  Remove beta admonitions for Fleet docs (elastic#106010)
  [Observability RAC] Remove indexing of rule evaluation documents (elastic#104970)
  Parameterize migration test for kibana version (elastic#105417)
  [Alerting] Allow rule to execute if the value is 0 and that mets the condition (elastic#105626)
  [ML] Fix Index data visualizer sometimes shows wrong doc count for saved searches (elastic#106007)
  [Security Solution] UX fixes for Policy page and Case Host Isolation comment (elastic#106027)
  [Security Solution]Memory protection configuration card for policies integration. (elastic#101365)
  ...

# Conflicts:
#	x-pack/plugins/reporting/public/management/report_listing.test.tsx
#	x-pack/plugins/reporting/public/management/report_listing.tsx
@spalger spalger deleted the pjhampton/ep-telemetry-perf branch May 8, 2022 22:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Deprecated - use backport:version if exact versions are needed Feature:Telemetry release_note:skip Skip the PR/issue when compiling release notes Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v7.14.0 v7.15.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants