[Fleet] Improve performance of auto package policy upgrades #121639
Comments
Pinging @elastic/fleet (Feature:EPM)
Pinging @elastic/fleet (Team:Fleet)
Before we do the implementation work proposed in this issue, we should validate with a large data set + APM instrumentation that this is indeed the area that is causing the slowdown.
@joshdover I started working on this, and had a few questions:
Is there a guide on how to use APM instrumentation to check slowness in Kibana? I haven't used APM before.
Makes sense to me 👍
Agreed, it is probably a confusing workflow for this API to have this side effect. cc @nchaulet and @kpollich for any input here.
See https://www.elastic.co/guide/en/kibana/current/kibana-debugging.html#_instrumenting_with_elastic_apm. I also want to flag that this PR may have a slight impact on this work: #125788
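Once APM is enabled per that guide, you can also wrap a suspect code path in a custom span so it shows up under the request's transaction in the APM UI. A minimal sketch using the elastic-apm-node agent Kibana ships with; doUpgradeWork and the span name are hypothetical placeholders, not actual Fleet code:

```ts
import apm from 'elastic-apm-node';

// Hypothetical placeholder for the code path under investigation
// (e.g. the managed package policy upgrade logic).
declare function doUpgradeWork(): Promise<void>;

export async function runInstrumented(): Promise<void> {
  // startSpan returns null if no transaction is active, hence the `?.`.
  const span = apm.startSpan('fleet-upgrade-managed-package-policies');
  try {
    await doUpgradeWork();
  } finally {
    span?.end();
  }
}
```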
Thanks, yes, I noticed that PR; I will rebase once it is merged :)
Hmm, thinking about this a bit, I don't think this is ideal either. It makes more sense to me to have upgrades happen only during setup rather than during preconfiguration updates. I'm not 100% sure where we currently call the preconfiguration update rather than setup, though, so I might be missing a use case.
I raised a draft PR and will continue with tests. I tried to use APM instrumentation (following the guide), but it is currently not working for 8.x versions because of the default policies change; a fix is in progress: elastic/apm-integration-testing#1435. Apart from that, I did some manual time measurements with this improvement merged by @joshdover yesterday, which removed some of the unnecessary repeated queries. The actual policy upgrade takes more time, as expected; so far the times were around 4,000-5,000 ms for the system package. I think the refactor still makes sense, to make the logic simpler and more readable.
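For reference, manual measurements like these can be done with simple wall-clock timing; a minimal sketch using Node's built-in perf_hooks, with doUpgradeWork again standing in as a hypothetical placeholder for the real call:

```ts
import { performance } from 'perf_hooks';

// Hypothetical placeholder for the code path being measured.
declare function doUpgradeWork(): Promise<void>;

async function timeUpgrade(): Promise<void> {
  const start = performance.now();
  await doUpgradeWork();
  console.log(`policy upgrade took ${Math.round(performance.now() - start)} ms`);
}
```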
You can also use your own Cloud deployment if you'd like; this is how I do it personally. It's nice to have a test cluster that you use over the course of several months, with some real data to play around with, where you can make whatever Fleet changes you'd like without affecting other employees.
Well that's a nice side-effect of that bug fix. Thanks for taking the time to measure it. I also suspected this was the main issue as it seemed to be the thing we were querying repeatedly unnecessarily.
+1, thanks for continuing forward with this.
Issue description
We suspect that some customers are running into performance problems on the /api/fleet/setup endpoint, caused by the "keep policies up-to-date" logic in Fleet setup. There is an opportunity for optimization here that would minimize the overhead of running this logic during Fleet setup.

Current implementation
1. Fetch ids for all package policies (kibana/x-pack/plugins/fleet/server/services/preconfiguration.ts, lines 357 to 367 in 2fa5a87)
2. For each package policy id (kibana/x-pack/plugins/fleet/server/services/managed_package_policies.ts, line 39 in 2fa5a87):
   - If the package has keepPoliciesUpToDate == false, or the policy is already up to date, skip this package policy
   - Otherwise, upgrade the policy (see the sketch after this list)
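For illustration, a rough sketch of the shape of this per-policy flow; the types and fetch/upgrade helpers are hypothetical stand-ins for the real Fleet services, not actual Kibana APIs:

```ts
interface PackagePolicy {
  id: string;
  packageName: string;
  packageVersion: string;
}

interface InstalledPackage {
  name: string;
  version: string;
  keepPoliciesUpToDate: boolean;
}

// Hypothetical stand-ins for the real Fleet service calls.
declare function fetchPackagePolicy(id: string): Promise<PackagePolicy>;
declare function fetchInstalledPackage(name: string): Promise<InstalledPackage>;
declare function upgradePackagePolicy(id: string): Promise<void>;

// Current shape: every package policy costs two fetches before we can
// even decide whether it should be skipped.
async function upgradeAllManagedPolicies(allPolicyIds: string[]): Promise<void> {
  for (const id of allPolicyIds) {
    const policy = await fetchPackagePolicy(id);
    const pkg = await fetchInstalledPackage(policy.packageName);
    if (!pkg.keepPoliciesUpToDate || policy.packageVersion === pkg.version) {
      continue; // auto-upgrade disabled, or already up to date
    }
    await upgradePackagePolicy(id);
  }
}
```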
Proposed implementation
Most of the decisions we need to make could be made with far fewer fetches by looking at the packages first, and then using that information to filter for the package policies that actually need to be upgraded:
1. Fetch all installed packages where keepPoliciesUpToDate = true and name in [managed package names]
2. For each of those packages, fetch and upgrade the package policies where packageName = installedPackage.name and version != installedPackage.version (see the sketch after this list)
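A sketch of the proposed shape, reusing the types and stubs from the previous sketch. The find helpers are hypothetical; in Kibana they would map to filtered saved-object find() queries rather than per-document gets:

```ts
// Hypothetical query helpers standing in for filtered saved-object queries.
declare function findInstalledPackages(filter: {
  keepPoliciesUpToDate: boolean;
  names: string[];
}): Promise<InstalledPackage[]>;
declare function findPackagePolicies(filter: {
  packageName: string;
  versionNot: string;
}): Promise<PackagePolicy[]>;

async function upgradeOutdatedManagedPolicies(
  managedPackageNames: string[]
): Promise<void> {
  // One query for the small set of packages that opted in to auto-upgrades.
  const packages = await findInstalledPackages({
    keepPoliciesUpToDate: true,
    names: managedPackageNames,
  });
  for (const pkg of packages) {
    // One query per package, returning only policies on an older version.
    const outdated = await findPackagePolicies({
      packageName: pkg.name,
      versionNot: pkg.version,
    });
    await Promise.all(outdated.map((p) => upgradePackagePolicy(p.id)));
  }
}
```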