Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fleet] Upgrading a package should trigger a new POLICY CHANGE for agents #92612

Closed
nchaulet opened this issue Feb 24, 2021 · 31 comments
Closed
Assignees
Labels
design research Team:Fleet Team label for Observability Data Collection Fleet team v7.14.0

Comments

@nchaulet
Copy link
Member

Description

Currently when we update a package, agents will not have the latest package until the policy integration is updated,
We should trigger a policy change.

Open questions

What happens if a new variables is introduced in the package, and we are missing it?

@nchaulet nchaulet added v8.0.0 Team:Fleet Team label for Observability Data Collection Fleet team v7.13.0 labels Feb 24, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

@simitt
Copy link
Contributor

simitt commented Feb 24, 2021

I'd like to add following observations (with the APM example)

(1) Inconsistent behavior of package versions:

  • When an outdated integration is installed and one navigates to Fleet/Integrations/APM - Add Elastic APM, the outdated package is available to be installed to a policy.

Screenshot 2021-02-24 at 17 29 04

  • When navigating to the Integration installation via the policy UI _Fleet/Policies/ - Add Integration, the latest package is available to be installed to a policy.

Screenshot 2021-02-24 at 17 29 37

There is also the option to only install the package without adding it to an policy. If this pre-installed package gets ignored when the integration is added to a policy I am not clear on the use case for this.

(2) Upgrading the package did not change the integration in the policy (might be the same issue mentioned in the description, I am not sure). I could not find a way of updating the package in the policy, editing it did not lead to an updated policy. Removing and reinstalling was the only way of updating. According to offline conversation with @ruflin editing the policy should trigger an update.
Screenshot 2021-02-24 at 17 41 23

Thanks @skh for navigating me through the current flow.

@skh
Copy link
Contributor

skh commented Feb 24, 2021

I think the behavior described by @simitt (good catch!) in #92612 (comment) has the same reason as #86268. A bit of background is also described in #81995 .

@nchaulet
Copy link
Member Author

Upgrading the package did not change the integration in the policy (might be the same issue mentioned in the description, I am not sure). I could not find a way of updating the package in the policy, editing it did not lead to an updated policy. Removing and reinstalling was the only way of updating. According to offline conversation with @ruflin editing the policy should trigger an update.

Yes this is the issue here, updating a package should trigger an update of related policies

For the first issue, seems weird, but I think it's probably related to the same problem we store the integration for a policy the first time it's added

@ruflin
Copy link
Contributor

ruflin commented Feb 25, 2021

There is also the option to only install the package without adding it to an policy. If this pre-installed package gets ignored when the integration is added to a policy I am not clear on the use case for this.

This definitively sounds like a bug.

@EricDavisX
Copy link
Contributor

I'd like to encourage us to review this as a feature. We would ideally want to leave the product in a state where a user can test out changes before rolling it out to all Agents. Just my 2 cents.

@simitt
Copy link
Contributor

simitt commented Feb 25, 2021

I believe we need to review the whole update process, with respect to the purpose of installed 'global' packages (not installed to a policy) and whether or not they should be considered when installing integrations to policies. It should be clear whether an integration is connected to the global package or not. There should not be a difference in behavior when installing an integration from the Integrations or from the policy page. If we really want this, then the difference must be clearly understandable, and not subtle as it is now.

I see the point in not automatically updating policy integrations when a package is generally updated (although I would question why we would have a global package then). But I find it even more confusing when a policy integration automatically gets updated on the first edit after the global packages was updated, especially if the person updating the global package is not the same person who is updating the policy integration.

Most probably I am missing some context, but what is unclear to me is:

  • the purpose of the global package
  • the potential impact of different versions on ES setup; for example: does a global package update also update the ingest pipelines and ILM policies, which then get applied to all data streams, independend of the version used in the policy. Does this actually bring integration policies into a half-updated state?
  • what happens in case the global package is updated to the next major version, indicating breaking changes? Can it just be applied to integration policies on update? Enrolled Elastic Agents would potentially also need to be updated to a newer stack version before the new package can be applied?

@nchaulet would you prefer if I moved these discussions to a dedicated issue? I felt they are related to your bug report, and some of these questions need to be answered before fixing this, but also don't want to completely hijack this issue.

@skh
Copy link
Contributor

skh commented Feb 25, 2021

the purpose of the global package

I think we might have a misunderstanding here -- there is no global package, or you could say there is only the global package. There is always only one version of the package installed. The first time a package is used in a policy, it is installed, and can then be used by other policies as well.

In particular, there is currently no way to install different versions of the same package. If different policies refer to different versions of a package, those referring to non-installed (outdated) versions may probably experience problems.

@simitt
Copy link
Contributor

simitt commented Feb 25, 2021

When an integration is installed to a new policy, the latest available package information for this integration is shown. When installing it and an older version is already globally installed - what is the expected behavior in this case?

@ph
Copy link
Contributor

ph commented Feb 25, 2021

I've added this issue to discuss in our sync, as @EricDavisX mentioned we need to be careful in the changes in.

@jen-huang
Copy link
Contributor

FYI this issue to show package version number in agent policy details is related, though it's more of a proposed workaround for current behavior: #90308

@nchaulet
Copy link
Member Author

nchaulet commented Mar 4, 2021

As discussed in the weekly

The Different upgrade scenarios we have

  1. Auto upgrade of default packages system, endpoint
    1. When the user visit the fleet app, this call the setup endpoint that will trigger an update
    2. The update will update the asset but not update the current agent policies that use these integrations
    3. The endpoint team write saved object migrations , the next time some one update the agent policy agent will get the new package agent policy
  2. Manual upgrade of the package
    1. From the settings page of the integration details
    2. This will install the new assets but not update the current agent policies

I think upgrade the associated agent policies in both case should solve most of our problems.

Also there is an issue we have a mismatch where add integration from the integration page is not using the installed package to build the integration UI.

@jen-huang
Copy link
Contributor

I think upgrade the associated agent policies in both case should solve most of our problems.

How do you propose we upgrade the associated policies? Will this be done automatically after a package (auto/manual) upgrades? If automatic, this could have cascading effects described in #90308.

Also there is an issue we have a mismatch where add integration from the integration page is not using the installed package to build the integration UI.

Can you elaborate on this bug?

@ruflin
Copy link
Contributor

ruflin commented Mar 5, 2021

@nchaulet I think we need a bit more detailed analysis on what happens if we automatically upgrade the policies related to it. Upgrading a policy could mean having suddenly new default datasets enabled which was not intentional and update 100k Agents. Please dive more into the details for this and the different scenarios.

@skh
Copy link
Contributor

skh commented Mar 5, 2021

Can you elaborate on this bug?

@jen-huang I think this refers to what @simitt describes in #92612 (comment) and I suspect it has the same reason as #86268. A bit of background is also described in #81995 .

@simitt
Copy link
Contributor

simitt commented Mar 5, 2021

@nchaulet I am also wondering if the upgrade process needs to be the same for every package. For APM we will align the package version with the stack version and we (cc @axw ) even have ongoing discussions whether the package version could always be kept in sync with the Kibana version (maybe by even bundling the APM Server package with Kibana?). This would be helpful for the curated UI as it could always rely on the latest fields being in the template.

Upgrading a policy could mean having suddenly new default datasets enabled which was not intentional and update 100k Agents.

For packages like APM I don't see any harm in automatically upgrading all agent policies when the package is upgraded. Additional fields will only be added as soon as the APM Server binary is also upgraded, which happens when the Elastic Agent is upgraded, not the package. Additionaly pipelines or potentially changed ILM policies would be automatically applied, but I don't think that's a drawback, as they should not be breaking.

@EricDavisX
Copy link
Contributor

Do we think that any actions we'll take here are going to impact or significantly impact the user's flow to get new packages in use on hosts? I have a task to document that work-flow and am curious if we expect any outcomes from this in the immediate coming iteration, thanks so much! @ph

@ruflin
Copy link
Contributor

ruflin commented Apr 19, 2021

@ph @nchaulet I suggest we move this to 7.14? I think it is very important issue to fix but probably not a "low hanging fruit".

@ruflin ruflin added the bug Fixes for quality problems that affect the customer experience label Apr 19, 2021
@ruflin ruflin assigned jen-huang and unassigned nchaulet Apr 19, 2021
@jen-huang jen-huang removed the v7.13.0 label Apr 26, 2021
@jen-huang jen-huang removed the v8.0.0 label May 11, 2021
@jen-huang jen-huang added research and removed bug Fixes for quality problems that affect the customer experience labels May 25, 2021
@EricDavisX
Copy link
Contributor

@jen-huang do you want the 7.14 label on this? I've added so it is captured for discussion in near term, or so we can agree to postpone it if need be

@mukeshelastic
Copy link

@mostlyjason Have you had a chance to look into this issue and help clarify expected behaviors for package upgrade?

@jen-huang
Copy link
Contributor

@mukeshelastic @mostlyjason I'm taking the lead on starting a concept document and will loop in everyone else as needed.

@mostlyjason
Copy link
Contributor

mostlyjason commented Jun 1, 2021

@mukeshelastic I haven't yet because this was not initially prioritized for 7.14.

@jen-huang I see you added this to the iteration board, and then Eric added a 7.14 label. I saw a discussion with APM that this would improve the UX but Silvia also said its not a blocker for GA via email. What do you think about making this a definition goal first and then planning implementation?

@jen-huang
Copy link
Contributor

@mostlyjason Yes, that's the idea :) Hence the research label too.

@Zacqary Zacqary self-assigned this Jun 2, 2021
@Zacqary
Copy link
Contributor

Zacqary commented Jun 3, 2021

I'm going ahead and prototyping some of the UX in the concept document. Starting with a notification toast on the top of the Integrations page letting the user know which policies need to be upgraded.

@dikshachauhan-qasource
Copy link

Hi @EricDavisX

Whenever merges for are above will be available. We will start testing accordingly.

@Zacqary It could be better if you could share some mock UI design with us whenever feature is ready.

Thanks
QAS

@jen-huang
Copy link
Contributor

@dikshachauhan-qasource Thanks for the diligence! For 7.14 we are doing purely concept exploration and no work will be merged, so this is not something to test for 7.14.

@Zacqary
Copy link
Contributor

Zacqary commented Jun 16, 2021

I've been working on a backend API for upgrading policies, which has a method to first do a dry run of the upgrade. The frontend can use this to make sure it will work without further input, and if it fails, it will return a diff showing where errors occurred.

So far the only case I've run into that will actually cause the upgrade process to throw an error and outright fail is if an input or stream was removed from one version to the next. Adding a new variable works fine, as we can just pull its default from the integration template and apply it.

Are there any other scenarios we should be on the lookout for? If the only situation we can think of that might create a "merge conflict" on upgrade is the removal of a variable then I think that's a fairly easy case to handle.

@jen-huang
Copy link
Contributor

@Zacqary The move to packages exporting granular integrations is the biggest "breaking change" scenario that we have encountered (well, technically we haven't encountered it yet as those package changes aren't released, but they will be soon!). For instructions on how to get these changes, see the testing steps in this PR: #101531

Curious to see what happens when we attempt to upgrade an old AWS policy to the latest one that exports granular integrations (aka multiple policy templates). I would expect that it errors and the user will need to do a manual migration.

@dikshachauhan-qasource
Copy link

Hi @jen-huang Thanks for the feedback and confirmation.

@Zacqary
Copy link
Contributor

Zacqary commented Jun 17, 2021

Yesterday I did some more tests, and it turns out the upgrade process (adapted from the logic of the update API) will not error out if a field has been added in the next version, but it also won't automatically pull the default value from the template. This leads to an error when the user opens the integration in the UI.

We'll need to merge the template defaults with the pre-existing inputs. upgrade will have to be a combination of logic from create and update rather than just a simple wrap of update.

@jen-huang
Copy link
Contributor

Closing this in favor of the fully specc'ed out phase 1 issue: #106048

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design research Team:Fleet Team label for Observability Data Collection Fleet team v7.14.0
Projects
None yet
Development

No branches or pull requests