-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Fleet] Implement new transform installation mechanism #134321
Comments
Pinging @elastic/fleet (Team:Fleet) |
@kevinlog Please provide any feedback on the requirements around the Endpoint package. Will we need to be able to install older versions of the package that define transforms in the legacy way that was not implemented in the spec or can we get away with only supporting the new format? |
I believe the ability to install legacy transforms is still required. There are package rollbacks and other behaviors that could still task a relatively new (8.3,8.4) kibana and fleet plugin with installing an older endpoint package (rather than just a stack upgrade scenario with a pre-existing old package). There are a few edge cases of the actual installation of an older package still needed. |
Makes sense, but I want to be sure we don't need to support this for a long period of time. Would it be feasible to drop support for the legacy transforms in the following minor release? So if we were to implement this in 8.4 (for example) could we remove support for legacy in 8.5? |
I'm sure on some timescale we can drop it, but I think it might need a few release cycles. Package upgrading still remains somewhat detached from stack upgrades, which are historically fraught with user issues and considered "difficult". I want to avoid contributing to that reputation for as long as reasonable. |
Pinging @elastic/ml-ui (:ml) |
I hear this is slated to be done in 8.5, is that correct? |
@szeitlin Looks like this has switched hands a couple of times so would be good to clarify ownership and timeline. @peteharverson @qn895 Is the ML UI team committing to implement this for 8.5? |
We do not have plans to work directly on the endpoint package. From the ML perspective, we want to build up knowledge and define best practices for how transforms should be included in a Fleet package, and to assess the feature gaps. During 8.5 timeframes we will be working on a spike to install and upgrade a dummy package, making sure the end to end experience makes sense for generic transforms in Fleet, and to define how we can efficiently develop and test this. We see this as a pre-req to creating the host risk score package and to advance the investigations for managing the privileges that transforms are deployed to run with. |
We have a ticket to make Endpoint package changes in association with the new transform spec here: https://github.com/elastic/security-team/issues/4314 @pzl investigated what was required to make changes the the Endpoint package in |
Thanks for calling this out @jen-huang! FWIW I don't know what's in the Endpoint package, or exactly how it relates to what we're doing with Host Risk Score. Our current plan is to create and deploy packages as integrations. I'm not sure if we're going to run into blockers there, as I've heard mixed things about whether the privilege issue has to be addressed before we can release e.g. Host Risk Score as a package. |
Hi,
I just wanted to highlight in this thread too that currently, the Endpoint package violates the package-spec. We temporarily acknowledge the bad package format, but we will stop doing it soon. When we enforce validation, it won't be possible to publish the package (hard blocker). Kindly please to prioritize this issue accordingly. |
Coordinate a release@mtojek thanks for the heads up. It sounds like we need to agree on a release to coordinate:
@szeitlin @mtojek - is an I'm happy to set up a meeting if we feel we need to quickly get in alignment. Even if we're comfortable with Host Risk Score@szeitlin to answer your question below
It doesn't relate to your changes directly. Currently, the transforms in the Endpoint package are installed by Fleet using a legacy format that isn't in compliance with the Package spec for transforms in packages. In order for Fleet to support the new spec, we need to update the way transforms are installed. Then, we can update the Endpoint package to be in compliance with the new spec and it will still be installed correctly by Fleet. Host Risk Score is only dependent on this ticket, to update how Fleet installs transforms from packages. |
I wouldn't pay much attention to particular timeframes, but ...
... I wouldn't like this topic to die accidentally. It isn't a blocker for my team in terms of v2 migration, but it doesn't allow to enable enforced format validation and makes the endpoint-package error-prone. |
Ok, if I'm understanding this correctly (and thanks @sophiec20 and @mtojek and @kevinlog for explaining), we aren't blocked from packaging Host Risk Score as an integration but it would not be possible for customers on Elastic Cloud to actually install and use Host Risk Score as a module until Fleet is updated to support the new package spec for transforms? (assuming I'm right in thinking Fleet is the internal shorthand for what customers refer to as Elastic Cloud?) Perhaps I'm misunderstanding, but it doesn't sound like this is a ton of work, since Fleet support for a package spec including transforms already exists, it's mostly a matter of making sure someone has the bandwidth to actually do it, and it's actually prioritized? Or are there technical details that still need to be decided about how to implement the required changes? |
This isn't quite right. Fleet is the part of the Kibana app that allows users to deploy and manage Agents. Users can also add different integrations to Agents, such as Endpoint Security. Part of adding this integration is installing the associated integration package, in Endpoint Security's case this is the Endpoint package. This would be just like installing the proposed Host Risk Score integration. The packages are collections ES components needed to store data shipped by that integration, for example, index templates and transforms. Fleet has logic to install these transforms, but only based on a legacy format that isn't in compliance with the recently merged transform spec. This ticket would update the way Fleet installs transforms to be in compliance with that spec. When that is done, my team needs to update the Endpoint package in parallel to conform to the spec. My understanding for Host Risk Score is that you will add an Integration Package to package storage which will contain a transform definition in compliance with the new transform specification. The change described in this ticket would be required to properly install your package.
I can't speak to the effort required, but we do need to properly roadmap this ticket to unblock:
At this point, I do think it's important we discuss in a meeting to make sure expectations are set correctly and we can properly roadmap this work. Our main blocker at this point is resourcing and prioritization. I'll work to set something up for early next week. I'll include representatives from OLM/Endpoint, Host Risk Score, and Fleet. Feel free to ping me if you're interested. |
It shouldn’t be strictly necessary for Endpoint to update their package at the same time since we were planning to support both the legacy and new formats simultaneously so that older versions of Endpoint could still be installed. This should allow the changes in this ticket to be decoupled from Endpoint updating their package. I believe the reason to update the Endpoint package is more driven by the Package Storage v2 validation step, which I admit I hadn’t considered. That does make this ticket a dependency for package storage v2. +1 to meeting in person, thanks for setting that up @kevinlog |
Hey all, I've updated this issue with the full scope required for this initial step. Please review in detail so we can determine resourcing for this issue. |
We need to consider carefully what sort of updates to transforms will be supported, and how clever we want to try to make Fleet. Rather than support every possible upgrade scenario in the most optimal way it might be safer (as in reduced test surface area so reduced risk of bugs) to just have one mechanism for upgrading transforms that is sometimes overkill but always works eventually. For example, I can think of three scenarios for upgrading a transform:
However, trying to support all 3 scenarios optimally would be very complex. Which scenario applies depends not only on the version you upgrade to, but also the version you upgrade from. For this reason I think it's best that all upgrades that change a transform in any way are assumed to require deletion and recreation of both the transform and its destination index. (Obviously if the transform is exactly the same in both the old and new packages then it doesn't need to be touched.) Consumers of the package are going to need to be aware of this and compensate for it. When the transform is deleted and recreated it will take a while to process historical data before catching up to real-time. During this period the transform destination index will be incomplete, so consumers of the transform should surface this to end users in some way (for example via a UI indicator). It's possible to tell whether the upgraded transform has processed historical data by looking at the checkpoint number in the transform stats. |
Another option that would avoid the need to delete destination indices is as follows:
It may be desirable to go with a strategy like this in preference to the simpler suggestion in #134321 (comment) because the raw data needed to rebuild a destination index from scratch may no longer exist at the time of a package upgrade. However, whatever we decide we should aim to have one way that transforms get upgraded within Fleet. Otherwise the complexity of trying to have multiple ways to upgrade based on what is being changed will inevitably cause bugs. |
I'm following up in this ticket after the meeting we had today to discuss the roadmap and team assignment for this ticket. The high level conclusion:
Let me know if the above sounds correct to everyone, happy to make any adjustments. I'm sending out the meeting recording and notes in separate email, just let me know if you'd like me to add you. |
This ticket is already under construction here: https://github.com/elastic/security-ml/issues/83 |
Hey all, now that #140046 is merged, I wanted to circle back on this conversation. In the Endpoint package, we have a PR to update our transform schema. My question for the team is, are we comfortable putting a package into production that uses the new transform schema in the I tested the merged PR against a few different scenarios with both the legacy Endpoint package and one using the new schema. Details of my tests are below (context in a comment from the PR). I tested the following scenarios and they work as expected:
Upgrade scenarios:
The above cases are the critical paths for installing new Endpoint package and updating older ones. With this, the Endpoint package using the new transform schema should be supportable in Fleet now. I would be OK with merging the package as the installation works and it gets us closer to elastic package schema compliance. We could also hold our package PR and tag it for an Happy to hear thoughts from others. cc @joshdover @qn895 @pzl @sophiec20 @droberts195 @mtojek @szeitlin |
@kevinlog I talked this through with @qn895 and @peteharverson. Originally the idea was that upgrade was out of scope for the first iteration of the new transform installation mechanism. What's happening when you are testing upgrades is that the transform and all its associated objects (index template, destination index, etc) get completely deleted, then recreated from scratch. One thing to be aware of with this approach is that on a huge dataset rebuilding the destination index from scratch could take a while - assume several hours for your most important customer. Any downstream component that relies on the contents of the destination index will have to be written to gracefully handle this period when the original destination index has been deleted and a new one is being populated from scratch. In the description of elastic/endpoint-package#270 it says the PR is "mostly renaming files and translating from json to yaml". Does it make any functional changes at all? Or is it entirely renaming files and translating from json to yaml? If there are no functional changes then it may be safer to hold it for 8.6. That would allow time for testing the upgrade on the biggest data set you have access to, and seeing the effect of deleting everything and restarting from scratch. It might also be that we can add more upgrade functionality for 8.6, like creating a new transform in parallel, having an alias on the destination index and only switching the alias to the new destination index once the transform has populated it from existing data. But on the other hand, if you have also made functional changes as well as changing the installation mechanism then the need to get your change into 8.5 might be greater. |
My apologies for the confusion: Double-checking the Fleet code again, and we currently don’t delete the destination indices upon uninstallation or upgrade. It will only delete the transforms, index templates, and component templates from the old version:
A simplistic upgrade works now (i.e. Install Endpoint package with compliant transforms -> upgrade to an Endpoint package with compliant transforms). But it's important to note as of #140046, we currently don’t gracefully handle scenarios like if after the upgrade and the index templates or the mappings change, how would that apply to the destination indices that have already been created. Or, if the transforms change functionally (e.g. different pivot or latest configuration) but still use the same destination index name. That would need more robust testing and discussion in 8.6. |
@droberts195 @qn895 thank you for the insight.
Understood. For what it's worth, I don't believe the existing logic for installation/upgrade of the legacy transform schema was any more graceful or efficient and we've relied on it for about 2 years. However, our use case is likely an outlier as we use the
No, there are no functional changes in the endpoint package PR that are not already merged in the package master branch. This PR is specifically meant to prepare us for the switch to the new compliant transform schema. It is it not critical for us to merge this change to support our It sounds like we'd be more comfortable waiting for Regarding cc @pzl |
IMO there's no specific rush to transition the Endpoint package to the new mechanism, though there is some tech debt to clean up in the package registry pipelines to remove the exceptions for Endpoint's spec violations. But I also want to clarify that we don't currently have any short-term plans to change the upgrade behavior. If we should revisit this, let's discuss. I think we may need to fix the behavior mentioned in #134321 (comment) to at least rebuild the transform destination index when required. Let's continue the discussion in the overall thread here: elastic/package-spec#370 (comment) @qn895 Is there anything remaining to close out this issue? |
@joshdover, we have some more work to do in 8.6 around upgrades in terms of handling changes to transform destination index, although a lot of it will be best practices and working out how to enforce those best practices. I agree on closing out on this issue to continue discussion in the other thread 👍 |
Official support for transforms was added to the package-spec in elastic/package-spec#307. We now need to update Fleet to support installing transforms defined in the official format.
Scope
The spec defines transforms to be contained in a package with the following file structure:
package-version/elasticsearch/transform/<name>/transform.yml
- required, contains the transform definition itselfpackage-version/elasticsearch/transform/<name>/manifest.yml
- optional, contains index template settings and whether or not the transform should be started on installationpackage-version/elasticsearch/transform/<name>/fields.yml
- optional, contains index mappings using the same format as data streams mapping defintionsYou can find some examples of valid transforms that match the new spec here: https://github.com/elastic/package-spec/tree/main/test/packages/good/elasticsearch/transform
The transform installation code needs to be updated to:
fields.yml
file is present ormanifest.yml
containsdestination_index_template
:fields.yml
file. This can be done by leveragingloadFieldsFromYaml
andgenerateMappings
manifest.yml
into a single index templateEsAssetReferences
transform.yml
file rather than a json file.manifest.yml
file containsstart: false
(the default is to start the transform if this isn't specified).The transform uninstallation code should not need to be updated. Our existing code for deleting transforms and index templates should already work as long as the
EsAssetReferences
are updated correctly to track the assets that were created by the install code.Things to ensure still work:
Out of scope
kibana_system
index privileges #137278Blocks:
The text was updated successfully, but these errors were encountered: