[Fleet] Proposal: Store installed packages in cluster #81110
Pinging @elastic/ingest-management (Team:Ingest Management)
I would clarify the use of the word 'local' in the initial description:
@skh Spot on. Can you directly update the issue?
In addition, I see two ways to store packages in ES: as a single document containing the whole zip file, or as one document per file.
The zip file would always need to be downloaded and unpacked as a whole. Single files could be queried by file type or path so that we can access individual assets more quickly, but some packages contain many of them.
I like the idea of storing each file as a document instead of the zip file. It not only allows us to query as you mentioned, but also lets us exclude certain files from storage and add metadata to each, like a hash, date modified, and date installed.
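A hedged sketch of what one such per-file document could look like. The field names and the helper function are hypothetical illustrations, not Fleet's actual mapping:

```typescript
import { createHash } from 'crypto';

// Illustrative shape for storing one package file as its own ES document,
// including the metadata mentioned above (hash, install date).
interface PackageAssetDoc {
  package_name: string;
  package_version: string;
  path: string;        // path of the file inside the package
  media_type: string;
  data_base64: string; // file contents, base64-encoded
  sha256: string;      // content hash for integrity checks
  installed_at: string;
}

function buildAssetDoc(
  pkgName: string,
  pkgVersion: string,
  path: string,
  mediaType: string,
  content: Buffer
): PackageAssetDoc {
  return {
    package_name: pkgName,
    package_version: pkgVersion,
    path,
    media_type: mediaType,
    data_base64: content.toString('base64'),
    sha256: createHash('sha256').update(content).digest('hex'),
    installed_at: new Date().toISOString(),
  };
}
```

Storing the hash alongside the content would make it cheap to verify integrity or detect changed files without decoding the payload.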
👍 to the problem description and proposal. Two things that come to mind:

Number of assets

I'm curious about the difference between making 10, 100, etc. requests to ES vs. serving them from memory. The best case scenario (few assets & a fast connection to ES) might not be noticeably affected, but the more assets or the greater the latency to ES, the slower things will feel. One option is to keep the memory cache (changing to an LRU or something less naive than now) and add values there on their way into ES. That way we keep the durability of ES but still avoid the latency issues.

Dealing with binary assets (images)

We'll have to base64 encode any binary asset. That adds about 30% to their file size. There's also a CPU cost of decoding them. Again, storing the decoded values in a cache would avoid repeating that work. Here is a quick check of the image file sizes; remember these will be about 30% larger after base64 encoding.
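The ~30% figure follows from how base64 works: every 3 input bytes become 4 output characters. A quick sanity check:

```typescript
// Base64 encodes 3 bytes as 4 ASCII characters, so the encoded form
// is 4/3 the size of the binary input (plus a little padding).
const binary = Buffer.alloc(300_000); // e.g. a 300 KB image
const encoded = binary.toString('base64');
console.log(encoded.length / binary.length); // ≈ 1.33
```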
To clarify, I'm saying we would still put assets in ES, but use a cache to store ready-to-serve values to avoid hitting ES and doing any unnecessary work. We could add a TTL or any other logic to decide when to use or invalidate cache entries.
Would it be an option to keep the in-memory cache, but purge some files, like ES and Kibana assets from it regularly, while keeping others, like images, for longer? |
Definitely. That's what I was getting at with the cache idea above.
We'll have to define the rules and then see if there's an existing package that does what we want out of the box, or if we need to wrap one with some code to manage it. Seems like we want support for both TTL (different per asset class) and a max memory size for the cache. https://github.com/isaacs/node-lru-cache is an existing dependency and my go-to, but it doesn't support per-entry TTL; I think we'd have to create multiple caches to get different expiration policies. I did some searching and both https://github.com/node-cache/node-cache and https://github.com/thi-ng/umbrella/tree/develop/packages/cache seem like they'd work for this case.
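A minimal sketch of the multiple-caches idea: one TTL policy per asset class, so images can be kept longer than ES/Kibana assets. The class and the TTL values are illustrative, not a proposal for the final implementation, and a production version would also enforce a max memory size (e.g. via an LRU):

```typescript
// Simple TTL cache; `now` is injectable to make expiry testable.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key: string): V | undefined {
    const e = this.entries.get(key);
    if (!e) return undefined;
    if (this.now() > e.expiresAt) {
      this.entries.delete(key); // lazily evict expired entries
      return undefined;
    }
    return e.value;
  }
}

// One cache per asset class gives different expiration policies
// without needing per-entry TTL support in a single cache.
const caches = {
  image: new TtlCache<Buffer>(24 * 60 * 60 * 1000), // keep images for a day
  asset: new TtlCache<Buffer>(5 * 60 * 1000),       // purge ES/Kibana assets sooner
};
```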
Before we add a cache, we should first test whether we really need it. A cache will speed things up but also make things more complicated. Quite a few of the large assets are images that are only used in the browser. I assume the browser cache will also help us here, so each user only loads them once?
I agree we should profile. The additional work/complexity is low, so we can add it later. The browser cache will also need some work (setting headers), but we can look at that when profiling.
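For the header work, a hypothetical helper choosing a `Cache-Control` value per asset type might look like this. The extensions and max-age values are assumptions for illustration, not what Kibana actually sets:

```typescript
// Pick a Cache-Control header based on the asset's file extension.
// Images rarely change within a package version, so the browser can
// cache them; other assets are revalidated on each request.
function cacheControlFor(path: string): string {
  const longLived = ['.png', '.svg', '.jpg', '.ico'];
  if (longLived.some((ext) => path.endsWith(ext))) {
    return 'public, max-age=86400'; // cache images for a day
  }
  return 'no-cache'; // revalidate everything else
}
```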
When a package is uninstalled from the system, I'd propose it will be removed from the storage index as well. That way the storage index doesn't silently turn into a secondary installation source that we need to check during package listings and installations. |
I'm not sure at what point during the package installation process we want to update the storage index, but if possible, it seems like it'd be easiest to add it to the storage index only if installation has successfully completed. Then during rollback we can perhaps use the storage index if the previous version is not available in the registry. If we update the storage index as we are installing, this probably won't be possible. |
I just want to highlight that a) we already use a cache, and b) the proposal specifically mentions it.
I don't want to pull us into the weeds re: caching. We can discuss it in the implementation ticket(s). Just highlighting it's not an alteration to the proposal.
Closing since we agreed on the proposal and are discussing further in #83426 |
When the package manager was created, it was built on the assumption that the registry and packages are always available. The current implementation uses a local in-memory cache for package contents; whenever a package is missing from this cache, it is re-fetched from the registry. Over the last two releases, quite a few issues have shown up where it became a problem that packages are only available from the registry:
To solve all of the above problems, I'm proposing to not only cache the packages in memory, but also store them in a dedicated ES index. This also unifies how packages work, whether they are uploaded as a zip, come from the registry, or arrive through any other channel. Below is an image visualising this:
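The unified flow could be sketched as follows. The names and shapes here are my own, with an in-memory `Map` standing in for the ES storage index: whether a package comes from the registry or from an uploaded zip, its files land in the same store, and later reads come from there rather than from the registry.

```typescript
type AssetSource = 'registry' | 'upload';

interface StoredAsset {
  packageKey: string; // e.g. "nginx-1.0.0"
  path: string;
  data: Buffer;
  source: AssetSource;
}

// Stand-in for the dedicated ES index; a real implementation would
// index one document per file (see the discussion above).
const storageIndex = new Map<string, StoredAsset>();

// Both the registry path and the zip-upload path funnel through here.
function storePackage(
  packageKey: string,
  source: AssetSource,
  files: Map<string, Buffer>
): void {
  for (const [path, data] of files) {
    storageIndex.set(`${packageKey}:${path}`, { packageKey, path, data, source });
  }
}

// Reads no longer depend on registry availability.
function getAsset(packageKey: string, path: string): Buffer | undefined {
  return storageIndex.get(`${packageKey}:${path}`)?.data;
}
```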
One important detail here is that, for browsing packages from the registry without installing them, these should not be downloaded; see #76261. Packages which are uploaded are always installed.
How exactly packages and assets are stored in Elasticsearch should be a follow-up discussion if we decide to move forward with this.
Decision: Agreed to move forward. Follow up discussion ticket: #83426