Cache installed packages for future runs? #23

Closed
daviwil opened this issue May 28, 2020 · 13 comments
Labels
enhancement New feature or request

Comments


daviwil commented May 28, 2020

Hi @eine! Do you know if there might be any way this action could cache the packages installed via the install option, so that they don't need to be re-installed each time a workflow runs? My setup-msys2 runs often take about 7 minutes to update and then install packages, so it'd be nice to skip that. It looks like @actions/tool-cache has the ability to cache directories:

https://github.com/actions/toolkit/tree/master/packages/tool-cache#cache

Collaborator

eine commented May 29, 2020

Hi @daviwil! Yes, my runs take up to 10 min too...

Actually, tool-cache is already used in this action. Hence, using the methods to save and retrieve the cache should be straightforward. The issue is the logic to decide when to save a new cache and when to load it instead of a "clean" install. Currently, there are two main workflows:

  • update: true uses the default installation in C:\msys64. This is updated in the base image automatically every 10 days or so. Hence, it would be desirable to cache it during those days but detect when the base image is updated, to regenerate the cache.
  • update: false retrieves a tarball and extracts it. This can be easier to handle, because the tarball is expected to change only once a year.

Another alternative is to download the packages that are required in install and cache them. Then, they would be installed but not retrieved. However, I fear it might not be straightforward to handle unless pacman provides some feature to do so.


For now, I'm thinking about adding an option, cache, which accepts:

  • false (default): no caching.
  • true: automatically generate a "key" for the cache and use it for loading and saving.
  • clean: install/update a clean environment, and then save it.
  • load: load the environment from the cache, but do not update the cache.
  • key: as true, but use some specific key.
  • clean-key: as clean, but use some specific key.
  • load-key: as load, but use some specific key.

Allowing specific keys is meant for users to share the same cache for multiple jobs/tasks in the workflow, or to use specific caches for each of them.
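
To make the proposal easier to discuss, here is a minimal sketch of how the values listed above could map to a load/save decision. It is written in Python for illustration only, not taken from the action. The names CachePlan and parse_cache_option are hypothetical, and the syntax for custom keys (the key written in place of the word "key", as in clean-mykey) is an assumption, since the proposal does not fix one.

```python
from typing import NamedTuple, Optional


class CachePlan(NamedTuple):
    load: bool           # restore the environment from the cache
    save: bool           # save the environment after install/update
    key: Optional[str]   # None means "generate a key automatically"


def parse_cache_option(value: str) -> Optional[CachePlan]:
    """Map a value of the proposed `cache` option to a load/save plan.

    Assumption: a custom key is written in place of the word `key`,
    e.g. `clean-mykey` or `load-mykey`. Returns None when caching is off.
    """
    if value == "false":
        return None
    if value == "true":
        return CachePlan(load=True, save=True, key=None)
    if value == "clean":
        return CachePlan(load=False, save=True, key=None)
    if value == "load":
        return CachePlan(load=True, save=False, key=None)
    if value.startswith("clean-"):
        return CachePlan(load=False, save=True, key=value[len("clean-"):])
    if value.startswith("load-"):
        return CachePlan(load=True, save=False, key=value[len("load-"):])
    # Any other string is treated as a user-provided key for load and save.
    return CachePlan(load=True, save=True, key=value)
```

With this reading, two jobs that both pass the same custom key would share one cache, while jobs that pass different keys would each get their own.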

What do you think?

eine added the enhancement (New feature or request) label May 29, 2020
Member

lazka commented May 29, 2020

pacman caches downloads in /var/cache/pacman/pkg/. In theory this could be shared between jobs without any keys. (Not sure how that fits tool-cache though..)

Collaborator

eine commented Jun 3, 2020

@lazka, thanks for the hint! Unfortunately, after some testing, it seems that GitHub Actions does NOT support caching content between workflow/job runs. Caching works within the same job only, or on self-hosted runners. See https://github.com/actions/toolkit/tree/master/packages/tool-cache#cache and this clarification, which was not merged: https://github.com/actions/toolkit/pull/358/files

@daviwil, it doesn't seem possible to do caching in GitHub Actions unless users provide credentials/URLs of some external storage service... 😞

Member

lazka commented Jun 3, 2020

I see. Can we document how the users can do it themselves maybe? with actions/cache@v2 etc

Not sure if it will help much anyway though

Collaborator

eine commented Jun 4, 2020

@lazka, thanks for the reference. I tried it as an Action and also as an npm dependency. It seems that cache keys cannot be updated, so I need to figure out the simplest setup that works without requiring users to handle keys themselves. See actions/cache#342.

Collaborator

eine commented Jun 4, 2020

So, generating a hash of /var/cache/pacman/pkg/ and using it as the key does work. However, as @lazka foresaw, it seems not to help much...
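
For reference, the idea can be sketched roughly as follows. Since an existing cache key cannot be updated, a key derived from the directory listing changes whenever the set of downloaded packages changes. This is a Python illustration rather than the action's actual code. The function names and the msys2-pkgs- prefix are hypothetical, and hashing only the file names (not the contents) assumes that package file names embed the package name and version.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def key_from_names(names: Iterable[str], prefix: str = "msys2-pkgs-") -> str:
    """Build a cache key from package file names, independent of their order."""
    digest = hashlib.sha256()
    for name in sorted(names):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")  # separator, so "ab" + "c" differs from "a" + "bc"
    return prefix + digest.hexdigest()[:16]


def pacman_cache_key(cache_dir: str = "/var/cache/pacman/pkg/") -> str:
    """Key for the files currently present in pacman's download cache."""
    root = Path(cache_dir)
    if not root.is_dir():
        return key_from_names([])  # no cache directory yet: key of the empty set
    return key_from_names(p.name for p in root.iterdir() if p.is_file())
```

The key stays the same while the same package files are present and changes as soon as a package is added, removed, or replaced by a new version.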

Currently, the default installation in windows-latest is outdated and 5.5 GB of packages need to be installed, of which 1 GB needs to be downloaded. Using the cache, that 1 GB needs 60-80s to restore. Still, ~8-9 min are required for updating the packages: https://github.com/eine/setup-msys2/runs/738513432?check_suite_focus=true#step:4:79

@lazka, should we try to cache the whole install directory (C:\msys64)?

EDIT

Disabling CheckSpace reduced update time ~40s: https://github.com/eine/setup-msys2/runs/738684884?check_suite_focus=true#step:4:12
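
A minimal sketch of what disabling CheckSpace amounts to, assuming the option is an active line in pacman.conf (in MSYS2 that file is expected at /etc/pacman.conf inside the installation). This Python helper only illustrates the edit and is not how the action itself does it. The function name is hypothetical.

```python
import re


def disable_check_space(conf_text: str) -> str:
    """Comment out an active `CheckSpace` line in pacman.conf text.

    Lines that are already commented out are left untouched, so the
    function can be applied repeatedly without changing the result.
    """
    pattern = re.compile(r"^([ \t]*)(CheckSpace[ \t\r]*)$", flags=re.MULTILINE)
    return pattern.sub(r"\1#\2", conf_text)
```

The caller would read the file, pass its text through the function, and write it back before running pacman.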

Member

lazka commented Jun 5, 2020

Uh, so there are both i686/x86_64 toolchains and some more installed by default? I'd just go with not using the image installation then. Our CI takes 45secs to download/install/update right now.

> Disabling CheckSpace reduced update time ~40s: https://github.com/eine/setup-msys2/runs/738684884?check_suite_focus=true#step:4:12

Sounds like a good idea then 👍

Collaborator

eine commented Jun 5, 2020

> Uh, so there are both i686/x86_64 toolchains and some more installed by default?

Oh, I spent months trying to explain to @MSP-Greg why it might NOT be a good idea to install by default all the packages that a single user seemed to consider most important (see quotes below). However, it had absolutely no effect, and he wrote the PRs (https://github.com/actions/virtual-environments/pulls?q=is%3Apr+sort%3Aupdated-desc+author%3AMSP-Greg+is%3Aclosed) with little supervision. Maybe if you say so in msys2/msys2-installer#5, he/she will listen to you.

> I'd just go with not using the image installation then. Our CI takes 45secs to download/install/update right now.

I really had hoped that GitHub employees would care about the UX of the features they introduce, but quality standards still seem to be relaxed for GitHub Actions. Hence, I think I'll do as you say and revert the use of the image installation.

> > Disabling CheckSpace reduced update time ~40s: eine/setup-msys2/runs/738684884?check_suite_focus=true#step:4:12
>
> Sounds like a good idea then 👍

I copied the idea from some other script in MSYS2 repos... 😄


actions/runner-images#30 (comment)
Overall, as long as a base MSYS2 is provided, all other features (updating, installing (cached) packages, etc.) can be managed by an Action such as setup-msys2. That's why I think we should focus on the features that are not easily achievable with an action right now.

Anyway, once again, this should not be about what each of us in this thread wants. None of us has any data/statistics to justify the inclusion of any specific package.

actions/runner-images#30 (comment)
You have repeatedly asked for a certain list of packages based on your specific use case. It feels that you are ok with installing any packages beyond what you need, but not less than that. That's what I think is fundamentally mistaken and what I've been trying to argue.

actions/runner-images#342 (comment)
I beg to differ. Updating by default prevents users from testing their tools on a stable environment.

actions/runner-images#342 (comment)
Honestly, I suggest you get involved in the development of setup-msys2. Many of your ideas are indeed useful for many users, and I'd be really happy to have those supported by a single action, instead of fragmenting the ecosystem. Nevertheless, I believe that most of them need to be optional and thus should not be applied to this PR.
Please, note that, regarding setup time (which you seem to be more concerned about), setup-msys2 can be combined with a caching approach, so that you have your custom MSYS2 install ready-to-use.

actions/runner-images#342 (comment)
I understand that Ruby-specific details might not fit in setup-msys2, so it would still make sense for you to keep your own action. Nonetheless, features such as having gcc and other build-essentials set up can be useful for many users. Hence, setup-msys2 might support some "preset" groups which are optionally cached separately.


MSP-Greg commented Jun 5, 2020

@lazka

> so there are both i686/x86_64 toolchains and some more installed by default?

Yes, not as many packages as AppVeyor; the emphasis was on compile tools. Generally, I only update what I need...

Collaborator

eine commented Jun 5, 2020

By NOT using the default installation, update time was reduced from 7-12 min to 1-3 min.

I'm closing this issue, because cache is supported in v1.1.0. See https://github.com/eine/setup-msys2/blob/master/CHANGELOG.md#v110 and https://github.com/eine/setup-msys2#cache.

@daviwil, using update: true will now use the custom installation. If you want to keep using the default installation in the environment (with the known caveats), set release: false too.

eine closed this as completed Jun 5, 2020
Author

daviwil commented Jun 5, 2020

@eine Thanks so much for investigating this so thoroughly! I'm going to switch over to the new release and use the cache option. There's definitely no good reason to use the default installation of MSYS2 if it causes such long runtimes.

Collaborator

eine commented Jun 5, 2020

@daviwil, it was an interesting issue, indeed.

If you use eine/setup-msys2@v1, you should already be getting the latest release (v1.1.0). The purpose of tag v1 is to have some kind of automatic updates. Although unidiomatic for git, it is the suggested procedure until semver is properly supported in GitHub Actions. Of course, you might have good reasons to use a specific branch, such as eine/setup-msys2@v1.1.0.

NOTE: strictly, the change from v1.0.1 to v1.1.0 includes a breaking change. But I have considered the advantages to be worth it, and I don't feel like bumping to v2 because of this.

Author

daviwil commented Jun 6, 2020

I was still sitting on v0, so I wasn't affected by the change. I just updated to v1; with the new release behavior alone, you've shaved about 7 minutes off of my CI runs!


4 participants