Rework pacman pkg cache and enable by default #51

Merged: 2 commits merged into master from cache-always on Jul 20, 2020

Conversation

@lazka (Member) commented on Jul 19, 2020

Three main improvements:

  • Create a separate cache for each input configuration: Assuming a build matrix
    which builds for different arches and has different dependencies, we create
    a cache for each of them, while still being able to restore any of them
    as a fallback.
  • Prune caches before saving them: Every time a package has a new version, we
    create a new cache which only includes the new packages. This makes sure that
    the cache/restore size stays as small as possible over time.
  • Avoid cache race conditions: If a run doesn't change the cache we won't save it,
    and if saving fails because another job created it in the meantime,
    we catch the error and ignore it.

Overall this cache doesn't save much time, since installation and initial setup
take the most time, but this should save a lot of traffic for our main
repo server.

This also removes the cache option, which was confusing some users because it
only caches packages and not everything.
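For illustration only, and not the action's actual code: a minimal sketch of how the per-configuration keying, fallback restore, and race-tolerant save could look with the @actions/cache toolkit. The key layout, the content hashing, and all names (keyPrefix, contentKey, pkgCachePath) are assumptions.

```js
const crypto = require('crypto');
const fs = require('fs');
const cache = require('@actions/cache');

// Hypothetical key prefix: one per input configuration, so matrix jobs with
// different msystem/install/update inputs get their own caches.
function keyPrefix(inputs) {
  const hash = crypto.createHash('sha256')
    .update(JSON.stringify(inputs)).digest('hex').slice(0, 8);
  return `msys2-pkgs-${hash}-`;
}

// Hypothetical content key: changes whenever the set of cached package files
// changes, so a run that downloads new packages produces a new cache entry.
function contentKey(prefix, pkgCachePath) {
  const files = fs.readdirSync(pkgCachePath).sort().join('\n');
  return prefix + crypto.createHash('sha256').update(files).digest('hex').slice(0, 8);
}

async function restorePackageCache(pkgCachePath, inputs) {
  const prefix = keyPrefix(inputs);
  // Restore keys do prefix matching: prefer a cache for this exact
  // configuration, but fall back to any package cache ("restore any of them").
  const restoredKey = await cache.restoreCache(
    [pkgCachePath], prefix, [prefix, 'msys2-pkgs-']);
  return { prefix, restoredKey };
}

async function savePackageCache(pkgCachePath, prefix, restoredKey) {
  const key = contentKey(prefix, pkgCachePath);
  // If this run didn't change the package cache, the key is unchanged: skip saving.
  if (key === restoredKey) return;
  try {
    await cache.saveCache([pkgCachePath], key);
  } catch (err) {
    // Another job may have created the same cache in the meantime; ignore it.
    console.log(`Failed to save cache, ignoring: ${err.message}`);
  }
}
```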

@eine (Collaborator) left a comment

> Overall this cache doesn't save much time, since installation and initial setup
> take the most time, but this should save a lot of traffic for our main
> repo server.

I believe this was the real motivation to use caching since the beginning. The time reduction is not very meaningful for users, but the external traffic difference is significant.

> This also removes the cache option, which was confusing some users because it
> only caches packages and not everything.

I am concerned about how to handle this in the release. I wouldn't like to remove the cache option now (which would be a breaking change) if we expect to add it again in the future.

Apart from that, if I understand correctly, this caching procedure is similar to the one we have now (only packages are cached, not any installation or home directory). That is, users should find no difference between a clean install and using this automated cache; the resulting packages are the same. What would happen if a job with `update: true` saves a cache and that is later retrieved by a job with `update: false`? Is that an unsupported use case?

@lazka (Member, Author) commented on Jul 19, 2020

> Overall this cache doesn't save much time, since installation and initial setup
> take the most time, but this should save a lot of traffic for our main
> repo server.
>
> I believe this was the real motivation to use caching since the beginning. The time reduction is not very meaningful for users, but the external traffic difference is significant.

👍

> This also removes the cache option, which was confusing some users because it
> only caches packages and not everything.
>
> I am concerned about how to handle this in the release. I wouldn't like to remove the cache option now (which would be a breaking change) if we expect to add it again in the future.

I've left the option in the test workflows, which just leads to `##[warning]Unexpected input(s) 'cache', valid inputs are ['msystem', 'path-type', 'release', 'update', 'install']`, so it's not "breaking" at least. As for reusing it in the future for something different: not sure, but this doesn't make it worse, I guess?

> Apart from that, if I understand correctly, this caching procedure is similar to the one we have now (only packages are cached, not any installation or home directory). That is, users should find no difference between a clean install and using this automated cache; the resulting packages are the same.

Exactly.

> What would happen if a job with `update: true` saves a cache and that is later retrieved by a job with `update: false`? Is that an unsupported use case?

Good point. Assuming "foobar-1.1" gets installed and then cached, the second job would restore "foobar-1.1", install "foobar-1.0" (which gets downloaded and added to the cache), and then pruning would remove "foobar-1.0" again because it is older. So in a sense caching would be broken then. This could be worked around by keeping 2 versions of each package around when pruning, because there are only two versions reachable: one without update and the newest one. But I'm not sure if that's worth it.
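As a hypothetical illustration of the pruning being discussed (not the action's actual logic), a routine that keeps only the newest file per package name in the download cache might look like this; the filename parsing and the use of mtime as a stand-in for pacman's real version comparison are assumptions:

```js
const fs = require('fs');
const path = require('path');

// Hypothetical pruning: group cached files like "foobar-1.1-1-x86_64.pkg.tar.zst"
// by package name and delete everything but the most recently downloaded file(s).
// Real pacman version comparison (vercmp) is more involved; mtime is a stand-in here.
function prunePackageCache(pkgCachePath, keepPerPackage = 1) {
  const groups = new Map();
  for (const file of fs.readdirSync(pkgCachePath)) {
    const match = file.match(/^(.+)-[^-]+-[^-]+-[^-]+\.pkg\.tar\.(xz|zst)$/);
    if (!match) continue;
    const name = match[1];
    if (!groups.has(name)) groups.set(name, []);
    groups.get(name).push(file);
  }
  for (const files of groups.values()) {
    // Newest first, by modification time.
    files.sort((a, b) =>
      fs.statSync(path.join(pkgCachePath, b)).mtimeMs -
      fs.statSync(path.join(pkgCachePath, a)).mtimeMs);
    for (const old of files.slice(keepPerPackage)) {
      fs.unlinkSync(path.join(pkgCachePath, old));
    }
  }
}
```

Keeping two versions per package (keepPerPackage = 2) would be the workaround mentioned above for mixed update/non-update jobs.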

@eine (Collaborator) commented on Jul 19, 2020

> I've left the option in the test workflows, which just leads to `##[warning]Unexpected input(s) 'cache', valid inputs are ['msystem', 'path-type', 'release', 'update', 'install']`, so it's not "breaking" at least. As for reusing it in the future for something different: not sure, but this doesn't make it worse, I guess?

If you are unsure about the option being useful in the future, we can remove it, release v2.0.0 today, and commit ourselves not to touch caching again for a while (unless some very obvious/drastic change happens in the ecosystem).

> What would happen if a job with `update: true` saves a cache and that is later retrieved by a job with `update: false`? Is that an unsupported use case?

> Good point. Assuming "foobar-1.1" gets installed and then cached, the second job would restore "foobar-1.1", install "foobar-1.0" (which gets downloaded and added to the cache), and then pruning would remove "foobar-1.0" again because it is older. So in a sense caching would be broken then. This could be worked around by keeping 2 versions of each package around when pruning.

I would also be ok with removing the update option and making it default to true. That would mean that running an extracted tarball/distribution as-is is NOT supported. Being a rolling release, updating is always compulsory. I did not take this decision before because I was unsure about your (maintainers') position.

@lazka (Member, Author) commented on Jul 19, 2020

> I've left the option in the test workflows, which just leads to `##[warning]Unexpected input(s) 'cache', valid inputs are ['msystem', 'path-type', 'release', 'update', 'install']`, so it's not "breaking" at least. As for reusing it in the future for something different: not sure, but this doesn't make it worse, I guess?

> If you are unsure about the option being useful in the future, we can remove it, release v2.0.0 today, and commit ourselves not to touch caching again for a while (unless some very obvious/drastic change happens in the ecosystem).

I don't see this change breaking any existing uses, but I also don't mind a v2.0.0 release if you'd prefer one (also keeping the previous shell changes in mind).

> What would happen if a job with `update: true` saves a cache and that is later retrieved by a job with `update: false`? Is that an unsupported use case?

> Good point. Assuming "foobar-1.1" gets installed and then cached, the second job would restore "foobar-1.1", install "foobar-1.0" (which gets downloaded and added to the cache), and then pruning would remove "foobar-1.0" again because it is older. So in a sense caching would be broken then. This could be worked around by keeping 2 versions of each package around when pruning.

(Short interjection: I could also just add "update" to the cache restore fallback key so it doesn't get reused in a non-update job; I didn't think of this.) I'll fix this tomorrow.
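Sketching what that fix could look like on top of the earlier example (the key layout remains an assumption): folding the update flag into both the per-configuration prefix and the generic fallback prefix keeps update and non-update caches apart.

```js
const crypto = require('crypto');

// Hypothetical variant of keyPrefix from the earlier sketch: the update flag
// becomes part of the key itself, and the generic restore fallback
// ("msys2-pkgs-upd:<flag>-") then only ever matches caches saved by jobs
// with the same update setting.
function keyPrefix(inputs) {
  const hash = crypto.createHash('sha256')
    .update(JSON.stringify(inputs)).digest('hex').slice(0, 8);
  return `msys2-pkgs-upd:${inputs.update}-${hash}-`;
}

function fallbackPrefix(inputs) {
  return `msys2-pkgs-upd:${inputs.update}-`;
}
```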

> I would also be ok with removing the update option and making it default to true. That would mean that running an extracted tarball/distribution as-is is NOT supported. Being a rolling release, updating is always compulsory. I did not take this decision before because I was unsure about your (maintainers') position.

Hmmm, tricky. From the POV of a user I would want stability despite it being rolling, so if the repo is in a good state I'd want no update. If things in the repo happened to be broken when the installer was updated, I'd want to update. (Also, I wouldn't want long updates for the system MSYS2 version.) I kinda tend towards what we have now and keeping things as is.

@eine (Collaborator) commented on Jul 20, 2020

> I don't see this change breaking any existing uses, but I also don't mind a v2.0.0 release if you'd prefer one (also keeping the previous shell changes in mind).

Yes, I was thinking that we have several non-breaking but significant changes. But you are correct that we can also bump to v1.2.0 if we are going to keep update.

> (Short interjection: I could also just add "update" to the cache restore fallback key so it doesn't get reused in a non-update job; I didn't think of this.) I'll fix this tomorrow.

Nice!

> I kinda tend towards what we have now and keeping things as is.

Gotcha.

@lazka (Member, Author) commented on Jul 20, 2020

> we are going to keep update.

> (Short interjection: I could also just add "update" to the cache restore fallback key so it doesn't get reused in a non-update job; I didn't think of this.) I'll fix this tomorrow.
>
> Nice!

Added this now.

In theory we could now run all CI jobs in parallel.

edit: added that as a second commit, feedback welcome

Start with one job which installs some packages so we have some cache reuse later on, then start all the rest.
@eine merged commit 51ce8cc into master on Jul 20, 2020
@eine deleted the cache-always branch on Jul 20, 2020
@eine (Collaborator) commented on Jul 20, 2020

Execution time of the workflow was reduced from 20 min to 7 min. That's nice! I hope we won't have any conflicts, unless GHA is used intensively in other repos too.
