Rework pacman pkg cache and enable by default #51

Merged: 2 commits merged into master from cache-always on Jul 20, 2020

Conversation

@lazka (Member) commented on Jul 19, 2020

Three main improvements:

  • Create a separate cache for each input configuration: Assuming a build matrix
    which builds for different arches and has different dependencies, we create
    a cache for each of them, while still being able to restore any of them
    as a fallback.
  • Prune caches before saving them: Every time a package has a new version, we
    create a new cache which only includes the new packages. This makes sure that
    the cache/restore size stays as small as possible over time.
  • Avoid cache race conditions: If a run doesn't change the cache we won't save it,
    and if saving fails because another job created it in the meantime,
    we catch the error and ignore it.

Overall this cache doesn't save much time, since installation and initial setup
take the most time, but this should save a lot of traffic for our main
repo server.

This also removes the cache option, which was confusing some users because it
only caches packages and not everything.
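For illustration only, and not the action's actual code: a minimal sketch of how the per-configuration keying, fallback restore, and race-tolerant save could look with the @actions/cache toolkit. The key layout, the content hashing, and all names (keyPrefix, contentKey, pkgCachePath) are assumptions.

```js
const crypto = require('crypto');
const fs = require('fs');
const cache = require('@actions/cache');

// Hypothetical key prefix: one per input configuration, so matrix jobs with
// different msystem/install/update inputs get their own caches.
function keyPrefix(inputs) {
  const hash = crypto.createHash('sha256')
    .update(JSON.stringify(inputs)).digest('hex').slice(0, 8);
  return `msys2-pkgs-${hash}-`;
}

// Hypothetical content key: changes whenever the set of cached package files
// changes, so a run that downloads new packages produces a new cache entry.
function contentKey(prefix, pkgCachePath) {
  const files = fs.readdirSync(pkgCachePath).sort().join('\n');
  return prefix + crypto.createHash('sha256').update(files).digest('hex').slice(0, 8);
}

async function restorePackageCache(pkgCachePath, inputs) {
  const prefix = keyPrefix(inputs);
  // Restore keys do prefix matching: prefer a cache for this exact
  // configuration, but fall back to any package cache ("restore any of them").
  const restoredKey = await cache.restoreCache(
    [pkgCachePath], prefix, [prefix, 'msys2-pkgs-']);
  return { prefix, restoredKey };
}

async function savePackageCache(pkgCachePath, prefix, restoredKey) {
  const key = contentKey(prefix, pkgCachePath);
  // If this run didn't change the package cache, the key is unchanged: skip saving.
  if (key === restoredKey) return;
  try {
    await cache.saveCache([pkgCachePath], key);
  } catch (err) {
    // Another job may have created the same cache in the meantime; ignore it.
    console.log(`Failed to save cache, ignoring: ${err.message}`);
  }
}
```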

@eine (Collaborator) left a comment

> Overall this cache doesn't save much time, since installation and initial setup
> take the most time, but this should save a lot of traffic for our main
> repo server.

I believe this was the real motivation to use caching since the beginning. The time reduction is not very meaningful for users, but the external traffic difference is significant.

> This also removes the cache option, which was confusing some users because it
> only caches packages and not everything.

I am concerned about how to handle this in the release. I wouldn't like to remove the cache option now (which would be a breaking change) if we expect to add it again in the future.

Apart from that, if I understand correctly, this caching procedure is similar to the one we have now (only packages are cached, not any installation or home directory). That is, users should find no difference between a clean install and using this automated cache; the resulting packages are the same. What would happen if a job with `update: true` saves a cache and that is later retrieved by a job with `update: false`? Is that an unsupported use case?

@lazka (Member, Author) commented on Jul 19, 2020

> Overall this cache doesn't save much time, since installation and initial setup
> take the most time, but this should save a lot of traffic for our main
> repo server.
>
> I believe this was the real motivation to use caching since the beginning. The time reduction is not very meaningful for users, but the external traffic difference is significant.

👍

> This also removes the cache option, which was confusing some users because it
> only caches packages and not everything.
>
> I am concerned about how to handle this in the release. I wouldn't like to remove the cache option now (which would be a breaking change) if we expect to add it again in the future.

I've left the option in the test workflows, which just leads to `##[warning]Unexpected input(s) 'cache', valid inputs are ['msystem', 'path-type', 'release', 'update', 'install']`, so it's not "breaking" at least. As for reusing it in the future for something different: not sure, but this doesn't make it worse, I guess?

> Apart from that, if I understand correctly, this caching procedure is similar to the one we have now (only packages are cached, not any installation or home directory). That is, users should find no difference between a clean install and using this automated cache; the resulting packages are the same.

Exactly.

> What would happen if a job with `update: true` saves a cache and that is later retrieved by a job with `update: false`? Is that an unsupported use case?

Good point. Assuming "foobar-1.1" gets installed and then cached, the second job would restore "foobar-1.1", install "foobar-1.0" (which gets downloaded and added to the cache), and then pruning would remove "foobar-1.0" again because it is older. So in a sense caching would be broken then. This could be worked around by keeping 2 versions of each package around when pruning, because there are only two versions reachable: one without update and the newest one. But I'm not sure if that's worth it.
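As a hypothetical illustration of the pruning being discussed (not the action's actual logic), a routine that keeps only the newest file per package name in the download cache might look like this; the filename parsing and the use of mtime as a stand-in for pacman's real version comparison are assumptions:

```js
const fs = require('fs');
const path = require('path');

// Hypothetical pruning: group cached files like "foobar-1.1-1-x86_64.pkg.tar.zst"
// by package name and delete everything but the most recently downloaded file(s).
// Real pacman version comparison (vercmp) is more involved; mtime is a stand-in here.
function prunePackageCache(pkgCachePath, keepPerPackage = 1) {
  const groups = new Map();
  for (const file of fs.readdirSync(pkgCachePath)) {
    const match = file.match(/^(.+)-[^-]+-[^-]+-[^-]+\.pkg\.tar\.(xz|zst)$/);
    if (!match) continue;
    const name = match[1];
    if (!groups.has(name)) groups.set(name, []);
    groups.get(name).push(file);
  }
  for (const files of groups.values()) {
    // Newest first, by modification time.
    files.sort((a, b) =>
      fs.statSync(path.join(pkgCachePath, b)).mtimeMs -
      fs.statSync(path.join(pkgCachePath, a)).mtimeMs);
    for (const old of files.slice(keepPerPackage)) {
      fs.unlinkSync(path.join(pkgCachePath, old));
    }
  }
}
```

Keeping two versions per package (keepPerPackage = 2) would be the workaround mentioned above for mixed update/non-update jobs.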

@eine (Collaborator) commented on Jul 19, 2020

> I've left the option in the test workflows, which just leads to `##[warning]Unexpected input(s) 'cache', valid inputs are ['msystem', 'path-type', 'release', 'update', 'install']`, so it's not "breaking" at least. As for reusing it in the future for something different: not sure, but this doesn't make it worse, I guess?

If you are unsure about the option being useful in the future, we can remove it, release v2.0.0 today, and commit ourselves not to touch caching again for a while (unless some very obvious/drastic change happens in the ecosystem).

> What would happen if a job with `update: true` saves a cache and that is later retrieved by a job with `update: false`? Is that an unsupported use case?

> Good point. Assuming "foobar-1.1" gets installed and then cached, the second job would restore "foobar-1.1", install "foobar-1.0" (which gets downloaded and added to the cache), and then pruning would remove "foobar-1.0" again because it is older. So in a sense caching would be broken then. This could be worked around by keeping 2 versions of each package around when pruning.

I would also be ok with removing the update option and making it default to true. That would mean that running an extracted tarball/distribution as-is is NOT supported. Being a rolling release, updating is always compulsory. I did not take this decision before because I was unsure about your (maintainers') position.

@lazka (Member, Author) commented on Jul 19, 2020

> I've left the option in the test workflows, which just leads to `##[warning]Unexpected input(s) 'cache', valid inputs are ['msystem', 'path-type', 'release', 'update', 'install']`, so it's not "breaking" at least. As for reusing it in the future for something different: not sure, but this doesn't make it worse, I guess?

> If you are unsure about the option being useful in the future, we can remove it, release v2.0.0 today, and commit ourselves not to touch caching again for a while (unless some very obvious/drastic change happens in the ecosystem).

I don't see this change breaking any existing uses, but I also don't mind a v2.0.0 release if you'd prefer one (also keeping the previous shell changes in mind).

> What would happen if a job with `update: true` saves a cache and that is later retrieved by a job with `update: false`? Is that an unsupported use case?

> Good point. Assuming "foobar-1.1" gets installed and then cached, the second job would restore "foobar-1.1", install "foobar-1.0" (which gets downloaded and added to the cache), and then pruning would remove "foobar-1.0" again because it is older. So in a sense caching would be broken then. This could be worked around by keeping 2 versions of each package around when pruning.

(Short interjection: I could also just add "update" to the cache restore fallback key so it doesn't get reused in a non-update job; I didn't think of this.) I'll fix this tomorrow.
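Sketching what that fix could look like on top of the earlier example (the key layout remains an assumption): folding the update flag into both the per-configuration prefix and the generic fallback prefix keeps update and non-update caches apart.

```js
const crypto = require('crypto');

// Hypothetical variant of keyPrefix from the earlier sketch: the update flag
// becomes part of the key itself, and the generic restore fallback
// ("msys2-pkgs-upd:<flag>-") then only ever matches caches saved by jobs
// with the same update setting.
function keyPrefix(inputs) {
  const hash = crypto.createHash('sha256')
    .update(JSON.stringify(inputs)).digest('hex').slice(0, 8);
  return `msys2-pkgs-upd:${inputs.update}-${hash}-`;
}

function fallbackPrefix(inputs) {
  return `msys2-pkgs-upd:${inputs.update}-`;
}
```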

> I would also be ok with removing the update option and making it default to true. That would mean that running an extracted tarball/distribution as-is is NOT supported. Being a rolling release, updating is always compulsory. I did not take this decision before because I was unsure about your (maintainers') position.

Hmmm, tricky. From the POV of a user I would want stability despite it being rolling, so if the repo is in a good state I'd want no update. If things in the repo happened to be broken when the installer was updated, I'd want to update. (Also, I wouldn't want long updates for the system MSYS2 version.) I kinda tend towards what we have now and keeping things as is.

@eine (Collaborator) commented on Jul 20, 2020

> I don't see this change breaking any existing uses, but I also don't mind a v2.0.0 release if you'd prefer one (also keeping the previous shell changes in mind).

Yes, I was thinking that we have several non-breaking but significant changes. But you are correct that we can also bump to v1.2.0 if we are going to keep update.

> (Short interjection: I could also just add "update" to the cache restore fallback key so it doesn't get reused in a non-update job; I didn't think of this.) I'll fix this tomorrow.

Nice!

> I kinda tend towards what we have now and keeping things as is.

Gotcha.

@lazka (Member, Author) commented on Jul 20, 2020

> we are going to keep update.

> (Short interjection: I could also just add "update" to the cache restore fallback key so it doesn't get reused in a non-update job; I didn't think of this.) I'll fix this tomorrow.
>
> Nice!

Added this now.

In theory we could now run all CI jobs in parallel.

edit: added that as a second commit, feedback welcome

Start with one job which installs some packages so we have some cache reuse later on, then start all the rest.
@eine merged commit 51ce8cc into master on Jul 20, 2020
@eine deleted the cache-always branch on Jul 20, 2020
@eine (Collaborator) commented on Jul 20, 2020

Execution time of the workflow was reduced from 20 min to 7 min. That's nice! I hope we won't have any conflicts, unless GHA is used intensively in other repos too.
