
feat(pkg): Enable cache on fetch actions #10850

Merged 1 commit into ocaml:main from cache-fetch-action on Aug 28, 2024

Conversation

Leonidas-from-XIV (Collaborator)

This enables caching of downloaded files as part of the fetch action, if the file to be retrieved is identified by a cryptographic hash (and is thus presumably identical to the first time it was downloaded).

@rgrinberg I've added a new function to the engine to force-set the cache flag, since `add_can_go_in_shared_cache` can only flip the flag to false, never back to true. It's not used at the moment, so another option would be to remove that `&&`, but I made it a separate function just in case.
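To illustrate the point about the `&&`, here is a minimal sketch of why a separate setter is needed. The record layout and function names are assumptions for illustration, not dune's actual engine API:

```ocaml
(* Hypothetical rule state. [add_can_go_in_shared_cache] ANDs the new
   value into the flag, so it can only ever turn caching off; a separate
   force-setter is needed to turn it back on. *)
type t = { mutable can_go_in_shared_cache : bool }

let add_can_go_in_shared_cache t b =
  t.can_go_in_shared_cache <- t.can_go_in_shared_cache && b

(* Force-set: overwrites the flag unconditionally. *)
let set_can_go_in_shared_cache t b = t.can_go_in_shared_cache <- b
```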

```diff
-  | Some (_, checksum) -> `Checksum checksum
-  | None -> `Url url
+  | Some (_, checksum) -> `Checksum checksum, Some true
+  | None -> `Url url, None
```
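A minimal OCaml sketch of the decision this diff encodes (the function name and labels are hypothetical, not dune's actual API): a checksum-addressed download is safe to share, while a URL-only download keeps the engine's default.

```ocaml
(* Hypothetical sketch: pick the fetch key and decide whether to
   force-enable the shared cache. [Some true] force-enables caching;
   [None] leaves the engine's default in place. *)
let fetch_key ~url ~checksum =
  match checksum with
  | Some sum ->
    (* Content-addressed by a cryptographic hash: identical across
       workspaces, so sharing the downloaded file is safe. *)
    (`Checksum sum, Some true)
  | None ->
    (* Keyed only by the URL: the content behind it may change
       upstream, so do not force caching. *)
    (`Url url, None)
```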
Member

Why do we rely on the default in this case? Either this is useful to cache or it's not.

I would say it's probably useful to cache since this step will de-duplicate the files being stored among all the workspaces. If you think this will consume too much space, then just set it to false.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am concerned about caching by URL alone, because nothing prevents the content behind a URL from changing, so caching a source that keeps changing seems potentially incorrect. I wanted to let people override that with `DUNE_CACHE_RULES=enabled`, which enables all these potentially unsafe behaviors, instead of introducing another config option.

For package management it doesn't matter since just about all packages have checksums.

If you think this is alright to cache in all cases, I will gladly switch it to always true.

Member

> I am concerned about caching it via just the URL because there's nothing else preventing the URL from changing, so caching a source that keeps changing seems potentially incorrect

Yeah, that's an issue. Why is this related to the shared cache though? Is it because you want dune clean to re-download?

> I wanted to let people override that with DUNE_CACHE_RULES=enabled which enables all these potentially unsafe behaviors instead of introducing another config option.

`DUNE_CACHE_RULES` is mostly there for backwards compat. I don't think we should rely on it for future rules. When rules are useful and safe to cache, we should cache them; otherwise, we disable caching.

> If you think this is alright to cache in all cases, I will gladly switch it to always true.

As mentioned earlier, I think it has the potential to save users a bunch of space if they are expanding the same source that comes from a git revision in multiple workspaces.

I think using either true or false is fine, but I don't like letting the user fiddle with DUNE_CACHE_RULES.

Collaborator (Author)

> Yeah, that's an issue. Why is this related to the shared cache though?

My understanding is that if the URL's content changes upstream, we would continue to use the old data from the shared cache, with no easy way to even invalidate that particular URL. This happens quite a lot when people depend on GitHub tarballs, which get regenerated in certain cases.

> The DUNE_CACHE_RULES is mostly there for backwards compat. I don't think we should rely on it for future rules. When rules are useful and safe to cache we should do it, otherwise, we disable it.

I see, yes. That makes sense.

This enables caching of downloaded files as part of the fetch action, if
the file to be retrieved is retrieved via a cryptographic hash (and thus
presumably the exact same as the first time it was downloaded).

Signed-off-by: Marek Kubica <marek@tarides.com>
@Leonidas-from-XIV Leonidas-from-XIV merged commit 4472318 into ocaml:main Aug 28, 2024
27 of 28 checks passed
@Leonidas-from-XIV Leonidas-from-XIV deleted the cache-fetch-action branch August 28, 2024 15:31
anmonteiro pushed a commit to anmonteiro/dune that referenced this pull request Nov 17, 2024