Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature(pkg): Automatic fetching of opam-repository #8105

Merged
merged 1 commit into from
Jul 16, 2023

Conversation

Leonidas-from-XIV
Copy link
Collaborator

This PR is an exploration on downloading a copy of opam-repository for locking. It is surprisingly easy to implement and it works rather fast. Currently the Git repo that it downloads is 142M but an initial lock of dune takes 11s and a consecutive lock with the repo downloaded takes about 3s.

I'm honestly rather pleasantly surprised how nicely this works.

Copy link
Collaborator

@gridbugs gridbugs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! It will be nice to get rid of the opam switch logic. That will really simplify the solver. You're welcome to remove the opam switch stuff from src/dune_pkg/opam_solver.ml or one of us can do that in a later change.

bin/pkg.ml Outdated Show resolved Hide resolved
@rgrinberg
Copy link
Member

For opam-repository, why do we download the git repo rather than the raw tarball? I understand using git repos for actual package sources, but here it seems like we'll always be downloading much more than we'll ever need.

@Leonidas-from-XIV
Copy link
Collaborator Author

I don't have any strong opinion on it but my idea of downloading the git repo is because that makes it easier to update e.g. the next time lock is called, as only the differences between the last and current opam-repo need to be fetched.

@rgrinberg
Copy link
Member

That's probably worth having experimenting, but I'm not entirely sure if this is optimal for opam-repository. The issue is that we're going to fetch a bunch of intermediate git objects that will be of little use. I would imagine something rsync like on top of the http protocol would end up being much faster.

@Alizter
Copy link
Collaborator

Alizter commented Jul 5, 2023

Is there ever a reason why we would want historical versions of an opam repository? If not, then it doesn't make much sense to use git IMO.

@rgrinberg
Copy link
Member

There's a bit more discussion to be had here, but it's quite possible that we'll have the equivalent of opam update on a per project basis. In which case, we will need to store more than one version.

@Leonidas-from-XIV
Copy link
Collaborator Author

There's the question whether the lockfile should e.g. retain information about the repos and their versions and whether reproducability of lockfiles is a goal but for now I would say one global copy of opam-repository with a single version is probably going to suffice and does not prevent us from implementing a more elaborate solution later on if there is demand.

I'll talk to OPAM folks about downloading the tarball vs git. The latter has the advantage of being on Github thus profiting from their CDN, will ask what the state of the tarball is (it's hopefully also CDN'ed).

@Alizter
Copy link
Collaborator

Alizter commented Jul 5, 2023

Just a reminder not to skip over other opam repositories like https://github.com/coq/opam-coq-archive and the OCaml beta one.

@gridbugs
Copy link
Collaborator

gridbugs commented Jul 5, 2023

If someone really wants reproducible lockdirs they could always clone the opam repo themselves and checkout a particular revision and then point dune at that. There are other factors that influence the resluting lockdir including flags/variables set in the build context or passed on the command line so I guess there's a question of whether to store all this information inside the lock.dune file in the lockdir so that lockdirs are reproducable. Kind of a lockfile for your lockfile. I'm not sure if it's something we want to fully support though as code changes in dune will also affect lockdir generation.

@Leonidas-from-XIV
Copy link
Collaborator Author

@Alizter I think just the default opam-repository is a decent starting point, but you're right, it should be customizable. In the near-future we might want to extend the dune-project file with some stanza to specify the repos to use for locking, as I could see this to be a useful addition (we added this as an extra OPAM field in opam-monorepo and it has been extremely useful).

Wrt to reproducibility, I think it is an excellent idea. The issue here is that it this information is unfortunately optional. The opam-repository tarball has a field stamp where I assume the git hash is stored and if we clone the repo from git we can get this info, but if the repo is retrieved as tarball without stamp field, we can't do much. Maybe include the tarball hash as the version of the repo? In any case, if we decide on how to determine the version of a repository I'm more than happy to implement this in a future PR so we can use the version as input for the lockfile generation.

bin/pkg.ml Outdated Show resolved Hide resolved
bin/pkg.ml Outdated Show resolved Hide resolved
@Leonidas-from-XIV Leonidas-from-XIV force-pushed the opam-repo-rules branch 2 times, most recently from fdc592a to c4e937a Compare July 11, 2023 08:43
@Leonidas-from-XIV
Copy link
Collaborator Author

Ok, freshly rebased and now featuring tests that show that it does download the file and store it in the dune cache.

It uses some somewhat exotic commands in the test like shuf and sleep with fractional numbers but these days these should be available in BSDs as well.

@Leonidas-from-XIV Leonidas-from-XIV force-pushed the opam-repo-rules branch 2 times, most recently from e804099 to 9a0bc99 Compare July 11, 2023 14:30
@Leonidas-from-XIV Leonidas-from-XIV changed the title Implement fetching of opam-repository repo feature(pkg): Automatic fetching of opam-repository Jul 11, 2023
@ocaml-benchmarks
Copy link

#8105 (7c71397) changes the metrics as follows in comparison to main (4256431) when running on fermat (bench/monorepo/bench.Dockerfile):

Benchmark: default

Test: dune monorepo benchmarks

  • build from scratch changed by 3.6%
  • null build changed by -4.9%
  • watch mode: changing file in 'base' library changed by 2.2%
  • watch mode: changing file in 'file_path' library changed by 3.7%
  • watch mode: fixing error in file in 'base' library changed by -1.9%
  • watch mode: fixing error in file in 'file_path' library changed by 6.1%
  • watch mode: introducing error in file in 'base' library changed by -14.8%
  • watch mode: introducing error in file in 'file_path' library changed by 1.2%

src/dune_pkg/fetch.mli Outdated Show resolved Hide resolved
@ocaml-benchmarks
Copy link

#8105 (9fcb056) changes the metrics as follows in comparison to main (a0145b2) when running on fermat (bench/monorepo/bench.Dockerfile):

Benchmark: default

Test: dune monorepo benchmarks

metric_name Last PR value Last main value Delta
build from scratch 1133.721632 1129.820496 0.3%
null build 67.882099 65.869692 3.1%
watch mode: changing file in 'base' library 70.082808 69.897738 0.3%
watch mode: changing file in 'file_path' library 18.840598 18.689723 0.8%
watch mode: fixing error in file in 'base' library 62.375925 63.562635 -1.9%
watch mode: fixing error in file in 'file_path' library 17.512562 17.633133 -0.7%
watch mode: introducing error in file in 'base' library 7.830276 8.154346 -4.0%
watch mode: introducing error in file in 'file_path' library 2.670346 2.827425 -5.6%

Signed-off-by: Marek Kubica <marek@tarides.com>
@rgrinberg rgrinberg merged commit fed9444 into ocaml:main Jul 16, 2023
@Leonidas-from-XIV Leonidas-from-XIV deleted the opam-repo-rules branch January 4, 2024 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants