Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rev = "refs/pull/𑑛/head" #9859

Merged
merged 4 commits into from
Sep 7, 2021
Merged

Conversation

dtolnay
Copy link
Member

@dtolnay dtolnay commented Aug 31, 2021

GitHub exposes a named reference associated with the head of every pull request. For example, you can fetch this pull request:

$ git fetch origin refs/pull/9859/head
$ git show FETCH_HEAD

Usually when I care about pulling in a patch of one of my dependencies using [patch.crates-io], this is the ref that I want to depend on. None of the alternatives are good:

  • { git = "the fork", branch = "the-pr-branch" } — this is second closest to what I want, except that when the PR merges and the PR author deletes their fork, it'll breaks.

  • { git = "the fork", rev = "commithash" } — same failure mode as the previous. Also doesn't stay up to date with PR, which is sometimes what I want.

  • { git = "the upstream", rev = "commithash" } — doesn't work until the PR is merged or the repo owner creates a named branch or tag containing the PR commit among its ancestors, because otherwise the commit doesn't participate in Cargo's fetch.

  • { git = "my own fork of PR author's repo", branch = "the-pr-branch" } — doesn't stay up to date with PR.

This PR adds support for specifying a git dependency on the head of a pull request via the existing rev setting of git dependencies: { git = "the upstream", rev = "refs/pull/9859/head" }.

Previously this would fail because the cargo::sources::git::fetch function touched in this pull request did not fetch the refspec that we care about. The failures look like:

error: failed to get `mockall` as a dependency of package `testing v0.0.0`

Caused by:
  failed to load source for dependency `mockall`

Caused by:
  Unable to update https://github.com/asomers/mockall?rev=refs/pull/330/head

Caused by:
  revspec 'refs/pull/330/head' not found; class=Reference (4); code=NotFound (-3)

If dual purposing rev for this is not appealing, I would alternatively propose { git = "the upstream", pull-request = "9859" } which Cargo will interpret using GitHub's ref naming convention as refs/pull/9859/head.

@rust-highfive
Copy link

r? @ehuss

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 31, 2021
@alexcrichton
Copy link
Member

Thanks for this! I've long-wanted to support this in Cargo and I'm glad to see it getting done!

Internally the main other alternative route to support this would probably be to add new syntax, something like ref = 'heads/...' or something like that. I'll note that internally Cargo is somewhat inconsistent about how it treats rev. This largely stems from the fact that when I originally wrote all this I didn't understand git well enough to know what I was doing. I suspect that the rev catch-all today is a bit more powerful that it's supposed to be. Originally it was supposed to just be sha or short-sha versions but through the use of revparse_single I suspect it allows many more syntaxes than are currently intended to support. I was basically surprised at how little changed here yet it still worked.

That being said I personally think I still don't know enough about refs in this regard, but perhaps you could enlighten me? For example we try to make common things in git part of the surface syntax of Cargo, enabling usage like branch = 'foo' rather than rev = 'refs/heads/foo (which I think would work too after this?). In general Cargo's understanding of remote repositories should probably be "fetch this thing in refs/..." rather than branch/tag since that's effectively what these all use under the hood. Basically my question is whether there's an actual abstraction to be had here? Instead of looking for refs/* is there a git-thing where it's expected that you prepend refs/ to the front and that's what you fetch from the remote? If there's some git-standard-terminology thing we could use here I think that would be better than retrofitting onto rev which is intended to be "revision" which in my head at least is the same as a git sha.

Additionally I think that the "github fast path" when we're updating a repository should be able to work with refs/heads/* since that's probably availble with the web API? Today, though, with this PR it's skipped. (although maybe GitHub doesn't support it after all)

@dtolnay
Copy link
Member Author

dtolnay commented Aug 31, 2021

https://git-scm.com/book/en/v2/Git-Internals-Git-References covers the relevant background on refs. The mental model is: a reference is a name given to an object (oid). The name is close to arbitrary — it must not contain ~, ^, :, \, ?, [, *, .., @{, //, but other than those any nonempty name is fair game.

There are some naming conventions in place for references. A name that begins with refs/heads/ is called a (local) branch, begins with refs/tags/ is called a tag, and begins with refs/remotes/ is called a remote-tracking branch. There is no technical difference between them. It's just conventions in the frontend to suite particular common workflows. Like if you git switch to a branch and create a new commit, it knows you want the branch repointed to your new commit, whereas if you switched to a tag or remote-tracking branch, you do not want either of those repointed.

Tools can come up with their own reference naming conventions beyond this. As mentioned, GitHub uses refs/pull/9859/head as the name for the most recent commit on a pull request. Gerrit uses refs/changes/70/263270/2 where the middle number is the change number, the first number is the last 2 digits of the change number (just to keep directory size sane for slow filesystems I guess), and the last number is the index of the patch set within that change.

You can make your own refs using git update-ref:

$ git update-ref refs/some/thing c22756541
$ cat .git/refs/some/thing
c22756541acb9e4cb428e9fbca53610edba1c501

Even refs/ is just a convention, but I don't know of any serious use case that uses references not starting with refs/ since it would probably confuse people.

$ git update-ref xyz/uiop c22756541
$ cat .git/xyz/uiop
c22756541acb9e4cb428e9fbca53610edba1c501

Instead of looking for refs/* is there a git-thing where it's expected that you prepend refs/ to the front and that's what you fetch from the remote?

I don't think so. Outside of the 3 naming conventions above (branches, tags, remote branches) it's just a string name. Branches and tags are already covered in Cargo.toml, and remote branches don't make sense for Cargo.toml because you would never fetch a remote-tracking branch from a remote repository. So for refs outside of those 3 groups, we need to either let you specify the full name of the ref (refs/pull/9859/head) or build in new heuristics like pull-request = "N"refs/pull/N/head and gerrit-change = "263270", patchset = "2"refs/changes/70/263270/2.

If we make Cargo take the arbitrary name of the reference, then even if all references in practice start with refs/ I think it would be more confusing than helpful if Cargo prepended that itself.

rev is intended to be "revision" which in my head at least is the same as a git sha

That's fair. I guess I put it in rev because of the revparse_single that you called out, which makes e.g. cargo = { git = "https://github.com/rust-lang/cargo", rev = "refs/remotes/origin/master" } already work in Cargo today. But as you say, it wasn't intended for this. Can we use ref = "refs/pull/9859/head" to match the naming of git commands like https://git-scm.com/docs/git-update-ref and https://git-scm.com/docs/git-symbolic-ref? Though at least in source id URLs that's already used as an alias for branch:

// Map older 'ref' to branch.
"branch" | "ref" => reference = GitReference::Branch(v.into_owned()),
"rev" => reference = GitReference::Rev(v.into_owned()),
"tag" => reference = GitReference::Tag(v.into_owned()),

Alternatively, reference = "refs/pull/9859/head".

Additionally I think that the "github fast path" when we're updating a repository should be able to work

Yep, judging by curl -H 'Accept: application/vnd.github.3.sha' https://api.github.com/repos/rust-lang/cargo/commits/refs/pull/9859/head I think it'll work. I'll update the PR to not skip the fast path.

@alexcrichton
Copy link
Member

Thanks for all that info! That definitely at least helps clear some stuff up for me.

It sounds like some-key = 'refs/...' is what we want to do here. I don't think we want GitHub specific stuff like pull-request = '...' (or for any other online service). As for some-key I don't really have a reason to not use rev, I mostly just wanted to consider it. Whatever my original intentions were don't really matter at this point, Cargo's used how it's used! I also didn't realize that ref from the url-encoding was already taken, I honestly forget as to why. For that though we could likely easily use reference to if we really wanted, it's mostly just a detail that shows up in lock files.

I'd be curious to hear from some other Cargo folks about ref vs rev here for the Cargo.toml syntax, but otherwise the only technical bit for this I think is to implement the fast-path is-this-up-to-date check. We've got a meeting in a few hours and I'll bring this up there.

@alexcrichton
Copy link
Member

Ok we talked about this in the Cargo team meeting and the conclusion was that rev is fine to use here. The thinking was that the difference between ref and rev is somewhat superficial and it's easy otherwise for Cargo to do the detection. One thing I thought of while we were talking is that I was worried that this could leak internal implementation details of Cargo like the name origin, but I think due to the way the refspec for fetching is created that shouldn't happen? I think we'd always be fetching a ref into a ref and looking up the same name so if it collided with some old fetch I think it'd just overwrite the previous ref (which is fine).

Can you also update the documentation for git dependencies too? Basically just call out that refs/... is usable, and I think the GitHub pull request example here is a good one to add there.

@dtolnay
Copy link
Member Author

dtolnay commented Sep 1, 2021

  • Added a paragraph to the "Specifying dependencies from git repositories" documentation to describe the use of rev with commit hashes and with refs. It should be helpful for the common case of people using a GitHub pull request, but points out that things may be different elsewhere.
  • I chose a regex PR that:
    • reflects positively on regex, i.e. not "we fixed major regression lol sorry"
    • was open for a while — since you wouldn't realistically use this functionality for a PR that was merged 2 minutes after open
    • is something that you would legitimately want to pull in as a Rust dependency, i.e. not CI tweaks
    • is by a core contributor instead of a random one-off contributor that we would disproportionately highlight
  • Rebase to deconflict with Swap out some outdated repo urls in documentation #9862

@dtolnay
Copy link
Member Author

dtolnay commented Sep 1, 2021

I was worried that this could leak internal implementation details of Cargo like the name origin, but I think due to the way the refspec for fetching is created that shouldn't happen?

Yeah you got it right. Basically what the user types is exactly the name of the ref on the remote, which is not an implementation detail of Cargo.

@alexcrichton alexcrichton added the T-cargo Team: Cargo label Sep 1, 2021
@alexcrichton
Copy link
Member

@rfcbot fcp merge

Looks good to me, thanks again!

@rfcbot
Copy link
Collaborator

rfcbot commented Sep 1, 2021

Team member @alexcrichton has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period An FCP proposal has started, but not yet signed off. disposition-merge FCP with intent to merge labels Sep 1, 2021
@rfcbot
Copy link
Collaborator

rfcbot commented Sep 1, 2021

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added final-comment-period FCP — a period for last comments before action is taken and removed proposed-final-comment-period An FCP proposal has started, but not yet signed off. labels Sep 1, 2021
@ehuss
Copy link
Contributor

ehuss commented Sep 7, 2021

Thanks!

@bors r+

@bors
Copy link
Contributor

bors commented Sep 7, 2021

📌 Commit 86276d4 has been approved by ehuss

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 7, 2021
@bors
Copy link
Contributor

bors commented Sep 7, 2021

⌛ Testing commit 86276d4 with merge 7d7c370...

@bors
Copy link
Contributor

bors commented Sep 7, 2021

☀️ Test successful - checks-actions
Approved by: ehuss
Pushing 7d7c370 to master...

@bors bors merged commit 7d7c370 into rust-lang:master Sep 7, 2021
@rfcbot rfcbot added finished-final-comment-period FCP complete to-announce and removed final-comment-period FCP — a period for last comments before action is taken labels Sep 11, 2021
Manishearth added a commit to Manishearth/rust that referenced this pull request Sep 12, 2021
Update cargo

6 commits in 18751dd3f238d94d384a7fe967abfac06cbfe0b9..e515c3277bf0681bfc79a9e763861bfe26bb05db
2021-09-01 14:26:00 +0000 to 2021-09-08 14:32:15 +0000
- Remove log output that may leak tokens (rust-lang/cargo#9873)
- rev = "refs/pull/𑑛/head" (rust-lang/cargo#9859)
- Update suggestion message on bad project name error (rust-lang/cargo#9877)
- clarify what goes into "*-sys" crates (rust-lang/cargo#9871)
- Improve error message when unable to initialize git index repo (rust-lang/cargo#9869)
- Use serde_json to generate cargo_vcs_info.json (rust-lang/cargo#9865)
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 13, 2021
Update cargo

6 commits in 18751dd3f238d94d384a7fe967abfac06cbfe0b9..e515c3277bf0681bfc79a9e763861bfe26bb05db
2021-09-01 14:26:00 +0000 to 2021-09-08 14:32:15 +0000
- Remove log output that may leak tokens (rust-lang/cargo#9873)
- rev = "refs/pull/𑑛/head" (rust-lang/cargo#9859)
- Update suggestion message on bad project name error (rust-lang/cargo#9877)
- clarify what goes into "*-sys" crates (rust-lang/cargo#9871)
- Improve error message when unable to initialize git index repo (rust-lang/cargo#9869)
- Use serde_json to generate cargo_vcs_info.json (rust-lang/cargo#9865)
@dtolnay dtolnay deleted the pullrequest branch September 16, 2021 03:05
@ehuss ehuss added this to the 1.57.0 milestone Feb 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge FCP with intent to merge finished-final-comment-period FCP complete S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-cargo Team: Cargo to-announce
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants