Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

git subtree crashes: can't sync rustc clippy changes into rust-lang/rust-clippy #5565

Open
matthiaskrgr opened this issue May 4, 2020 · 24 comments
Labels
A-infra Area: CI issues and issues that require full access for GitHub/CI

Comments

@matthiaskrgr
Copy link
Member

matthiaskrgr commented May 4, 2020

The command to sync rustc-changes into the clippy repo
crashes on my and @oli-obk s machine which means we can't sync clippy that way.
$ git subtree push -P src/tools/clippy git@github.com:matthiaskrgr/rust-clippy2 sync-from-rust

git push using:  git@github.com:matthiaskrgr/rust-clippy2 sync-from-rust
/usr/lib/git-core/git-subtree: line 757: 181726 Done                    eval "$grl"
     181727 Segmentation fault      (core dumped) | while read rev parents; do
    process_split_commit "$rev" "$parents" 0;
done

We need to figure out a way to work around that.

@matthiaskrgr matthiaskrgr added the A-infra Area: CI issues and issues that require full access for GitHub/CI label May 4, 2020
@flip1995
Copy link
Member

flip1995 commented May 4, 2020

The same happens for me, but I assumed that it was just my git version or something. The closest thing to find for this problem was this SO question: https://stackoverflow.com/questions/57033081/git-subtree-push-fail-applications-xcode-app-contents-developer-usr-libexec-gi

This mailinglist question is also relevant: https://lore.kernel.org/git/0F754615-C852-49D8-8E0C-DD2A00A15ED1@msys.ch/t/

@oli-obk
Copy link
Contributor

oli-obk commented May 4, 2020

We've had some discussion on discord. The problem is that it's running out of stack space. So ulimit -s 60000 works around that, but then every push takes an hour or so. I've created #5566 to do a quick update and explain the situation.

@flip1995
Copy link
Member

flip1995 commented May 4, 2020

https://public-inbox.org/git/20180928183540.48968-4-roger.strain@swri.org/

This is the subtree patch that caused this.

@matthiaskrgr
Copy link
Member Author

Perhaps we could move a copy of the git subtree script which has git/git@315a84f reverted into the clippy repo?

@flip1995
Copy link
Member

flip1995 commented May 4, 2020

Citing myself from Discord:

There is an attempt to fix the subtree problem here: gitgitgadget/git#493

But with those changes it also starts the recursion and ultimately segfaults, just on a different commit ...

But we can handle this recursion, since it only trips over commits from the Clippy repo (8500 commits can be searched in ~10-20min and we only have to do this once with the steps below)

I think, I found a solution:

  1. We need the above linked PR to get merged into git
  2. We need to git subtree pull in the rust repo
  3. We need to git subtree split --rejoin (this will work with ulimit -s 60000) in the rust repo
  4. Only PRs that are rebased onto the commit pulled in step 2. are allowed to get merged in Clippy

For future moves to subtrees step 4 has to be enforced for every sub-project.

The 4th step will prevent git subtree split (executed by push) to trip over Clippy commits in the future, since every commit will have a known parent.


At least that's how I understood the problem and solution from the mailing list and the git PR.

@oli-obk
Copy link
Contributor

oli-obk commented May 5, 2020

The problem with --rejoin is that we have to merge the synthetic commits back into the rustc repo. So we'd end up duplicating all commits that touched src/tools/clippy.

@flip1995
Copy link
Member

flip1995 commented May 5, 2020

Yeah, but the option is to wait half an hour for every sync in the future after gitgitgadget/git#493 is merged. Until then, we have to use the cp rust/src/tools/clippy/* . hack and can't use git subtree at all (well we can use pull, but that wasn't the reason for the switch to subtree).

I think we found what subrepo is trying to improve in comparison to subtree. It uses a .gitrepo file similar to .gitmodules, that contains metadata of the subrepo, so the script does not have to search through both repos history to find commits to backport (This feels more thought out, while subtree feels more like a sh script that was just hacked together). Not only improves this the process of syncing, but also makes the interface cleaner (no need to specify the remote+branch every time).

I see two practical options here, both include to declare the attempt to switch to subtree a failure:

  1. Go back to submodule.
    - Obvious downsides of the past are back
    - We now have the complete Clippy history in the Rust repo
  2. Use subrepo instead.
    - subrepo has to be installed for syncers
    + But we can (probably?) reuse the already merged Clippy history in the Rust repo and have the benefits we tried to get with subtree

  1. (Write something ourself in Rust 🦀)
    - We have to write it
    + RIIR 😄

@oli-obk
Copy link
Contributor

oli-obk commented May 5, 2020

Let's try to explore all non-submodule avenues first.

  • Subrepo loses all history. Several people strongly argueed against that.
  • subtree pull works nicely
    • even if we do use cp for pushing
  • In order to write something ourselves we first need to know what we want

So let's start with the "what we want" part.

We want

  • pushes to not take 30 minutes
  • pushes to not affect the rustc repo at all
  • (ideally) pulls not to change the commit hashes from clippy
  • pulls not to contain changes from rustc again, causing duplicate commits

The trivial solution (but losing history) is to check out clippy, copy src/tools/clippy into the checkout, and create a single sync-from-rustc commit. At the next pull, this commit will show up in the history, but not contain any changes. I'm testing it out righ tnow how that actually behaves in practice and will get back to you on that.

The complex solution is to write a "good" version of git subtree push, which actually figures out just the parts that need to be synced. I don't actually know if we can come up with a better algorithm than the existing one. So if we have

   _- 2 -_
 /         \
1 --- A --- B --- C --- D --- E --- F --- G --- H --- rustc repo
     /       \         /       \     \         /
    /         `-- X --´         \     `-- Y --´
   /             /               \       /
- M --- N ----- P --- Q --- R --- S --- T -------- clippy repo

If A is the original commit adding clippy to rustc and X is the previous clippy sync commit we want to sync E by creating the merge commit S, then we need to figure out that We don't know about B, C, D, X and E. 2 should not be able to have changed src/tools/clippy, if it had, we'd have needed a rebase.

This all assumes that 2 cannot have touched src/tools/clippy, and any history in 1 or before also didn't touch it in a meaningful way, because everything was overwritten by A. I'm guessing from the horrible amount of time that git subtree push takes, is that it doesn't properly handle that, and instead just walks backwards over 2 to 1 to the beginning.

Since we moved from a submodule to a subrepo, 2 cannot possibly contain any src/tools/clippy changes. I wonder if the reason git subrepo push dies is because it could?

So basically we can replay B, C, D, X and E on top of M, creating

   _- 2 -_
 /         \
1 --- A --- B --- C --- D --- E --- F --- G --- H --- rustc repo
     /       \         /       \     \         /
    /         `-- X --´         *     `-- Y --´
   /             /               \       /
- M --- N ----- P --- Q --- R --- S --- T -------- clippy repo
   \             \               /
    `-- B' ------ X' --- D' ----´
         \              /
          `----- C' ---´

where the ' commits contain their original commit's changes, but only those for src/tools/clippy and moved to the root of the repo...

And at this point I'm realizing that we'll have the duplicate synthetic commits back in the rustc repo anyway at the next merge (so at Y). I guess this is unavoidable, so maybe git subrepo push --rejoin will get faster once we have a clean merge point into rustc without any other PRs going over it?

@matthiaskrgr
Copy link
Member Author

I wonder if we could somehow generate patches from rustc /src/tools/clippy and apply these to the clippy repo one by one with git format-patch and git am

@matthiaskrgr
Copy link
Member Author

matthiaskrgr commented May 5, 2020

It seems to work if I manually fix the path inside the patch (we could have a tool that does that).
The commit hash is not the same but other metadata like author an date etc is carried over :)

EDIT: I have not tried this on merges though, hmm.

@oli-obk
Copy link
Contributor

oli-obk commented May 5, 2020

But we can handle this recursion, since it only trips over commits from the Clippy repo (8500 commits can be searched in ~10-20min and we only have to do this once with the steps below)

It's been running for over an hour for me now... The number it's showing is in the 50k range, not sure what that means. Maybe it just went into an infinite loop somewhere?

EDIT: I have not tried this on merges though, hmm.

Yea, I think the difficulty is figuring out the commit graph that we need to move and then recreate the branches by doing what you did with commits and then creating new merges just between these synthetic branches. I'm assuming that's how git subtree works, so not sure if we gain anything by reimplementing it other than that we can skip a few things like bors merges.

@oli-obk
Copy link
Contributor

oli-obk commented May 5, 2020

I think the root question is whether we care about seeing rustc history to clippy in the clippy repo.

@flip1995
Copy link
Member

flip1995 commented May 5, 2020

Subrepo loses all history.

Not in the Clippy -> Rust direction, and that is the important one. Nvm ingydotnet/git-subrepo#246 We lose the Rust -> Clippy direction with the cp method also.

This all assumes that 2 cannot have touched src/tools/clippy, and any history in 1 or before also didn't touch it in a meaningful way, because everything was overwritten by A.

IIUC that is what gitgitgadget/git#493 is trying to fix in subtree and it seems to work. However it can't figure out commits to the external repo (O), that were added during the git subtree add (B):

1 --- A --- B --- C --- rustc repo
           /
          /
         /
- M --- N --- P --- clippy repo
   \         /
    --- O ---

and, therefore, goes through every commit of the external repo.


The commit hash is not the same but other metadata like author an date etc is carried over

According to this Medium aritcle this should be possible with just git cherry-pick --strategy=subtree (interesting article, but if you don't want to read everything, just Ctrl+F "--strategy=subtree")


It's been running for over an hour for me now... The number it's showing is in the 50k range, not sure what that means. Maybe it just went into an infinite loop somewhere?

Have you installed the updated version from the PR I posted? This rather indicates, that it is half through the rust repo 😄 For me it has gone thorugh every commit on 19/150 with the updated script and on 1/150 without.

I think the root question is whether we care about seeing rustc history to clippy in the clippy repo.

It is not a prio IMO. Nice to have, but also ok if we lose that.

@flip1995
Copy link
Member

flip1995 commented May 5, 2020

Subrepo loses all history.

Well it loses the history, but we got the history through subtree already. Between the history and the subrepo, there would only be two commit:

  1. One for git rm -r src/tools/clippy/**; removing everything
  2. One for git subrepo clone src/tools/clippy: adding everything back in one commit

(Semi-serious comment)

This thought experiment is showcased here: rust-lang/rust@master...flip1995:subrepo_move

@oli-obk
Copy link
Contributor

oli-obk commented May 6, 2020

Ok, so I'm all for using the patched git, but the patch hasn't seen any action lately as far as I can tell. Since only clippy maintainers need it, we just need an instruction set on how to use the patched git (or maybe once we have it in a stable state we don't need the patched version anymore?)

The problem with --rejoin is that we have to merge the synthetic commits back into the rustc repo. So we'd end up duplicating all commits that touched src/tools/clippy.

I realized this happens anyway when you do git subtree pull, as the commits from clippy are taken verbatim to rustc, without any syntheticity magics, so my point is moot.

@flip1995
Copy link
Member

flip1995 commented May 6, 2020

hasn't seen any action lately as far as I can tell.

Yes sadly, but AFAICT only the documentation of the new commands is missing. Maybe we could take over and push this over the finish line?

or maybe once we have it in a stable state we don't need the patched version anymore?

I have a stable version here: https://github.com/flip1995/rust/tree/clippyup But also with this version the unpatched subtree script walks the whole rust-lang/rust tree. (The patched version pushes in about 10s: https://github.com/flip1995/rust-clippy/tree/rustup2)

@oli-obk
Copy link
Contributor

oli-obk commented May 6, 2020

ok, yea, let's try to revive that PR. Reading the repo docs makes me think that we need to interact with the mailing list in order to move it ahead, but maybe documenting it and openening a new PR will speed things up

@flip1995
Copy link
Member

flip1995 commented May 6, 2020

that we need to interact with the mailing list

Yeas git is managed over mailing list contributions, but gitgitgadget will let us use the GitHub workflow for most of the things.

@mati865
Copy link
Contributor

mati865 commented May 6, 2020

Yeas git is managed over mailing list contributions

I'm sorry, couldn't resist.
obraz

@josephlr
Copy link
Contributor

As a data point, with the patched git subtree #6178 took 50 minutes to generate on my desktop. However, once it was generated everything just worked (modulo having to run cargo dev fmt). So after the change gets merged, the current approach seems like a tradeoff between wall time and developer time.

@flip1995
Copy link
Member

Only the first time. Once it is cached, it is a matter of seconds. Still not optimal though.

@RalfJung
Copy link
Member

Cached where?

@bjorn3
Copy link
Member

bjorn3 commented Oct 16, 2020

.git/subtree-cache: https://github.com/gitgitgadget/git/blob/fe2e4819b869725f870cd3ce99f1f8150fe17dc1/contrib/subtree/git-subtree.sh#L218

@josephlr
Copy link
Contributor

Only the first time. Once it is cached, it is a matter of seconds. Still not optimal though.

Yup, running again only took 20 seconds. The .git/subtree-cache is large (about 515 MiB for me), make sense why it took so long to generate (still seems like there has to be a better way).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-infra Area: CI issues and issues that require full access for GitHub/CI
Projects
None yet
Development

No branches or pull requests

7 participants