Make repo-level merging rebase descendants of rewritten/abandoned commits #111

martinvonz · 2022-03-09T16:15:56Z

Let's say you have commit A, which you rewrite as A'. You then add commit B on top. You then realize you want to undo the A->A' operation. Undoing that currently results in divergence because you get A back but A' is still visible because B is on top of it. I think we should rebase B onto A instead.

Another case is when you have pushed a branch to GitHub to create a PR. The PR then gets merged and the branch gets deleted from the remote. When you now jj git fetch from the remote, we pull down the new main branch and delete the old feature branch locally. However, we leave the commits in place and you have to manually abandon them. I think we should instead remove the commits and rebase any commits onto their closest visible ancestor.

These two cases are perhaps not obviously related, but I think they will both implemented in the repo-level merge. That means you'll get the same automatic rebasing from A' to A when you fetch that update from a remote, and the same automatic rebasing to ancestors if you undo an operation that added commits that your new branch is based on.

Note that the interaction with remotes will work much more smoothly once we can preserve Change IDs throughout the roundtrip to a remote. One option for achieving that would be to add the Change ID to the commit message like Gerrit does with its Change-Id: footer. If we do that, we'll want to make it optional so non-Jujutsu users won't be bothered by it. For now, we should do the automatic rebasing when we notice that a branch was moved in Git.

This is something I've meant to do for a year or something, but it's become more important to me personally now that I use GitHub PRs.

The text was updated successfully, but these errors were encountered:

martinvonz · 2022-03-09T16:37:02Z

A related cleanup I've meant to make is to the internal representation of branches and remotes. A branch currently knows where it points locally and on all remotes. I'm hoping to change that so the branch doesn't know about its target on remotes and to instead have a per-remote view object containing the branches on that remote.

After restructuring it like that, I think it will be easier and more natural to implement pull (fetch) as a 3-way repo-level merge: take the local repo's state and apply the diff between the remote's previously recorded state and the remote's new state. The same applies when importing from an underlying Git repo.

I'm documenting these thoughts here in the hope that it's useful to others, so please ask if the above didn't make any sense to you. I'll try to find time to document it in the repo as well some day...

I want to make it so when we apply a repo-level change that removes a head, we rebase descendants of that head onto the closes visible ancestor, or onto its successor if the head has been rewritten (see #111 for details). The view itself doesn't have enough information for that; we need repo-level information (to figure out relationships between commits). The view doesn't have enough information to do the.

…111) We had a few lines of similar code where we added a new of the operation log and then removed the old heads. By moving that code into a new type, we prepare for further refactorings.

It's useful for the UI layer to know that there's been concurrent operations, so it can inform the user that that happened. It'll be even more useful when we soon start making the resolution involve rebasing commits, since that's even more important for the UI layer to present to the user. This patch gets us a bit closer to that by moving the resolution to the repo level.

I want to make it so when we apply a repo-level change that removes a head, we rebase descendants of that head onto the closes visible ancestor, or onto its successor if the head has been rewritten (see #111 for details). The view itself doesn't have enough information for that; we need repo-level information (to figure out relationships between commits). The view doesn't have enough information to do the.

…111) We had a few lines of similar code where we added a new of the operation log and then removed the old heads. By moving that code into a new type, we prepare for further refactorings.

It's useful for the UI layer to know that there's been concurrent operations, so it can inform the user that that happened. It'll be even more useful when we soon start making the resolution involve rebasing commits, since that's even more important for the UI layer to present to the user. This patch gets us a bit closer to that by moving the resolution to the repo level.

Certain commands should never rewrite commits, or they take care of rebasing descendants themselves. We have an optimization in `commands.rs` for those commands, so they skip the usual automatic rebasing before committing the transaction. That's risky to have to remember and `MutableRepo` already knows if any commits have been rewritten (that wasn't the case before, in the Evolution-based code). So let's just have `MutableRepo` do the check instead.

I want to make it so when we apply a repo-level change that removes a head, we rebase descendants of that head onto the closes visible ancestor, or onto its successor if the head has been rewritten (see #111 for details). The view itself doesn't have enough information for that; we need repo-level information (to figure out relationships between commits). The view doesn't have enough information to do the.

…111) We had a few lines of similar code where we added a new of the operation log and then removed the old heads. By moving that code into a new type, we prepare for further refactorings.

It's useful for the UI layer to know that there's been concurrent operations, so it can inform the user that that happened. It'll be even more useful when we soon start making the resolution involve rebasing commits, since that's even more important for the UI layer to present to the user. This patch gets us a bit closer to that by moving the resolution to the repo level.

Certain commands should never rewrite commits, or they take care of rebasing descendants themselves. We have an optimization in `commands.rs` for those commands, so they skip the usual automatic rebasing before committing the transaction. That's risky to have to remember and `MutableRepo` already knows if any commits have been rewritten (that wasn't the case before, in the Evolution-based code). So let's just have `MutableRepo` do the check instead.

) If we have recorded in `MutableRepo` that commits have been abandoned or rewritten, we should always rebase descendants before committing the transaction (otherwise there's no reason to record the rewrites). That's not much of a risk in the CLI because we already have that logic in a central place there (`finish_transaction()`), but other users of the library crate could easily miss it. Perhaps we should automatically do any necessary rebasing we commit the transaction in the library crate instead, but for now let's just have a check for that to catch such bugs.

This is yet another refactoring step to allow the UI layer to inspect and control how concurrent operations are resolved.

martinvonz · 2022-03-24T06:54:21Z

The same code is used for resolving conflicts between concurrent operations. For example, if you create commit B on top of commit A on one machine and rewrite commit A as A' on another machine, and you then rsync the two repos, you should now (with the commits I just pushed) see B automatically get rebased on top of A' next time you run a command.

Similarly, if you create commit C on top of B on one machine and abandon commit B on another machine, you'll see C get automatically rebased onto A (B's parent) next time you run a command.

That's quite unorthodox, but I think it's what the user wants. I think we just need to inform them that the rebase happened. I think it'll be neat when it behaves the same way whether you ran two concurrent commands on the same machine, on two different machines (after e.g. rsync), or on two different clones (after fetching from the other repo).

…111) We already have the operation objects, and keeping them will allow some cleanup (see next change).

It's the transaction's job to create a new operation, and that's where the knowledge of parent operations is. By moving the logic for merging in another operation there, we can make it contiuously update its set of parent operations. That removes the risk of forgetting to add the merged-in operation as a parent. It also makes it easier to reuse the function from the CLI so we can inform the user about the process (which is what I was investigating when I noticed that this cleanup was possible).

It's cheap to sort them, so let's have them always sorted, so all callers can rely on that order.

The function is now pretty simple, and there's only one caller, so let's inline it. It probably makes sense to move the code out of `repo.rs` at some point.

) This involved copying `UnresolvedHeadRepo::resolve()` into the CLI crate (and modifying it a bit to print number of rebased commit), which is unfortunate.

…111) We already have the operation objects, and keeping them will allow some cleanup (see next change).

It's the transaction's job to create a new operation, and that's where the knowledge of parent operations is. By moving the logic for merging in another operation there, we can make it contiuously update its set of parent operations. That removes the risk of forgetting to add the merged-in operation as a parent. It also makes it easier to reuse the function from the CLI so we can inform the user about the process (which is what I was investigating when I noticed that this cleanup was possible).

It's cheap to sort them, so let's have them always sorted, so all callers can rely on that order.

The function is now pretty simple, and there's only one caller, so let's inline it. It probably makes sense to move the code out of `repo.rs` at some point.

…111) I want to do the rebasing a bit later, so the CLI can do it and print how many commits got rebased.