Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jj op restore didn't work in a colocated repo, was undone by jj git import. #922

Closed
ilyagr opened this issue Dec 20, 2022 · 28 comments · Fixed by #1700
Closed

jj op restore didn't work in a colocated repo, was undone by jj git import. #922

ilyagr opened this issue Dec 20, 2022 · 28 comments · Fixed by #1700
Assignees

Comments

@ilyagr
Copy link
Collaborator

ilyagr commented Dec 20, 2022

I accidentally did jj unsquash on the wrong commit in a repository co-located with git. I ended up in this state:
jj_undo_unsquash.tar.gz

I can't seem to undo it. jj op restore 89b2b- seems to work fine, but a subsequent jj log gives the
message "Rebased 1 descendant commits off of commits rewritten from git", and I'm returned to
the bad state.

I'm sure rebasing will fix it, but it seems that whatever jj git import (I assume) does here isn't very helpful.

@martinvonz
Copy link
Owner

Thanks for the report and the simple repro. I'll take a look at this (some day) since I'm already working on #864.

@martinvonz martinvonz self-assigned this Dec 20, 2022
@ilyagr
Copy link
Collaborator Author

ilyagr commented Dec 24, 2022

I ran into a very similar issue with amend. This time, I managed to turn it into a test. I put it into test_git_colocated.rs, so I copied some helper functions from test_rebase:

#[test]
fn test_git_colocated_amend_undo() {
    let mut test_env = TestEnvironment::default();
    let repo_path = test_env.env_root().join("repo");
    git2::Repository::init(&repo_path).unwrap();
    test_env.jj_cmd_success(&repo_path, &["init", "--git-repo", "."]);

    create_commit(&test_env, &repo_path, "a", &[]);
    create_commit(&test_env, &repo_path, "b", &["a"]);
    create_commit(&test_env, &repo_path, "toamend", &["a"]);
    create_commit(&test_env, &repo_path, "c", &["a"]);
    test_env.jj_cmd_success(&repo_path, &["edit", "toamend"]);
    // Test the setup
    insta::assert_snapshot!(get_log_output_branches_divergence(&test_env, &repo_path), @r###"
    o c
    | @ toamend
    |/  
    | o b
    |/  
    o a master
    o 
    "###);
    std::fs::write(test_env.set_up_fake_editor(), "").unwrap();
    test_env.jj_cmd_success(&repo_path, &["amend"]);
    insta::assert_snapshot!(get_log_output_branches_divergence(&test_env, &repo_path), @r###"
    o c
    | @ 
    |/  
    | o b
    |/  
    o a master toamend
    o 
    "###);
    test_env.jj_cmd_success(&repo_path, &["undo"]);
    // THE BUG HAPPENS HERE
    // The amended commit does get recreated, but the branches aren't moved there
    // and rebase isn't undone (I think). In any case, divergence happens and things aren't great.
    insta::assert_snapshot!(get_log_output_branches_divergence(&test_env, &repo_path), @r###"
    Working copy now at: ec3936753a4f (no description set)
    Added 0 files, modified 0 files, removed 1 files
    @ 
    | o c
    | | o b
    | |/  
    | o a master toamend !divergence!
    o |  !divergence!
    |/  
    o 
    "###);
}

fn get_log_output_branches_divergence(test_env: &TestEnvironment, repo_path: &Path) -> String {
    test_env.jj_cmd_success(
        repo_path,
        &["log", "-T", "branches if(divergent, \" !divergence!\")"],
    )
}

fn create_commit(test_env: &TestEnvironment, repo_path: &Path, name: &str, parents: &[&str]) {
    if parents.is_empty() {
        test_env.jj_cmd_success(repo_path, &["new", "root", "-m", name]);
    } else {
        let mut args = vec!["new", "-m", name];
        args.extend(parents);
        test_env.jj_cmd_success(repo_path, &args);
    }
    std::fs::write(repo_path.join(name), format!("{name}\n")).unwrap();
    test_env.jj_cmd_success(repo_path, &["branch", "create", name]);
}

Update 2: Also, it's not great that the jj log (!) output includes

Working copy now at: ec3936753a4f (no description set)
Added 0 files, modified 0 files, removed 1 files

@yuja
Copy link
Collaborator

yuja commented Dec 24, 2022

This is basically because jj undo doesn't undo branch export (just like it doesn't undo push to remote.)

jj undo restores the view to the previous state, then finish_transaction() exports refs based on that view. If finish_transaction() ran git::export_refs() against the new undo-ed mut_repo and the old self.repo.view() (as source of truth of git refs), branches would be unexported. It appears working, but I'm not sure if that's the right thing to do.

@ilyagr
Copy link
Collaborator Author

ilyagr commented Dec 24, 2022

That makes sense as the cause.

Ideally, jj should undo branch export (or rather, move the branches to where they were and export them) but not the push to remote. The @origin or @another_workspace branches should follow different rules.

I think, in fact, that's what it used to do not so long ago, but that behavior had other problems. I think it caused a lot of pushes to cause unexpected divergence (as opposed to undo causing it). At least, I recently started seeing a lot less of the divergence-after-push bugs. I'm not sure how hard it is to specify a behavior that consistently does the right thing.

@yuja
Copy link
Collaborator

yuja commented Dec 24, 2022

I think, in fact, that's what it used to do not so long ago, but that behavior had other problems. I think it caused a lot of pushes to cause unexpected divergence (as opposed to undo causing it).

It's not the same as before. Many of the divergence problems were afaik caused
by inconsistency between the view (which jj thinks git refs would be of that
state when the refs were previously synced) and the actual git refs.

What's unclear to me is, for colocated repo, whether the exported git refs
should be considered internal state that can be undone, or external state.
It probably makes sense that undoing jj git push restores the remote refs
locally (or internally), assuming the user would fix up the external state
accordingly. For the same reason, undoing jj git export in non-colocated repo
would restore the local refs without unexporting. However, in colocated repo,
git export/import is implicit, so weird things happen if refs are out of sync.

@ilyagr
Copy link
Collaborator Author

ilyagr commented Jan 8, 2023

This keeps bothering me. I just accidentally abandoned 23 commits and had to manually restore branches after I did jj undo. On the plus side, I could use git's reflog to do it 🤔 . I guess I could switch to normal jj repositories, but I find it so nice that tig, gitk, and VS Code's diff view and diff gutters just work.

Do you see any problem with having an option to jj git export after each jj op restore or jj undo in a colocated repository?

I'll add that I don't see a usecase for such an option being off, but I haven't thought about this carefully enough to be sure.

I might end up looking into this myself if you are busy with other things, but I thought I'd ask first. This is not a part of jj where I have a good understanding of what kind of things need careful handling.

@yuja
Copy link
Collaborator

yuja commented Jan 8, 2023

Do you see any problem with having an option to jj git export after each jj op restore or jj undo in a colocated repository?

It does. The problem is op restore also restores the last known git branches state, so there would be nothing to export after that (and the next jj invocation would naively find "new" git branches to import.)

I don't think we'll need an option to switch such behaviors. If we decide that, for colocated repo, exported local branches are managed state of jj, op restore should take care of that.

@ilyagr
Copy link
Collaborator Author

ilyagr commented Jan 8, 2023 via email

@yuja
Copy link
Collaborator

yuja commented Jan 8, 2023

Could it do a more forceful jj git export that exports every single branch regardless of whether it looks updated?

Maybe?, but it's probably easier to do git export compared to the previous (pre-undo/restore) state.

I don't think we'll need an option to switch such behaviors. If we decide that, for colocated repo, exported local branches are managed state of jj, op restore should take care of that..

I think that's my mental model, but I'm not sure if there are other usecases.

I tend to agree with you. It's annoying that I can't undo wrong named branch operation, and I occasionally make a mistake since I don't remember how to move named branches properly.

@martinvonz?

@ilyagr
Copy link
Collaborator Author

ilyagr commented Jan 8, 2023

it's probably easier to do git export compared to the previous (pre-undo/restore) state.

This seems wrong, at least sometimes. Let's say you jj abandon-d a branch tip. It your version, I think jj undo wouldn't return the branch to where it was. That may or may not cause a divergence, I'm unsure. What I'd expect it to do is to return the branch to where it was before the abandon, and that's what exporting all branches would achieve.

In fact, the situation I described in #922 (comment) seems identical to the one I described in the previous paragraph. (Though, I didn't get a divergence there)

One question that worries me is whether there are also situations in which exporting all branches would do the wrong thing and your suggestion would do the right thing.

@yuja
Copy link
Collaborator

yuja commented Jan 8, 2023

This seems wrong, at least sometimes. Let's say you jj abandon-d a branch tip. It your version, I think jj undo wouldn't return the branch to where it was.

Doesn't the diff (of git refs) bring the abandoned branch back to the previous point?

I don't think about that thoroughly, but if the view of exported branches reflects the actual git state (i.e. the pre-undo/restore version), git::export_refs() can calculate stuff to bring git refs to the target (undo/restore-d) state.

@ilyagr
Copy link
Collaborator Author

ilyagr commented Jan 8, 2023 via email

@martinvonz
Copy link
Owner

I'm also unsure what the right solution is. I had started looking into this a month ago or something, but then I got distracted by some work on the formatter. I'll try to get back to this soon, but if anyone else wants to spend time on it, feel free. I might try to spend some more time at least thinking about this tomorrow.

@martinvonz
Copy link
Owner

I'm still not sure what the right solution is, but I wanted to share some thoughts I've had about this.

I think exports and imports to the underlying Git repo are very similar to pushes and pulls, and we should make that obvious in the model. We currently store branches in a structure where we link remote branches to the local branch. Git refs are stored separately. Having Git refs stored separately means that we can import and store refs that are not branches (or tags), so it's at least a little bit useful for that reason. Still, I wonder if we should treat the Git repo more like a remote. I also wonder if we should restructure the View object so we have each remote's branches stored in the same kind of data structure as we store local branches in. They're currently stored something like this:

branches: {
  main: {
    local: "abc123"
    remotes: {
      "ilyagr": "abc456"
      "yuja": "def456"
    }
  }
}
git_refs: {
  "refs/heads/main": "def123"
}

I'm thinking it might be better like this:

branches: {
  main: "abc123"
}
remotes: {
  "ilyagr": {
    branches: {
      main: "abc456"
    }
  }
  "yuja": {
    branches: {
      main: "def456"
    }
  } 
}
git: {
    branches: {
      main: "def123"
    }
}

That makes the remotes and the Git repo look more similar to the local repo. When importing from the Git repo or pulling from a remote, we then apply the diff between the two structs, just like we do when we merge views locally because of concurrent operations.

@yuja
Copy link
Collaborator

yuja commented Jan 11, 2023

Is it strictly needed for colocated repo to import/export refs automatically? Maybe it's unavoidable to keep HEAD in sync, but I personally don't care for the named branches.

I tend to agree that export/import should be modeled in a similar way to push/pull, but that should only apply to explicit operation, I think.

@ilyagr
Copy link
Collaborator Author

ilyagr commented Jan 11, 2023

I would very much like for HEAD to be kept in sync automatically, I use that all the time. Beyond that, I haven't formed a reasoned opinion.

I did want to share a user experience report. I import a branch from git to jj once every day, when I do git pull upstream main. I will try to substitute jj git fetch --glob main --remote upstream for this and see if it works for me. Other than that, I only ever push branches from jj to git.

Dipping a bit into opinion territory, for my personal workflow, I'd be OK with requiring a command for an import to happen. A command to export would feel like extra friction when I want to use something like tig or gitk, but it might very well be tolerable and worth it for more predictable operation and a cleaner data model. Or there could be a setting to make it automatic.

I thinks it's possible there will be users that want the opposite: to use git commands mainly, and to only use jj for what it's best at. I'm not sure if we should encourage that, if it does ever happen.

@martinvonz
Copy link
Owner

martinvonz commented Jan 12, 2023

Is it strictly needed for colocated repo to import/export refs automatically? Maybe it's unavoidable to keep HEAD in sync, but I personally don't care for the named branches.

True, it's not really necessary. We could make it configurable. I think we have still have to figure out how it should behave for the case of explicit import/export pull/push.

I thinks it's possible there will be users that want the opposite: to use git commands mainly, and to only use jj for what it's best at. I'm not sure if we should encourage that, if it does ever happen.

I wouldn't encourage it, but I would like for it to work as smoothly as possible to make it easy for users to gradually transition, unless it's an unreasonable amount of work for us to maintain the code for that.

@ilyagr ilyagr changed the title jj op restore doesn't work, is undone by jj git import. jj op restore didn't work in a colocated repo, was undone by jj git import. Jan 17, 2023
martinvonz added a commit that referenced this issue Jan 31, 2023
I think this is the same bug as reported in #922, just simplified a
bit further. The branches in the repo actually look good after the
`undo` operation, but the reverted `master` branch doesn't get
exported to the git repo even though our recorded `refs/heads/master`
in the repo was moved back. Then the next automatic import on `log`
notices that the `master` branch in the git repo still points to the
new commit, and that commit becomes visible again.
martinvonz added a commit that referenced this issue Jan 31, 2023
I think this is the same bug as reported in #922, just simplified a
bit further. The branches in the repo actually look good after the
`undo` operation, but the reverted `master` branch doesn't get
exported to the git repo even though our recorded `refs/heads/master`
in the repo was moved back. Then the next automatic import on `log`
notices that the `master` branch in the git repo still points to the
new commit, and that commit becomes visible again.
martinvonz added a commit that referenced this issue Jan 31, 2023
I think this is the same bug as reported in #922, just simplified a
bit further. The branches in the repo actually look good after the
`undo` operation, but the reverted `master` branch doesn't get
exported to the git repo even though our recorded `refs/heads/master`
in the repo was moved back. Then the next automatic import on `log`
notices that the `master` branch in the git repo still points to the
new commit, and that commit becomes visible again.
ilyagr added a commit that referenced this issue Feb 4, 2023
Eventually, we should be able to rely on `jj op restore` and the `--multi`
argument should likely be removed. This is a temporary measure until we figure
out #922 and the like.

Fixes #571
ilyagr added a commit that referenced this issue Feb 4, 2023
Eventually, we should be able to rely on `jj op restore` and the `--multi`
argument should likely be removed. This is a temporary measure until we figure
out #922 and the like.

Fixes #571
ilyagr added a commit that referenced this issue Feb 4, 2023
Eventually, we should be able to rely on `jj op restore` and the `--multi`
argument should likely be removed. This is a temporary measure until we figure
out #922 and the like.

Fixes #571
ilyagr added a commit that referenced this issue Feb 4, 2023
…multi`

Eventually, we should be able to rely on `jj op restore` and the `--multi`
argument should likely be removed. This is a temporary measure until we figure
out #922 and the like.

Fixes #571
ilyagr added a commit that referenced this issue Feb 4, 2023
…`--multi`

Eventually, we should be able to rely on `jj op restore` and the `--multi`
argument should likely be removed. This is a temporary measure until we figure
out #922 and the like.

Fixes #571
ilyagr added a commit that referenced this issue Feb 5, 2023
…`--allow-large-revsets`

Eventually, we should be able to rely on `jj op restore` and the `--allow-large-revsets`.
argument should likely be removed. This is a temporary measure until we figure
out #922 and the like.

Fixes #571
ilyagr added a commit that referenced this issue Feb 5, 2023
…`--allow-large-revsets`

Eventually, we should be able to rely on `jj op restore` and the `--allow-large-revsets`.
argument should likely be removed. This is a temporary measure until we figure
out #922 and the like.

Fixes #571
ilyagr added a commit that referenced this issue Feb 5, 2023
…`--allow-large-revsets`

Eventually, we should be able to rely on `jj op restore` and the `--allow-large-revsets`.
argument should likely be removed. This is a temporary measure until we figure
out #922 and the like.

Fixes #571
ilyagr added a commit that referenced this issue Feb 6, 2023
…`--allow-large-revsets`

Eventually, we should be able to rely on `jj op restore` and the `--allow-large-revsets`.
argument should likely be removed. This is a temporary measure until we figure
out #922 and the like.

Fixes #571
ilyagr added a commit that referenced this issue Feb 6, 2023
…`--allow-large-revsets`

Eventually, we should be able to rely on `jj op restore` and the `--allow-large-revsets`.
argument should likely be removed. This is a temporary measure until we figure
out #922 and the like.

Fixes #571
@ilyagr
Copy link
Collaborator Author

ilyagr commented Mar 9, 2023

A relevant Discord discussion with a new plan. (It also links to another relevant Discord discussion): https://discord.com/channels/968932220549103686/969291218347524238/1083507667563184269

@ilyagr
Copy link
Collaborator Author

ilyagr commented Mar 25, 2023

I think I'll actually try to fix this, unless you already started or want to do it. The plan from Discord seems pretty good and clear; I just need to look up whether the way we'd write a view object to disk changed since the commit we need to revert.

Also, the workaround you mentioned in #1390 (comment) is pretty nice: jj git remote remove origin; jj git import; jj git remote add origin <URL>.

@martinvonz
Copy link
Owner

I haven't started, so please do. Fixing that should be a big step forward for anyone using colocated repos, and a small step forward for others too. Thank you!

I don't think anything significant has changed since that commit. Maybe we changed from protobuf to prost after that but I didn't check. It shouldn't matter much for the amount of work anyway.

@ilyagr
Copy link
Collaborator Author

ilyagr commented Mar 27, 2023

@martinvonz Before 4dfd765, the View object was saved via Thrift. That way doesn't seem to be supported anymore, unless I create a new OpStore just for this View and save the id of the view to look for.

We could go to what was happening before f3f41cc and save a pointer to the oplog entry of the last git export. However, as that commit description points out, that causes difficulty with garbage collection.

Should I create a "simple_view_store" interface for storing a single view object and a corresponding ProtoViewStore? All the functionality seems to be in private functions in proto_op_store.rs, but there's no API for it.

Is there a simpler way?

@martinvonz
Copy link
Owner

We've switched to prost now, so we should be able to write a serialized View object using that instead of Thrift. Feel free to make any parts of proto_op_store you need available. I think you might just need view_to_proto() and view_from_proto() from around here:

fn view_to_proto(view: &View) -> crate::protos::op_store::View {

@ilyagr
Copy link
Collaborator Author

ilyagr commented Mar 28, 2023

I'm a little worried about data races for this serialized View object, but the code before the commit I'm reverting didn't seem to worry about it. Is it locked in some way during jj git export (between the time we read it and the time we write it), so that another jj git import or jj git export can't execute at the same time?

@ilyagr
Copy link
Collaborator Author

ilyagr commented Mar 28, 2023

Also, for a status update: I have a draft of reverting that commit at https://github.com/ilyagr/jj/tree/export, and it indeed seems to fix this bug. I'm now thinking of bugs that are newly introduced by the fix, and then of tests and documentation to add. I think there is at least one new bug (see the change to test_git.rs) because jj git import also needs to update the last remembered git state. I'm working on that now.

I'm also wondering whether it's correct that jj git fetch (specifically, the fetch() function in git.rs) essentially does jj git import, but jj git push doesn't seem to do jj git export. But that's a separate issue.

@martinvonz
Copy link
Owner

You're right that the file wasn't locked before, but it's probably a good idea to add a lock. You can see how the working copy is locked here:

jj/lib/src/working_copy.rs

Lines 1144 to 1145 in 7aad2ae

let lock_path = self.state_path.join("working_copy.lock");
let lock = FileLock::lock(lock_path);

There are also issues related to interrupted imports/exports. For example, we should probably not update the this "last seen git state" file until after we've committed a transaction. Otherwise we might not detect changes that happened in the git repo if a transaction aborts. It seems like a smaller problem to detect the change twice (e.g. if we crash after finishing the transaction). But I'll leave the details to you :)

@ilyagr
Copy link
Collaborator Author

ilyagr commented Mar 29, 2023

There are also issues related to interrupted imports/exports. For example, we should probably not update the this "last seen git state" file until after we've committed a transaction.

My understanding (basically a guess) was that jj never changes git refs during normal operations (everything except for push, pull, import, and export), and only stores commits in the git repo (along with the refs to prevent git's garbage collection, which I'm ignoring here). Is that not correct? This "last seen git state" only tracks refs (in fact, I'll rename it to "last seen git refs"), so I think the only problem is if the jj git import or jj git export get interrupted.

This is also the part of lockfiles that confuses me the most: how do you know if one is stale? We have plenty of such problems with working-copy.lock already.

You're right that the file wasn't locked before, but it's probably a good idea to add a lock.

This lock should logically apply to the part of the git repo state that isn't immune from conflicts (i.e., everything that's not adding content-addressed things). I was hoping there was already such a lock, but oh well. I might leave this as a TODO for the first bugfix.

(Also, thanks for the pointer. I might answer my own questions about the lockfile by looking at it).

@martinvonz
Copy link
Owner

My understanding (basically a guess) was that jj never changes git refs during normal operations (everything except for push, pull, import, and export), and only stores commits in the git repo (along with the refs to prevent git's garbage collection, which I'm ignoring here). Is that not correct?

Yes, that's correct. Of course, in a colocated repo, the import and export steps also happen automatically.

This lock should logically apply to the part of the git repo state that isn't immune from conflicts (i.e., everything that's not adding content-addressed things). I was hoping there was already such a lock, but oh well. I might leave this as a TODO for the first bugfix.

I think git has some lock file related to updating refs. We might be able to use that somehow, if that's what you mean.

ilyagr added a commit to ilyagr/jj that referenced this issue Apr 17, 2023
This commit reverts 8a440d8 AKA martinvonz@4dfd765

The problem that commit is that the Git repo is, unaffected by `jj undo`. So,
it is important for JJ's view of the git refs to be unaffected by `jj undo`.

Fixes martinvonz#922. Also creates another bug that is fixed in a follow-up commit.
@ilyagr
Copy link
Collaborator Author

ilyagr commented Apr 19, 2023

Following the discussion in and around #1487 (comment), I'm beginning to think that, in the long term, the approach we took in that PR is never going to match jj's concurrency model. Or, at least, I came up with a more principled approach. The trouble is that it seems like a big change; I'm not sure it has many short-term benefits or exactly what the long-term benefits are.

(In the short term, I think we should merge #1487 or something like it soon, to avoid users' pain from this bug)

I think we should split the view objects and the operation log in two. There will be the "local view", "local operation log", "remote-tracking view", and "remote operation log", all parts of the repo. They contain different information. The local view contains all the information that should be affected by undo, most notably the local branch positions and visible heads. The remote-tracking view contains all the information that shouldn't be affected by undo, most notably the remote tracking branches. These could be grouped by remote, and the last seen branch positions in the backing git repository would be treated just like remote tracking branches for a special remote1.

For concurrency purposes, both operation logs would be treated like the operation log is treated now. For undo purposes, only the "local" log would be affected. This would be mostly an aspect of the UI, not of the data model. For debug and testing purposes, undo could be implemented for the "remote-tracking" log; it would just usually lead to confusing user experience.

Otherwise, the local and remote operation logs are implemented just like the operation log is now. The "current" operation and the "current" view are the combination of the current operation in both operation logs.

The two logs only need to interact for operations like jj git import/export/push/fetch. For example, much like now, jj git import would be recorded in the local operation log. It would resolve conflicts between the jj state and the actual remote state using the state recorded in the "remote-tracking" view as a conflict base. After that, it would add an operation on the "remote" operation log updating the remote-tracking view to the last known state of the remote.

The reason for setting up two operation logs is that, as it seems to me, this is the only way to have proper support for concurrency for the remote-tracking info and to not have it be affected by undo. It would require more thought to describe specific advantages of proper support for concurrency in the remote log. The problem is that concurrent operation will still result in conflicts, they will merely be properly recorded.

Footnotes

  1. TODO: Decide whether/how to track the git repository's remote tracking branches (as opposed to jj's tracking branches for the git repository's local branches). This might make tracking the local git repo in jj's "remote view" a little different from tracking normal remotes. We might need to store the last-seen view of these remote-tracking branches (where "remote" has git's meaning) in our "remote-tracking view", for the local git repo "remote" (in jj's sense). So, git's remote tracking branches would be treated like local branches by jj's model, and only the UI would be different. I'm not sure how to have less confusing terminology for this.

    Alternatively, we could ignore git's remote-tracking machinery entirely, relying on jj's side for all of it instead. This makes things much simpler, and the "local git repo" becomes just another "remote" as far as JJ is concerned. The cost would be that it would be harder for the user to interleave git push and jj git push. This is annoying, especially for colocated repos and while jj git push has limitations with HTTPS, .ssh/config, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants