Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Show git diff signs in gutter #1623

Closed
wants to merge 3 commits into from

Conversation

sudormrfbin
Copy link
Member

Shows the changes to the current file symbolically in the gutter:

Without diagnostics With diagnostics
Screenshot_2022-02-07_00-06-27 Screenshot_2022-02-07_00-09-14

It looks a bit too cramped with diagnostics, so maybe the reserved space should be increased but that will make the gutter permanently too big (there's currently no way to "toggle" a gutter component). Changes are reflected only when the file is saved. I would like to show an accurate diff even if the file is unsaved, but this should be a good starting point.

The symbols are hard coded for now to keep the PR simple, and a follow up PR will make it configurable. The default might also need changing (though I would like to avoid too much bike shedding ;)

Supersedes #467.

helix-view/src/editor.rs Outdated Show resolved Hide resolved
helix-view/src/gutter.rs Outdated Show resolved Hide resolved
helix-view/src/gutter.rs Outdated Show resolved Hide resolved
helix-view/src/gutter.rs Outdated Show resolved Hide resolved
@p-e-w
Copy link

p-e-w commented Feb 17, 2022

Hi folks, I stumbled upon this PR by accident. As it happens, I implemented the exact same feature for the Micro editor a while ago. Here's how it looks:

Two things from Micro's implementation work quite nicely and might be worth considering for Helix as well:

  1. The gutter uses the character \u2594 (UPPER ONE EIGHTH BLOCK) to indicate deleted lines on the following line. Compared to the arrow or block glyphs from this PR, this makes it completely clear where code was removed, and also has the advantage of resembling a minus sign, something people already associate with removal.

  2. Unlike most diff gutter implementations, Micro does not ask Git (or other VCSs) for the diff. Instead, it has the concept of a "diff base", which is the text from which to diff, and the diff is computed by the editor. This has many advantages, like being so fast you can update the diff gutter on every keystroke (for small to medium sized files), and being easy to adapt to other VCSs (all you need is a way to ask the VCS for the committed state of the file). It also allows cool features like diffing from the saved state for files that aren't even part of a VCS repository.

You can try out the implementation by installing and running Micro, pressing Ctrl+E and entering the command set diffgutter on. Then open any file (Ctrl+O) from a Git repository.

Anyway, that's all. I absolutely love what you are building here. Have a great day!

@sudormrfbin
Copy link
Member Author

sudormrfbin commented Feb 18, 2022

Hi @p-e-w, thanks for reaching out ! The upper one eighth block looks perfect for deletion and I've replaced the previous symbol with it. Since the discussion on the the placement of diff gutter also kind of got stalled, I went with the left aligned blocks for the added and modified markers with the gutter itself placed to the left most edge, essentially mimicking your screenshot :)

I also like the idea of doing the diffing in the editor itself, especially if it can make update-as-you-type fast enough to be implemented. Currently it's updated only on save, which makes for a sub-optimal experience IMO. We already pull in the similar crate for computing diffs on reloading files from disk, so this wouldn't add any new dependency either. What do you think @archseer ?

@archseer
Copy link
Member

We can do the diffing, but you'd still need a way to query for the original file.

@sudormrfbin sudormrfbin force-pushed the git-diff-sign branch 2 times, most recently from f732123 to 5a9fc33 Compare March 9, 2022 17:46
@sudormrfbin
Copy link
Member Author

sudormrfbin commented Mar 9, 2022

The diff is now updated live, on every change. The main problem left to tackle is to bypass converting the document Rope to a string every time the diff needs to be updated (so practically on most keypresses).

@sudormrfbin sudormrfbin force-pushed the git-diff-sign branch 2 times, most recently from c96c98c to 0cd310b Compare March 10, 2022 14:11
@sudormrfbin
Copy link
Member Author

Diffing avoids string conversion now by wrapping a Rope, implemented in 0cd310b 🎉


impl<'a> RopeLine<'a> {
pub fn from_rope(rope: &'a Rope) -> Vec<Self> {
rope.lines().into_iter().map(RopeLine).collect()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: .lines() is not very efficient: cessen/ropey#25

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I read O(1) in the chunks iterator and assumed lines would also follow suit. @cessen how hard/big of a change would it be to make the lines iterator also O(1) ? I suppose if it was easy enough it would've already been added.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's non-trivial, for sure. I have a pretty good idea how to go about it, but it amounts to a rewrite of the lines iterator, and the code will be pretty subtle. I do still want to get to it eventually, but it's definitely a pain, which is why I've put it off for so long, ha ha.

Are you running into performance issues with the current lines iterator?

Copy link
Member Author

@sudormrfbin sudormrfbin Mar 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From casual use it seems snappy and I haven't noticed any performance issues, so this is probably not too serious of an issue :) But then again this is a hotpath since it's called on almost every keystroke, and I haven't run any actual benchmarks yet either.

Copy link
Contributor

@cessen cessen Mar 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then again this is a hotpath since it's called on almost every keystroke

Synchronously?

Thinking about it more, I would expect the diffing itself to be expensive enough that if the Lines iterator is causing performance issues then the diffing would be as well (i.e. even if Lines was infinitely fast, you'd still have issues). So if this feature does turn out to cause performance problems (e.g. on large files or with large diffs), then rearchitecting it to work in a separate thread or something like that might be needed anyway.

EDIT: I see you're deadlining the diffing process. See my other comment.

};

let timeout = Duration::from_millis(250);
let diff = similar::capture_diff_slices_deadline(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading your code correctly, it looks like you're collecting all the lines into a Vec in order to pass them to capture_diff_slices_deadline(), so even though this has a deadline set you can still be bottle necked on collecting everything into a Vec.

Instead, you could use capture_diff_deadline() by wrapping the Rope in a type that implements Index and returns the lines through indexing. That way the entire procedure can be deadlined.

I could also make this even easier for you by implementing Hash on RopeSlice (which is really just an accidental omission on my part in Ropey).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with Index is that it requires the Output to be a reference, whereas each line is returned as a RopeSlice. This might depend on rust-lang/generic-associated-types-initiative#2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! Right, I feel silly now. Yeah, that's unfortunate. I wonder if they'd accept a pull request to add a variant that takes a |usize| -> Item closure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only diff using similar initially and now use changesets to incrementally compute the diffs, so it's no longer executed on every keypress. It'll still be nice to optimize it using indexing since we don't want to stall on huge text files on startup.

@archseer
Copy link
Member

Thinking about this some more, it seems to me that we could compute changed ranges based on helix changesets (after maybe doing an initial diff against git?). We already have all the information needed to incrementally update the diff in the transaction rather than having to go through git and diffing full files every single time (these files could also be very large).

Comment on lines 8 to 14
impl Hash for RopeLine<'_> {
fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
for chunk in self.0.chunks() {
chunk.hash(state);
}
}
}
Copy link
Contributor

@cessen cessen Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now implemented Hash directly on Rope and RopeSlice (in the recently released Ropey 1.4.0), so this should be unnecessary now.

It's also worth noting that your implementation here is incorrect (not to criticize, just an interesting technical note), which I only noticed when diving into the trait to implement it myself:

&str's Hash implementation, which you're calling in the loop here, is purposefully implemented so that hashing a series of string slices yields a different hash than hashing the same data as a single concatenated string. Which means that if two Ropes have identical contents, but the chunks happen to be split up differently, your implementation will give two different hashes for them.

Instead, you actually need to call state.write(chunk.as_bytes()) in the inner loop.

The reason &str's Hash implementation works like that is because of #[derive(Hash)]. Imagine this struct:

#[derive(Hash)]
struct Person<'a> {
    first_name: &'a str,
    last_name: &'a str,
}

If concatenated strings gave the same hash, then someone named "Jon Athanbor" and "Jonathan Bor" would hash to the same value if the names were stored in lower case. (A contrived example, I know, but it's just to illustrate.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah usually you prefix each chunk with a hash of the len to solve for this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't know this; thanks for pointing it out ! Thanks for the Hash impl too :)

@poliorcetics
Copy link
Contributor

Thinking about this some more, it seems to me that we could compute changed ranges based on helix changesets (after maybe doing an initial diff against git?). We already have all the information needed to incrementally update the diff in the transaction rather than having to go through git and diffing full files every single time (these files could also be very large).

What about changes at a distance, like adding a "use std::path::Path;" with rust-analyzer from a use point hundreds of line down the line ? Is the information for those available ?

@archseer
Copy link
Member

Yes, all changes are always applied through a changeset/transaction.

@poliorcetics
Copy link
Contributor

Nice !

Comment on lines +707 to +727
let reverted_tx = transaction.invert(&old_doc);

// Consider H to be the text of the file in git HEAD, and B₁ to be
// the text when buffer initially loads. H → B₁ is the changeset
// that describes the diff between HEAD and buffer text. Inverting
// this produces B₁ → H, which is initially saved to `diff_changes`.
// In the next edit, buffer text changes to B₂. The transaction
// describes the change B₁ → B₂. Inverting this gives B₂ → B₁.
// Composing this with the saved `diff_changes` gives us
// (B₂ → B₁) → (B₁ → H) = B₂ → H. Whenever examining a change X₁ → X₂,
// we need to know the contents of the text at state X₁ to know where
// to apply the operations in the changeset. The advantage of inverting and
// composing this way instead of simply composing (which would give
// us H → B₂ in this example) is that we no longer need the HEAD text
// and we can instead use the current text in the buffer.
if let Some(changes) = self.diff_changes.take() {
let reverted_changes = reverted_tx.changes().clone();
let changes = reverted_changes.compose(changes);
self.diff_changes = Some(changes);
self.diff_with_base();
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use changesets now, and use the approach described in the comment. It also has the advantage that we don't have to manage a cache of the file contents in HEAD since we don't need it to generate the diff.

This works fine for most of the cases except for deleting something and undoing it (there are some other rough corners, but they can be easily solved at the rendering level). Normally, composing a change and it's undo operation produces an empty changeset. But since we use inverted changes here, it doesn't apply cleanly.

I'll write a better explanation of the problem tomorrow, but it boils down to the fact that composing for example [Delete(5)] and [Insert("hello")] (note the order) does not produce an empty changeset, where as composing an insert and then a delete gives a changeset of [].

Relevant test case:

Patch for the test
:100644 100644 daf4a77e 00000000 M	helix-core/src/transaction.rs

diff --git a/helix-core/src/transaction.rs b/helix-core/src/transaction.rs
index daf4a77e..9aed26e8 100644
--- a/helix-core/src/transaction.rs
+++ b/helix-core/src/transaction.rs
@@ -579,6 +579,17 @@ mod test {
     use super::*;
     use crate::State;
 
+    #[test]
+    fn composing_delete_and_insert_leaves_empty_changeset() {
+        let rope = Rope::from("");
+        let mut delete = ChangeSet::new(&rope);
+        delete.delete(5);
+        let mut insert = ChangeSet::new(&rope);
+        insert.insert("hello".into());
+
+        let changes = delete.compose(insert);
+        assert_eq!(changes.changes, &[]);
+    }
     #[test]
     fn composition() {
         use Operation::*;
fn composing_delete_and_insert_leaves_empty_changeset() {
    let rope = Rope::from("");
    let mut delete = ChangeSet::new(&rope);
    delete.delete(5);
    let mut insert = ChangeSet::new(&rope);
    insert.insert("hello".into());

    let changes = delete.compose(insert);
    assert_eq!(changes.changes, &[]);
}

@archseer archseer linked an issue Apr 6, 2022 that may be closed by this pull request
@heliostatic heliostatic mentioned this pull request Apr 13, 2022
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
git2 = { version = "0.13", default-features = false }
Copy link
Contributor

@pickfire pickfire Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we really want to use the c bindings here instead of something that is written in rust? Like gix but not sure if they support diff yet. It seemed to have the stuff we need from a quick look https://github.com/Byron/gitoxide/blob/main/git-repository/src/types.rs


#[derive(Debug, Default)]
pub struct Registry {
inner: HashMap<RepoRoot, Rc<RefCell<Git>>>,
Copy link
Contributor

@pickfire pickfire Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's a good idea to have this as Git given that it is called vcs, I thought it is supposed to be extensible to other vcs. Also, I wonder why is it Rc<RefCell? So we can edit?

Comment on lines +627 to +629
pub fn set_version_control(&mut self, vcs: Option<Rc<RefCell<helix_vcs::Git>>>) {
self.version_control = vcs;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just do version_control_mut for this?

let changes = compare_ropes(diff_base, self.text()).changes().clone();
let changes = changes.invert(diff_base);
self.diff_changes = Some(changes);
drop(vcs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please leave a comment on the drop here?

@the-mikedavis the-mikedavis mentioned this pull request Apr 25, 2022
@the-mikedavis the-mikedavis added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label May 18, 2022
@univerz
Copy link

univerz commented Jul 11, 2022

it would be great if this could be easily toggled - it's a useful feature, but it's visually distracting to see it all the time.

@kirawi kirawi added A-helix-term Area: Helix term improvements S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 13, 2022
@sudormrfbin
Copy link
Member Author

Closing in favor of #3890. Thanks @pascalkuthe for picking it up!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-helix-term Area: Helix term improvements S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Show git status near line number
9 participants