-
Notifications
You must be signed in to change notification settings - Fork 547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use gitoxide for merging trees #5411
Conversation
Some crates use it from the workspace as if it had that feature, but the reason they compile is only due to compiling a crate higher up that sets the feature, which transitively affects the crate in question. When building `gitbutler-diff` for instance, the build will fail.
These options were related to the worktree, which won't take part in this kind of diff.
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
This is the most expensive call as it's possible to trigger a lot of merges. Also improve performance by aborting the merge as early as it's known that there is a conflict.
if claimed_hunk == &Hunk::from(git_diff_hunk) | ||
|| claimed_hunk.intersects(git_diff_hunk) | ||
{ | ||
if claimed_hunk.intersects(git_diff_hunk) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please double-check this logic. I think it's the same as before, but avoids a ton of hashing that would happen otherwise and slow things down to a crawl.
@@ -248,7 +257,10 @@ pub(crate) fn stack_series( | |||
// Reverse first instead of later, so that we catch the first integrated commit | |||
for commit in series.clone().local_commits.iter().rev() { | |||
if !is_integrated { | |||
is_integrated = check_commit.is_integrated(commit)?; | |||
is_integrated = commits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to save time and be reasonable in comparison to what it was before. However, since the stack can have its own remote I am not sure if the check_commit
instance is configured correctly.
remote_commit_data: HashMap<CommitData, git2::Oid>, | ||
commits: &[VirtualBranchCommit], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general this is probably fine - but the commits that are passed in here, are something deprecated that I plan to remove asap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am looking forward to seeing that work only done once, and what I understood is that stack_series
is the new version, and I was basically altering the old one. The few changes I made should be easy to port to be internal to stack_series()
(if this is what's needed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imo lets just merge it and evaluate
An attempt to improve performance by integrating
gix::Repository::merge_trees()
.In a single sample that didn't actually have merge conflicts, it was 1.8x of merge-ORT.
GitHub wrote about changing rebases from
git2
togit
withmerge-ORT
:So in theory, we could see a 15x * 1.8x = ~28x speedup in a best-case scenario.
However, this much improvement is probably not going to happen... let's see.- Actually it did happen, 28x on the first day, but it took a little tuning.Related to #5163.
Tasks
gitoxide
foris_integrated()
As follow-up
try to remove duplication of work in- This will happen naturally as the other path is deprecated.stack_series()
.merge_trees
in remaining placesData so far
Stuff that could still be faster
IsIntegrated
state isn't adjusted.list_virtual_commit_files()
could be usinggitoxide
to get faster tree-diffing and faster hunk generation. This needs time though as it's basically rewriting a core of GB, so should be its own PR.Performance Data
Performance log before any change
List virtual branches takes 86 minutes!
Here we see that it takes 3m15s to get the applied status, but the rest of the time is taken to do a lot of merges. In the screenshot, I stopped the trace after about an hour.
Performance after the change
TBD … but one thing is for sure - it's not 28 times faster (saying this with the CLI running for 16 minutes already with everything configured correctly, I think).
It clocked in at 1h5m which is still an improvement.
A partial profile I took while it was running reveals that the reason for the incredible increase in time consumption is also that the commit-integration check is now run twice: Once for the stack, once in the previous context. However, I didn't investigate if it's doing something new or if it's a repetition.
When it comes to the time spent on the actual merges, we see that only about 1/30th is spent on diffing trees and performing tree-edits.
The rest goes into blob-merges, where half the time is used to read blobs from the ODB, the other half is spent on diffing them. Once again, it only takes about 1/30th of the time to actually perform the merge.
Another fun-fact: one merge operation takes about 3.5s!
The best work is the work you don't have to do ;)
This time we told the merge operation to stop as soon as there is an unresolved conflict. This saves tremendous amounts of work and the total time is now down to 4m36s!
Now we have the commit-integrated checks times 2 like before, each of them costing nearly 2m, with 1m for getting the status.
And there we see that it's mostly hunk conversions and the computation of an MD5 hash.
Unfortunately the trace is cut-off so, as the hunks have to be computed as well.
Try to re-use 'is-integrated' work
Under the assumption that in my case, the stacks should not be anything new, then the commits it finds should mostly (?) be coming from the existing branches. We now try to avoid an actual integration check by looking up the commit (non-optimized) in a list of pre-computed virtual commits.
This brings down the time to 4m08 in total, and 1m12s for the stack series function. Note that this value also looks better because there is no tracing, and the output is faster, but it still takes 16s to debug print a huge structure.
Avoid hashing in favor of intersecting first
From my possibly flawed logic, it should be enough to roughly prune out hunks by intersections, and then do the expensive hashing. This cut down the time by another minute to 3m 7s.
Now the time to run the
is_integrated()
checks is down to just 24s, but the most time, 1m10s times 2, is spent listing virtual commit files and diffing trees. This can probably be done faster, while trying to avoid the duplication in the stack-series more.