Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize git fetch-depth #296

Merged
merged 5 commits into from
Nov 23, 2022
Merged

Optimize git fetch-depth #296

merged 5 commits into from
Nov 23, 2022

Conversation

korthout
Copy link
Owner

@korthout korthout commented Nov 22, 2022

This PR improves the performance of the action's git fetches. For repositories with large histories, this can dramatically decrease the action's execution time.

Note
Users do not have to adjust anything to benefit from this change unless they have configured the actions/checkout's fetch-depth to anything else than the default: fetch-depth: 1

closes #267
closes #269

Motivation

Since #162 this action supports shallow clones, which is the default when cloning a repo using actions/checkout. This means that the action fetches all the necessary git refs itself.

For repositories with large git histories (i.e. containing many commits), this increased the execution time of the action dramatically. The action would spend most of its time fetching this history, apparently also fetching common parts of the history multiple times.

What has changed?

A core mechanism had to be adjusted to reduce the number of commits that have to be fetched. This action previously determined the commits to cherry-pick by looking for the merge base between the pull request's head and base refs. However, it's unclear how many commits may already exist on the pull request's base (i.e. PR target) since the PR's head was branched from it. So, there's no way to specify a particular fetch depth that is optimal deterministically.

As an alternative, this PR uses the Github API to find these same commits. This allows the depth of the fetch used to discover all the necessary commits to cherry-pick to be reduced to n+1. An additional fetch with depth 1 is still required for each backport target branch. This means that the fetch depth is now optimal.

Why about those removed tests?

Many of the tests in this project have been bothering me for a longer time. They've restricted progress much more than they've helped to make it safe to change things. This became clear to me once more when I wanted to adjust the core of this action. So, I've removed many of the tests.

To ensure the action works correctly, I've mostly switched to using korthout/backport-action-test. That repo verifies the main functionality but does not cover all scenarios yet. I plan to expand these in the future. I also plan to re-introduce fast-running unit tests at a later time. But now, I do not see any reason to keep any of the deleted tests.

The base ref was necessary to determine which commits exist on the pull
request. The common ancestor of base ref and pull head could be found
using git merge-base. However, many commits may exist between the base
ref and this common ancestor. In some cases this could lead to a slow
fetch of the base ref, because it requires a deep fetch.

Instead of determining the commits on the pull request using git only,
we can simply ask github about the commits on the pull request. Github
can provide the number of commits and there is an API to retrieve data
about the commits in a paged way. Sadly this is limited to 250 commits,
but that should cover most cases. We can improve upon this in the
future.

Using the number of commits on the pull request we can fetch exactly
those commits needed for cherry-picking.

We can then retrieve all the commits on the PR to find the first and
last commit shas.

Finally we can cherry-pick the entire range of commits:
`git cherry-pick firstCommitSha..lastCommitSha`

Note that this completely removes the old backport script method, which
most tests depend on. It means we have remove/disable most tests from
this repo.

In the meantime, backport-action-test provides a way to validate the
correct behavior of this action.
It seemed that ranges behave strangely when cherry-picking in a shallow
clone, but I was mistaken. The commit range `sha1..sha2` is a shorthand
for `^sha1 sha2`, which means all commit reachable by sha2 but not
reachable by sha1. That means that sha1 is excluded.

So to make the cherry-pick work correctly with the shorthand (..)
notation, we'd need to use something like: `git cherry-pick
<firstCommitSha>^..<lastCommitSha>`, i.e. take all commits reachable
from <lastCommitSha> but not reachable from the first parent of
<firstCommitSha>. This would include the commit referenced by
<firstCommitSha> as well.

Instead of using ranges, we can pass all specific commits directly to
the cherry-pick command. This isn't easier, but makes the command more
explicit. We can always choose to rewrite this back to using the
shorthand (..) notation.
Most of the tests in this repo have bothered me. They've restricted
progress more than they have helped me make sure I can safely make
changes.

To make sure the action works correctly, I've mostly switched to
github.com/korthout/backport-action-test. That repo verifies the main
functionality, but does not cover all scenarios. I plan to expand these
in the future.

However, edge cases should still be covered by unit-tests. Most of the
existing ones I'm deleting in this commit, because they restrict me. But
it's also noteworthy that they don't really test the edge cases that
they should be testing, nor are they doing this in a good way.

I plan to re-introduce fast running unit tests at a later time. But at
this time, I do not see any reason to keep any of the tests deleted in
this commit.

This commit also updates the README, because the integration tests
(which were aptly called acceptance tests) have been completely removed.
The README still covers how to run tests because we still have some unit
tests that run fast and make sense to have.
This is no longer needed for anything. And this dependency is updated
extremely often, which became very annoying. Happy to get rid of it.
@korthout korthout mentioned this pull request Nov 22, 2022
9 tasks
@korthout korthout marked this pull request as ready for review November 23, 2022 17:03
@korthout korthout merged commit 85b0de7 into master Nov 23, 2022
@korthout korthout deleted the korthout-optimal-fetch-depth branch November 23, 2022 17:08
@korthout
Copy link
Owner Author

korthout commented Nov 28, 2022

Backporting a pull request to 2 branches went from ~1m on v0.0.9 to just 11s on v1-rc1 in the camunda/zeebe repo 🌱

@korthout
Copy link
Owner Author

Backporting a pull request to 1 branch went from ~5m to just 16s on v1-rc for the NixOS/nixpkgs repo 🌱 🌱 🌱

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tests don't run to completion on macOS Speed regression when using fetch-depth = 1
1 participant