-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize git fetch-depth #296
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The base ref was necessary to determine which commits exist on the pull request. The common ancestor of base ref and pull head could be found using git merge-base. However, many commits may exist between the base ref and this common ancestor. In some cases this could lead to a slow fetch of the base ref, because it requires a deep fetch. Instead of determining the commits on the pull request using git only, we can simply ask github about the commits on the pull request. Github can provide the number of commits and there is an API to retrieve data about the commits in a paged way. Sadly this is limited to 250 commits, but that should cover most cases. We can improve upon this in the future. Using the number of commits on the pull request we can fetch exactly those commits needed for cherry-picking. We can then retrieve all the commits on the PR to find the first and last commit shas. Finally we can cherry-pick the entire range of commits: `git cherry-pick firstCommitSha..lastCommitSha` Note that this completely removes the old backport script method, which most tests depend on. It means we have remove/disable most tests from this repo. In the meantime, backport-action-test provides a way to validate the correct behavior of this action.
It seemed that ranges behave strangely when cherry-picking in a shallow clone, but I was mistaken. The commit range `sha1..sha2` is a shorthand for `^sha1 sha2`, which means all commit reachable by sha2 but not reachable by sha1. That means that sha1 is excluded. So to make the cherry-pick work correctly with the shorthand (..) notation, we'd need to use something like: `git cherry-pick <firstCommitSha>^..<lastCommitSha>`, i.e. take all commits reachable from <lastCommitSha> but not reachable from the first parent of <firstCommitSha>. This would include the commit referenced by <firstCommitSha> as well. Instead of using ranges, we can pass all specific commits directly to the cherry-pick command. This isn't easier, but makes the command more explicit. We can always choose to rewrite this back to using the shorthand (..) notation.
Most of the tests in this repo have bothered me. They've restricted progress more than they have helped me make sure I can safely make changes. To make sure the action works correctly, I've mostly switched to github.com/korthout/backport-action-test. That repo verifies the main functionality, but does not cover all scenarios. I plan to expand these in the future. However, edge cases should still be covered by unit-tests. Most of the existing ones I'm deleting in this commit, because they restrict me. But it's also noteworthy that they don't really test the edge cases that they should be testing, nor are they doing this in a good way. I plan to re-introduce fast running unit tests at a later time. But at this time, I do not see any reason to keep any of the tests deleted in this commit. This commit also updates the README, because the integration tests (which were aptly called acceptance tests) have been completely removed. The README still covers how to run tests because we still have some unit tests that run fast and make sense to have.
This is no longer needed for anything. And this dependency is updated extremely often, which became very annoying. Happy to get rid of it.
Tested wth https://github.com/korthout/backport-action-test |
Backporting a pull request to 2 branches went from ~1m on |
Backporting a pull request to 1 branch went from ~5m to just 16s on
|
This was referenced Dec 10, 2022
Merged
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR improves the performance of the action's git fetches. For repositories with large histories, this can dramatically decrease the action's execution time.
closes #267
closes #269
Motivation
Since #162 this action supports shallow clones, which is the default when cloning a repo using actions/checkout. This means that the action fetches all the necessary git refs itself.
For repositories with large git histories (i.e. containing many commits), this increased the execution time of the action dramatically. The action would spend most of its time fetching this history, apparently also fetching common parts of the history multiple times.
What has changed?
A core mechanism had to be adjusted to reduce the number of commits that have to be fetched. This action previously determined the commits to cherry-pick by looking for the merge base between the pull request's head and base refs. However, it's unclear how many commits may already exist on the pull request's base (i.e. PR target) since the PR's head was branched from it. So, there's no way to specify a particular fetch depth that is optimal deterministically.
As an alternative, this PR uses the Github API to find these same commits. This allows the depth of the fetch used to discover all the necessary commits to cherry-pick to be reduced to
n+1
. An additional fetch with depth1
is still required for each backport target branch. This means that the fetch depth is now optimal.Why about those removed tests?
Many of the tests in this project have been bothering me for a longer time. They've restricted progress much more than they've helped to make it safe to change things. This became clear to me once more when I wanted to adjust the core of this action. So, I've removed many of the tests.
To ensure the action works correctly, I've mostly switched to using korthout/backport-action-test. That repo verifies the main functionality but does not cover all scenarios yet. I plan to expand these in the future. I also plan to re-introduce fast-running unit tests at a later time. But now, I do not see any reason to keep any of the deleted tests.