Is there a way to get all commit history for a single branch? #520

Open
emmahsax opened this issue Jun 5, 2021 · 9 comments

Comments


emmahsax commented Jun 5, 2021

I'm using this action like this (this works splendidly):

- name: Check Out Code
  uses: actions/checkout@v2
  with:
    ref: ${{ env.BRANCH }}
    fetch-depth: 0

I only need to get the entire commit history for just that one ${{ env.BRANCH }}. But using fetch-depth: 0 gets all commit history for all branches and tags, which is unnecessary in my case. Is there a way to get the entire commit history for just the branch referenced in the ref?

@solarmosaic-kflorence

This is very painful in our ever growing monorepo.

@solarmosaic-kflorence

See also #285 #346


solarmosaic-kflorence commented Nov 19, 2021

If you persist credentials, you can just check out a single commit, and then fetch whatever you want yourself. For example:

steps:
  - uses: actions/checkout@ec3a7ce113134d7a93b817d10a8272cb61118579
    with:
      ref: "${{ github.event.pull_request.head.sha }}"
  - run: git fetch --prune --progress origin +refs/heads/${{ github.event.pull_request.head.ref }}:refs/remotes/origin/${{ github.event.pull_request.head.ref }}

This will fetch the full history of the PR branch. This is a hack, though, and it requires re-fetching objects and re-resolving deltas.
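As a minimal sketch of the refspec used in that fetch, the helper below builds the mapping from one remote branch to its remote-tracking ref. The function name `branch_refspec` and the branch name `my-feature` are hypothetical and only serve the illustration.

```shell
#!/bin/sh
# Build a refspec that maps a single remote branch onto its
# remote-tracking ref under refs/remotes/origin/.
# The leading "+" lets git update the ref even when the update is
# not a fast-forward (for example after a force-push).
branch_refspec() {
  printf '+refs/heads/%s:refs/remotes/origin/%s\n' "$1" "$1"
}

# Hypothetical usage in a workflow step (needs network access and
# persisted credentials, so it is left commented out here):
#   git fetch --prune --progress origin "$(branch_refspec "my-feature")"
branch_refspec "my-feature"
```

In a real workflow it is probably safer to pass the branch name through an `env:` variable and reference it as a shell variable, rather than interpolating `${{ ... }}` directly into the `run:` script, since branch names are user-controlled input.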

@solarmosaic-kflorence

It would probably be better if there was just an action that set up credentials and let users run their own fetches.


lourd commented Dec 17, 2021

Here's the job yml for the workaround I just figured out for my use case of running tests and linting on pull requests on my team's Nx monorepo, in case anyone else finds this later with a similar need:

  lint-and-test:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repository
        uses: actions/checkout@v2
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - name: Add PR base ref
        # Fetch the ref of the base branch, just the single commit.
        run: git fetch --depth=1 origin +refs/heads/${{github.base_ref}}:refs/remotes/origin/${{github.base_ref}}
      - name: Track PR base branch
        # Turn the just-fetched ref into a local branch.
        run: git branch --track ${{github.base_ref}} origin/${{github.base_ref}}
      - name: Fetch commits in-between base and HEAD
        # While the ancestor commit between HEAD and the branch base isn't in the local tree
        # fetch X more parent commits of the branch base and branch head.
        run: |
          while [ -z "$( git merge-base ${{github.base_ref}} HEAD )" ]; do
            git fetch --deepen=10 origin ${{github.base_ref}} HEAD;
          done
        # Sets the commit SHAs of head and branch base into two environment variables
        # that `nx affected` will read from.
      - name: Derive SHAs for base and head for `nx affected` commands
        uses: nrwl/nx-set-shas@v2
        with:
          main-branch-name: ${{github.base_ref}}
      - uses: actions/setup-node@v2
        with:
          node-version: 16.x
          cache: yarn
      - name: Install dependencies
        run: yarn install
      - name: Lint
        run: yarn nx affected --target=lint
      - name: Tests
        run: yarn nx affected --target=test
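The deepening loop in that job runs until a merge base appears, so it could spin indefinitely if the two refs never share history. Below is a rough sketch of the same idea with an upper bound on the number of rounds. The function name `deepen_until_merge_base` and the `max_rounds` cap are assumptions added for illustration and are not part of the workaround above.

```shell
#!/bin/sh
# Deepen a shallow clone step by step until <base_ref> and HEAD share
# a common ancestor, giving up after a fixed number of rounds.
#   $1 = base ref (e.g. the PR base branch)
#   $2 = commits to deepen per round (default 10)
#   $3 = maximum number of rounds (default 50, hypothetical safety cap)
deepen_until_merge_base() {
  base_ref="$1"
  step="${2:-10}"
  max_rounds="${3:-50}"
  round=0
  until git merge-base "$base_ref" HEAD >/dev/null 2>&1; do
    round=$(( round + 1 ))
    if [ "$round" -gt "$max_rounds" ]; then
      echo "no merge base found after $max_rounds rounds" >&2
      return 1
    fi
    # Same fetch as in the job above: extend the shallow history.
    git fetch --deepen="$step" origin "$base_ref" HEAD
  done
}
```

A step could then call `deepen_until_merge_base "$BASE_REF"` where `BASE_REF` is a hypothetical environment variable holding the value of `github.base_ref`.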

kdeldycke added a commit to kdeldycke/workflows that referenced this issue Jan 10, 2022

polarathene commented Jun 27, 2022

Single branch history:

Assuming "all" history is from where the branch started/branched from:

Pull Requests

- name: 'PR commits + 1'
  run: echo "PR_FETCH_DEPTH=$(( ${{ github.event.pull_request.commits }} + 1 ))" >> "${GITHUB_ENV}"

- name: 'Checkout PR branch and all PR commits'
  uses: actions/checkout@v3
  with:
    ref: ${{ github.event.pull_request.head.sha }}
    fetch-depth: ${{ env.PR_FETCH_DEPTH }}

NOTE: Prefer a ref of ${{ github.event.pull_request.head.sha }} (PR branch latest commit that triggered event) over ${{ github.event.pull_request.head.ref }} (PR branch). See end of this response for details on why you'd prefer head.sha over head.ref
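The depth arithmetic in the first step can be sketched as a small shell function. The name `pr_fetch_depth` and the input check are assumptions added for illustration; the only part taken from the comment is "PR commit count plus one".

```shell
#!/bin/sh
# Compute the fetch depth needed to reach the commit a PR branched
# from: the number of commits in the PR, plus one for that commit.
pr_fetch_depth() {
  commits="$1"
  case "$commits" in
    ''|*[!0-9]*)
      echo "expected a non-negative integer, got '$commits'" >&2
      return 1
      ;;
  esac
  echo $(( commits + 1 ))
}

# A PR with 3 commits needs a depth of 3 + 1.
pr_fetch_depth 3
```

In a workflow, the argument would come from `github.event.pull_request.commits`, as in the step above.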

Regarding the earlier comment that "This is a hack, though, and it requires re-fetching objects and re-resolving deltas":

The overhead should be minimal AFAIK, especially when the prior checkout used the default fetch-depth: 1. Otherwise, you can use the example above and set the fetch-depth via an ENV value.


Workflows triggered by events that aren't pull requests (but act on PR branches) - gh CLI with API requests


If you need this in a workflow that was not triggered by a pull request event, and therefore lacks the commit count in its context, I have seen others use the GitHub CLI (their calculation differs from mine in that it uses expr, but the way I did it is what's often advised):

# fetch more commits
prCommits=`gh pr view $prId --json commits | jq '.commits | length'`
fetchDepthToPrBase=`expr $prCommits + 2`
git fetch --no-tags --prune --progress --no-recurse-submodules --deepen=$fetchDepthToPrBase

Other branches that aren't PRs


If the branch is not a PR but was branched off a base branch, you can fetch the commits for that branch and avoid the base branch history (advice taken from here):

git clone --shallow-exclude master --single-branch --branch my-branch git@github.com:<org>/<repo>.git

# or with fetch (first two steps should have been handled by `actions/checkout` already):
git init
git remote add origin https://github.com/user-name/repo-name
git fetch origin my-branch --shallow-exclude master

NOTE: This may not contain the full history since you branched off the base branch. If the base branch has been merged into your branch since then, you'll only have history back to that merge point.

If you want the shared history beyond when the branch started from the base (or if no base branch is involved), then just git fetch/git clone --single-branch without --depth or --shallow-exclude should work. Otherwise you could use --shallow-since=99years? (or if your version of git is new enough, you can use --refetch)

If you get a failure when trying to later --deepen the shallow clone or with other commands, you can possibly fix it with git repack -d.
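To see what `--shallow-exclude` keeps, here is a small local experiment that needs no network. It is only a sketch: the throwaway repositories, the `g` helper, and the commit messages are all made up for the demonstration, and the "remote" is a local directory reached over a `file://` URL.

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)
# Helper: run git with a throwaway identity so commits work anywhere.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Throwaway "remote": three commits on master, then two on my-branch.
git init -q "$demo/remote"
g -C "$demo/remote" symbolic-ref HEAD refs/heads/master
for n in 1 2 3; do
  g -C "$demo/remote" commit -q --allow-empty -m "base $n"
done
g -C "$demo/remote" checkout -q -b my-branch
for n in 1 2; do
  g -C "$demo/remote" commit -q --allow-empty -m "feature $n"
done

# Fresh repository: fetch only my-branch, cutting history at master.
git init -q "$demo/local"
g -C "$demo/local" fetch -q --shallow-exclude=master \
  "file://$demo/remote" my-branch

# Count the commits that actually arrived for my-branch.
branch_commits=$(g -C "$demo/local" rev-list --count FETCH_HEAD)
echo "commits fetched for my-branch: $branch_commits"
```

With the five commits created above, only the two made on my-branch should be counted, since everything reachable from master is excluded.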


For branch comparisons (such as getting a list of files changed):

- name: 'Checkout PR branch (with test merge-commit)'
  uses: actions/checkout@v3
  with:
    fetch-depth: 2
- name: 'Get a list of changed files to process'
  run: git diff-tree --name-only --diff-filter 'AM' -r HEAD^1 HEAD
  • That uses the default "test merge-commit" (from PR to base branch) that GitHub generates, and fetches the two parent commits involved in the merge (one from each branch, at the second depth level).
  • No merge-base is needed, since you can diff between the merge-commit (HEAD) and its parent commit HEAD^1 (from the base branch); it's effectively a before/after difference.
  • This example filters files by change status, keeping Added (A) and Modified (M); other differences (such as deletions) are excluded.
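The before/after diff described in these points can be tried locally without GitHub. This is only a sketch: the repository, file names, and the `g` helper are invented for the demonstration, and a plain `--no-ff` merge stands in for the test merge-commit that GitHub generates.

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)
# Helper: run git inside the demo repo with a throwaway identity.
g() { git -C "$demo" -c user.name=demo -c user.email=demo@example.com "$@"; }

git init -q "$demo"
g symbolic-ref HEAD refs/heads/main

# Base branch: two files.
echo one > "$demo/kept.txt"
echo two > "$demo/removed.txt"
g add .
g commit -q -m "base"

# "PR" branch: add one file, modify one, delete one.
g checkout -q -b pr
echo new > "$demo/added.txt"
echo changed > "$demo/kept.txt"
g rm -q removed.txt
g add .
g commit -q -m "pr change"

# Stand-in for the test merge-commit (PR merged into base).
g checkout -q main
g merge -q --no-ff -m "test merge" pr

# Diff the merge commit against its first parent (the base side),
# keeping only Added and Modified paths.
changed_files=$(g diff-tree --name-only --diff-filter=AM -r HEAD^1 HEAD)
echo "$changed_files"
```

The deleted file is filtered out, so only the added and modified paths are listed.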

If you need more history, though, you can avoid the while-loop deepen approach. For a PR branch targeting the base branch, the GitHub context object tells you how many commits to fetch: one more than the number of commits in the PR, which reaches the commit the PR branched from (a common ancestor with the base branch).

Note that any merge commits (base to PR branch) add to the count, but don't seem to be part of the fetched count, thus you may over-fetch some excess from N merge commits.

- name: 'Checkout PR branch'
  uses: actions/checkout@v3
  with:
    ref: ${{ github.event.pull_request.head.ref }}

- name: 'Get a list of changed files to process'
  run: |
    # Fetch enough history for a common merge-base commit
    git fetch origin ${{ github.event.pull_request.head.ref }} --depth $(( ${{ github.event.pull_request.commits }} + 1 ))
    # Fetch is smart enough to keep commits fetched minimal if it finds local history already has a commit from the base branch:
    git fetch origin ${{ github.event.pull_request.base.ref }}

    # Show only files from the PR with content filtered by specific git status (Added or Modified):
    git diff-tree --name-only --diff-filter 'AM' -r \
      --merge-base origin/${{ github.event.pull_request.base.ref }} ${{ github.event.pull_request.head.ref }}

There is a slight chance that the previous example does not fetch enough commits to derive a merge-base commit. This can happen if you used this action earlier in the workflow and fetched the ref for the test merge commit with a fetch-depth of 2 or more, since local history then already contains a commit from the base branch.

In that case you can still use this approach, but retrieve the date of the oldest commit (the one shared by both the base and PR branches), and then fetch commits since that date, which provides a common merge-base commit (unless there are merge commits from the base branch into the PR branch, in which case --merge-base / git merge-base will prefer that merge point as the merge-base):

- name: 'Checkout PR branch'
  uses: actions/checkout@v3
  with:
    ref: ${{ github.event.pull_request.head.ref }}

- name: 'Get a list of changed files to process'
  run: |
    # Fetch enough history for a common merge-base commit
    git fetch origin ${{ github.event.pull_request.head.ref }} --depth $(( ${{ github.event.pull_request.commits }} + 1 ))
    
    # This should get the oldest commit in the local fetched history (which may not be the branched base commit):
    BRANCHED_FROM_COMMIT=$( git rev-list --first-parent --max-parents=0 --max-count=1 ${{ github.event.pull_request.head.ref }} )
    UNIX_TIMESTAMP=$( git log --format=%ct "${BRANCHED_FROM_COMMIT}" )
    # Get all commits since that commit for the base branch (eg: master):
    git fetch --shallow-since "${UNIX_TIMESTAMP}" origin ${{ github.event.pull_request.base.ref }}

    # Show only files from the PR with content filtered by specific git status (Added or Modified):
    git diff-tree --name-only --diff-filter 'AM' -r \
      --merge-base origin/${{ github.event.pull_request.base.ref }} ${{ github.event.pull_request.head.ref }}

More details from a similar comment I wrote earlier


UPDATE: I have noticed that the github context object is "cached" to the commit triggering the event.

If new commits are added to the base branch, or the PR branch, the context object remains as it was for prior workflow runs when they happened (re-runs of those will use the same context object, falling out of sync with newer context metadata).

This means the "test merge commit" (PR into base branch) is not updated to use the latest base branch commit, nor is the context's commit count accurate. Thus using the ref ${{ github.event.pull_request.head.ref }} on a workflow re-run would fetch the latest commit in the PR and likely fail, since the fetch-depth would be incorrect.

Here is a revised version that:

  • Checks out the PR head commit by its SHA, which will always be the same commit for workflow re-runs due to the cached context.
  • No ref/branch is checked out now, just the commit directly. I adjusted the git fetch to use a refspec value that maps to the remote PR branch. Check out the branch with the git switch line if you need that.
  • The date is an ISO 8601 formatted value instead of a unix timestamp, which may be nicer to work with when debugging.
  • Added the extra options the action includes with its fetch commands.
  • Additionally restricted the fetched base branch commits too (swap the git fetch target refspec_base for branch_base if you want to always get the latest commits regardless).
- name: 'Checkout the latest commit of the PR branch (at the time this event originally triggered)'
  uses: actions/checkout@v3
  with:
    ref: ${{ github.event.pull_request.head.sha }}

- name: 'Example - Get a list of changed files to process (fetching full PR commit history)'
  env:
    branch_base: origin/${{ github.event.pull_request.base.ref }}
    branch_pr: origin/${{ github.event.pull_request.head.ref }}
    refspec_base: +${{ github.event.pull_request.base.sha }}:remotes/origin/${{ github.event.pull_request.base.ref }}
    refspec_pr: +${{ github.event.pull_request.head.sha }}:remotes/origin/${{ github.event.pull_request.head.ref }}
  run: |
    # Fetch enough history to find a common ancestor commit (aka merge-base):
    git fetch origin ${{ env.refspec_pr }} --depth=$(( ${{ github.event.pull_request.commits }} + 1 )) \
      --no-tags --prune --no-recurse-submodules

    # `actions/checkout` fetched a specific commit, not a branch (ref), so that commit was checked out in a
    # detached HEAD state. Depending on what you do, you may want to additionally switch to the branch
    # the refspec assigned the commit to in the prior fetch command:
    # git switch -c ${{ env.branch_pr }}

    # This should get the oldest commit in the local fetched history (which may not be the commit the PR branched from):
    COMMON_ANCESTOR=$( git rev-list --first-parent --max-parents=0 --max-count=1 ${{ env.branch_pr }} )
    DATE=$( git log --date=iso8601 --format=%cd "${COMMON_ANCESTOR}" )

    # Get all commits since that commit date from the base branch (eg: master or main):
    git fetch origin ${{ env.refspec_base }} --shallow-since="${DATE}" \
      --no-tags --prune --no-recurse-submodules

    # Example - Show only files from the PR with content filtered by specific git status (Added or Modified):
    git diff-tree --name-only --diff-filter 'AM' -r \
      --merge-base ${{ env.branch_base }} ${{ env.branch_pr }}

Meanwhile for the purposes of getting a diff, this earlier snippet needs no changes, and is much simpler than the above (only 3 commits involved):

- name: 'Checkout PR branch (with test merge-commit)'
  uses: actions/checkout@v3
  with:
    fetch-depth: 2
- name: 'Example - Get a list of changed files to process'
  run: git diff-tree --name-only --diff-filter 'AM' -r HEAD^1 HEAD

@floating-cat

I use fetch-depth: 2147483647 to work around this problem.
I learned the depth number from https://stackoverflow.com/a/6802238

You can run git fetch --depth=2147483647

The special depth 2147483647 (or 0x7fffffff, the largest positive number a signed 32-bit integer can contain) means infinite depth.
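As a quick check of the arithmetic in that quote, the largest signed 32-bit integer can be derived in the shell two ways. The variable names are only for illustration.

```shell
#!/bin/sh
# Largest value of a signed 32-bit integer: 2^31 - 1.
max_depth=$(( (1 << 31) - 1 ))
# The same value written as a hexadecimal constant.
max_depth_hex=$(( 0x7fffffff ))

echo "$max_depth"
echo "$max_depth_hex"
```

Both lines print 2147483647, the value passed as fetch-depth above.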

@emmahsax
Author

@floating-cat That's a cool solution... I'll give it a try!


marc-hb commented Dec 9, 2022

git fetch --unshallow (see https://git-scm.com/docs/git-fetch)

BTW git merge-base is tempting and works most of the time - but NOT all the time:
