Support Go modules with LFS committed files #10052
Note that git lfs is already installed: dependabot-core/Dockerfile.updater-core Line 26 in 2b8bb15
However, we disable pulling all lfs files by default: dependabot-core/Dockerfile.updater-core Line 74 in 2b8bb15
I don't think that we'd want to enable pulling large files for all repos by default, as it would result in a lot of unnecessary data being pulled in. We make some exceptions around pulling in large files, for Yarn Berry for example:
Perhaps we could explore something similar here? Tbh, I have not seen lfs used in go modules that much; we support a fairly large number of go users and I think this is the first time it's come up. What exactly are you pulling in that requires lfs? I would think that typically go modules are just source code and do not require lfs to be enabled for those paths?
Hi @jurre ! I did not notice that about the core image. I've tracked1 that change all the way back to commit 35ea619 as part of #5833 to resolve the thread that begins in #1297 (comment). That solution would not work for us Gophers because Dependabot scripts have nowhere to intervene while the Go toolchain clones repositories and computes their hashes. The only way I see forward is to override the previously set Preferably, I'd do that for all runs of the I have never coded in Ruby, but I can try and copy-paste some configuration check with the help of AI to exclude large repositories that would otherwise suffer performance degradation3. Just point me at a similar commit in the Thank you for your swift reply 🦃 Footnotes |
Would turning the LFS setting on globally not cause every module not using lfs to have wrong hashes?
No, that is not the case. Allow me to elaborate on that answer: All modules (whether with or without files committed to LFS) compute hashes in the same way: hashing all the files in the directory tree that has |
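To make the hashing argument concrete, here is a minimal sketch of the "h1:" directory hash as I understand it from golang.org/x/mod/sumdb/dirhash: each file is hashed with SHA-256, the "digest  name" lines are listed in sorted order, and that listing is hashed again. The module path, version, file names, and the lfsPointer helper are hypothetical and exist only for illustration. The point is that a tree holding an LFS pointer file hashes differently from the same tree holding the real content, while a tree with no LFS files is unaffected either way.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"sort"
)

// hashH1 mimics the "h1:" hash: SHA-256 every file, write one
// "hexdigest  name\n" line per file in sorted name order, then
// SHA-256 that summary and base64-encode the result.
func hashH1(files map[string][]byte) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names)

	summary := sha256.New()
	for _, name := range names {
		sum := sha256.Sum256(files[name])
		fmt.Fprintf(summary, "%x  %s\n", sum[:], name)
	}
	return "h1:" + base64.StdEncoding.EncodeToString(summary.Sum(nil))
}

// lfsPointer builds the small text file that git leaves in the working
// tree when the LFS smudge filter is skipped (hypothetical helper).
func lfsPointer(content []byte) []byte {
	oid := sha256.Sum256(content)
	return []byte(fmt.Sprintf(
		"version https://git-lfs.github.com/spec/v1\noid sha256:%x\nsize %d\n",
		oid[:], len(content)))
}

func main() {
	goMod := []byte("module example.com/mod\n")
	asset := []byte("pretend this is a large binary asset")

	// Tree as the module author hashed it: real content present.
	pulled := map[string][]byte{
		"example.com/mod@v1.0.0/go.mod":    goMod,
		"example.com/mod@v1.0.0/asset.bin": asset,
	}
	// Same tree when LFS files are not pulled: only the pointer is present.
	skipped := map[string][]byte{
		"example.com/mod@v1.0.0/go.mod":    goMod,
		"example.com/mod@v1.0.0/asset.bin": lfsPointer(asset),
	}

	fmt.Println("LFS content pulled:", hashH1(pulled))
	fmt.Println("LFS pointer only:  ", hashH1(skipped))
}
```

The two printed values differ, which is the kind of difference that surfaces as a "checksum mismatch" against a go.sum written with the real content.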
@jurre If you're unhappy with the solution, I'd like to answer further questions and iterate on it. Otherwise, I've synched this PR with the latest upstream Note, I clicked the "Sync" button on GitHub which merges. Alternatively, I can rebase if that is your preferred method. |
Oops my bad, I did not push the updated solution 😦 I've committed a version that automatically pulls large-files on the |
Hey apologies, I'd missed the mention. Doing it for only go is definitely an improvement. And so if I understand correctly, we need basically everything that's nested in root folder (as far as Dependabot is concerned) to pull in large files, not necessarily only the Basically, if I have some 5GB video committed to the repository in say an |
Note: Don't merge this pull request, we are monitoring the release. |
Approving for monitoring (no merging)
Yes.
Exactly. Just to be thorough: at some point, probably a long time from now, golang/go#42965 will be merged and released as a new feature of the Go toolchain. When that happens, Go modules (that is, Another clarification on modules with
Turning GIT_LFS_SKIP_SMUDGE off enables automatic pulling of large files by default. The Go toolchain respects this default when it downloads module sources directly from git servers (e.g. GitHub). Git LFS support matters when Go modules embed large files using go:embed. Direct download is more common for private repositories, but it is also possible for public repositories (which are often downloaded from the module proxy instead). The Go toolchain uses the local git installation to fetch module archives directly from Git servers. Some companies commit large files using Git LFS, and those files happen to be part of their Go module source tree. Without git-lfs installed and enabled, the local git installation does not automatically resolve those files, which causes a "checksum mismatch" against the committed go.sum.
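As an illustration only (this is not how the pull request implements the change), the following sketch shows one way to launch the Go toolchain with GIT_LFS_SKIP_SMUDGE cleared, so that the git child processes it spawns pull LFS content. The helper name withoutSkipSmudge and the module example.com/mod@v1.0.0 are hypothetical, and the command is only constructed here, not executed.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withoutSkipSmudge returns a copy of env with any GIT_LFS_SKIP_SMUDGE
// entry removed, leaving every other variable untouched.
func withoutSkipSmudge(env []string) []string {
	out := make([]string, 0, len(env))
	for _, kv := range env {
		if strings.HasPrefix(kv, "GIT_LFS_SKIP_SMUDGE=") {
			continue
		}
		out = append(out, kv)
	}
	return out
}

func main() {
	// Hypothetical module coordinates, used only for illustration.
	cmd := exec.Command("go", "mod", "download", "-json", "example.com/mod@v1.0.0")

	// The child process inherits everything except the skip-smudge setting,
	// so git-lfs (if installed) downloads large files during the clone.
	cmd.Env = withoutSkipSmudge(os.Environ())

	fmt.Println("would run:", strings.Join(cmd.Args, " "))
	fmt.Println("environment entries passed:", len(cmd.Env))
}
```

Scoping the override to a single command, rather than changing the global default, keeps other ecosystems from pulling large files they do not need.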
What are you trying to accomplish?
My company commits large files using Git LFS which happen to be part of our Go module source tree.
Without git-lfs installed, Dependabot's local git installation does not automatically resolve those files, causing a "checksum mismatch" against the committed go.sum.
Git LFS support is important when downloading Go modules directly from git repositories. Direct download is more common for private repositories. It is also a valid method to fetch module archives for public repositories, although those are often downloaded from the module proxy.
Anything you want to highlight for special attention from reviewers?
While Git LFS may be relevant for other ecosystems, I've been heavily occupied with Go for the past several years. Thus, I'm unable to offer a qualified solution for all ecosystems. Nonetheless, I think this solution is very relevant for companies that rely on GitHub to host their internal codebase.
How will you know you've accomplished your goal?
While dependabot correctly detects a new version is available, when the repo commits large-files, the generated pull-request contains the wrong module hash, rendering it effectively meaningless.
I've used the following snippets to prove, and consistently test the problem and its solution:
We are interested in the value of the GoCheckSum field because that is the value that gets written to go.sum.
Notice how the value changed when Git LFS is disabled.
P.S. I've created danielorbach/dependabot-go_modules to demonstrate the scenario. A more elaborate setup can be made to incorporate dependabot.
Checklist