Yarn Berry: run git lfs pull before running yarn #5833

jtbandes · 2022-10-05T00:06:41Z

cc @jurre @defunctzombie
Run git lfs pull --include .yarn before using yarn for dependency updating. Yarn plugins stored with LFS can now be used.

Resolves LFS issues described in #1297, see example Dependabot PR at https://github.com/foxglove/studio/pull/4528

Tested with bin/dry-run.rb npm_and_yarn foxglove/studio --clone --dep rc-tree, which now updates yarn.lock instead of failing with an error.

Ran unit test with cd npm_and_yarn && bundle exec rspec spec/dependabot/npm_and_yarn/file_fetcher_spec.rb -e "with .yarn data stored in git-lfs"

Dockerfile

defunctzombie · 2022-10-05T03:35:48Z

Besides storing pre-built plugins using LFS. Another good use case for LFS with yarn projects is the offline cache: https://yarnpkg.com/features/offline-cache

npm_and_yarn/lib/dependabot/npm_and_yarn/file_updater/yarn_lockfile_updater.rb

jurre · 2022-10-05T07:41:08Z

Thanks for opening a PR so we have a place to discuss the approach.

I am curious if this setup is a common one, do you know of more projects following this approach of storing yarn assets in LFS?

jtbandes · 2022-10-05T23:16:50Z

I am curious if this setup is a common one, do you know of more projects following this approach of storing yarn assets in LFS?

You can check out renovatebot/renovate#6842 for some other people who are interested in LFS support for bumping dependencies.

We would probably want to inspect the yarn config in order to determine what path to pull, as the cache and plugin folders are configurable I believe

This is a good point. However, it seems like it would add a lot of complexity to this otherwise simple patch. Do you think it would make more sense to add a config option to dependabot.yml that allows the user to specify which paths need to be git lfs pulled? I would guess that for most yarn users, the default of pulling only .yarn would suffice. Would it be reasonable to only pull .yarn for now and wait to see if users actually request the behavior of pulling arbitrary custom paths?

defunctzombie · 2022-10-05T23:40:59Z

Not having a cache does seem to allow yarn to continue unlike not having the plugins.

FWIW there are open issues on the yarn repo around installing plugins automatically: yarnpkg/berry#4464 but no PR has been accepted yet.

jurre · 2022-10-07T11:47:00Z

Do you think it would make more sense to add a config option to dependabot.yml that allows the user to specify which paths need to be git lfs pulled?

No we're really trying to keep the config file as simple as possible and avoid any ecosystem specific configuration, especially if like in this case the setting can be inferred from the repo.

We have a bit of code that checks the config file for the cache path already, so I think we could just run some similar code in the repo content path we just fetched, parse the yaml and check what the correct path to pull is?

dependabot-core/npm_and_yarn/lib/dependabot/npm_and_yarn/file_updater.rb

Line 76 in ca5756f

YAML.load_file(".yarnrc.yml").fetch("cacheFolder", "./.yarn/cache")

Would it be reasonable to only pull .yarn for now and wait to see if users actually request the behavior of pulling arbitrary custom paths?

That doesn't sound unreasonable, but it would be nice if we can make it work out of the box?

Not having a cache does seem to allow yarn to continue unlike not having the plugins.

And plugins always go in .yarn/plugins? Not pulling the yarn cache correctly would cause some broken behavior I think though, because Dependabot now wouldn't add the right entry to the yarn cache.

Overall I think the approach here seems reasonable and I can't see it leading to any problems so I'd be inclined to accept this , it would be really nice if you could take a look at inferring the yarn cache path from the config file. Would also be great if we could add a test case for this somehow, but I'd have to dig into how those are set up for cloning currently

Going to run CI to ensure this doesn't break things unexpectedly

npm_and_yarn/lib/dependabot/npm_and_yarn/file_fetcher.rb

jtbandes · 2022-10-11T05:11:20Z

@jurre any further thoughts on this change? I've implemented some of your feedback, please let me know if there are any more changes you'd like to see.

jurre · 2022-10-11T07:03:36Z

Hey, so first of all thanks for those changes, I think this should do the trick!

One last thing I would like to request is that we add some tests around this, here is an example of the existing tests we have around cloning:

dependabot-core/common/spec/dependabot/file_fetchers/base_spec.rb

Lines 1323 to 1348 in 4bd7127

    
           context "with repo_contents_path" do 
        
             let(:repo_contents_path) { Dir.mktmpdir } 
        
             after { FileUtils.rm_rf(repo_contents_path) } 
        
             describe "#files" do 
        
               subject(:files) { file_fetcher_instance.files } 
        
               let(:contents) { "foo=1.0.0" } 
        
               # `git clone` against a file:// URL that is filled by the test 
        
               let(:repo_path) { Dir.mktmpdir } 
        
               after { FileUtils.rm_rf(repo_path) } 
        
               let(:fill_repo) { nil } 
        
               before do 
        
                 Dir.chdir(repo_path) do 
        
                   `git init .` 
        
                   fill_repo 
        
                   `git add .` 
        
                   `git commit --allow-empty -m'fake clone source'` 
        
                 end 
        
                 allow(source). 
        
                   to receive(:url).and_return("file://#{repo_path}") 
        
                 allow(file_fetcher_instance).to receive(:commit).and_return("sha") 
        
               end

Let me know if you need any help setting that up!

jtbandes · 2022-10-12T20:44:22Z

@jurre Yeah, some guidance around this would be helpful, since I am not familiar with rspec or this repo's test organization. For instance, are there any existing tests for file fetchers of specific package ecosystems? If I were to add this one would you imagine adding it to the base_spec.rb you linked, or making a new spec file specific to npm_and_yarn? Would you like to see a fixture that actually includes data stored in LFS or just a test that verifies that the git lfs pull command is run?

jurre · 2022-10-13T06:34:00Z

For instance, are there any existing tests for file fetchers of specific package ecosystems?

Yep, they all have ecosystem specific file fetcher specs, they'll be in <ecosystem>/spec/dependabot/<ecosystem>/file_fetcher_spec.rb. The go_mod ecosystem is somewhat relevant since it also uses a cloning strategy (vs using the GitHub API which we do for most ecosystems), but we don't perform any checks around this.

If I were to add this one would you imagine adding it to the base_spec.rb you linked, or making a new spec file specific to npm_and_yarn?

I'd add it to the npm_and_yarn one: https://github.com/dependabot/dependabot-core/blob/8d60d39e19bca9ba738e497fae31fbe0323386f3/npm_and_yarn/spec/dependabot/npm_and_yarn/file_fetcher_spec.rb

Would you like to see a fixture that actually includes data stored in LFS or just a test that verifies that the git lfs pull command is run?

I think the best way to go would be to set up a fixture that includes lfs data, and then verify that it is included in the fetched files

Here's a snippet of useful context on when the file fetcher uses the cloning strategy:

dependabot-core/common/lib/dependabot/file_fetchers/base.rb

Lines 38 to 42 in 4bd7127

    
           # Files are typically grabbed individually via the source's API. 
        
           # repo_contents_path is an optional empty directory that will be used 
        
           # to clone the entire source repository on first read. 
        
           # 
        
           # If provided, file _data_ will be loaded from the clone.

, so that means that when passing the FileFetcher a repo_contents_path, it'll clone the repo, otherwise use the API. In the case of Yarn Berry, we already make sure to always set that when actually running Dependabot, but for these tests you'll have to pass it a value (a tmpdir works best) manually

jtbandes · 2022-10-13T06:47:15Z

Thanks for the pointers, I will try it out tomorrow. Is it okay for the test to require network access? Otherwise I'm not sure if it will be possible to set up a repo that uses LFS but the files are hosted fully locally (probably possible, maybe not easy).

jurre · 2022-10-13T06:57:22Z

Is it okay for the test to require network access?

If we can avoid it then that'll be preferred, but if there's no way around it we do have some other tests accessing the network. Our existing cloning specs do use a local repo I believe, so hopefully we can repurpose that!

jtbandes · 2022-10-13T07:17:41Z

Since the whole point of LFS is that git is supposed to fetch the large files from a file server, making it work offline may require a custom adapter like this: https://github.com/sinbad/lfs-folderstore And setting this up requires some git config commands after cloning.

jurre · 2022-10-13T07:38:33Z

Ah I see, well, if it's not feasible to do it locally then we can pull from a remote. I'll move the repo into a dedicated org we have for these sort of fixtures once you have a working setup.

Thanks for working on this by the way! 🎉 🙇

jtbandes · 2022-10-13T17:56:19Z

@jurre I've created a fixture repo at https://github.com/jtbandes/dependabot-yarn-lfs-fixture and added the test. Please critique! 😃

jurre · 2022-10-13T18:24:27Z

Looks good at a cursory glance! It's getting a bit late on my end so I'll give this a more thorough look tomorrow 🙇

jtbandes · 2022-10-14T15:50:24Z

I've also initiated a transfer of the fixture repo to your account so you can move it to the appropriate place. Feel free to update the PR with any changes necessary.

jtbandes · 2022-10-17T21:55:37Z

@jurre just a quick ping on this, let me know if there is anything else I can do. Thanks for your repeated reviews!

npm_and_yarn/spec/dependabot/npm_and_yarn/file_fetcher_spec.rb

jtbandes · 2022-10-18T17:54:49Z

@jurre The new repo does not have the files properly tracked with Git LFS, so they're just text files: https://github.com/dependabot-fixtures/yarn-berry-lfs-fixture/blob/main/.yarn/releases/yarn-3.2.4.cjs (compare with https://github.com/jtbandes/dependabot-yarn-lfs-fixture/blob/main/.yarn/releases/yarn-3.2.4.cjs)

Luckily the test caught it 😃 I'll re-initiate the transfer so you can just use my repo.

npm_and_yarn/spec/dependabot/npm_and_yarn/file_fetcher_spec.rb

jtbandes · 2022-10-18T19:56:17Z

Looks like the rest of the test failures are some intermittent network issues.

bdragon · 2022-10-27T14:42:14Z

npm_and_yarn/lib/dependabot/npm_and_yarn/file_fetcher.rb

+            end
+            path
+          end
+        rescue StandardError


Do we want to return the path and carry on for any error?

What other options do you have in mind? I’m not sure what meaningful recovery could be done here…

Let me first better understand, which errors are you anticipating?

Here are the ones I can think of:

Dependabot::SharedHelpers::HelperSubprocessFailed raised by run_shell_command in super or here

NoMethodError if .yarnrc.yml is empty, in which case YAML.load_file will return false, which does not respond to fetch

Psych::ParseError if the YAML in the .yarnrc.yml is invalid

If we encounter an exception outside of these, I think we'd want to let it fail? @jurre might know better 😅

I think it's reasonable to continue without lfs pull in those cases rather than fail the update entirely?

You do raise a few good error cases (the empty or invalid .yarnrc.yml for example) that we should probably handle?

Related:

Fix test failure on main #6583 (comment)

It took a bit to track down why this started failing on main since the error gets swallowed... that means users who are hitting this failure scenario won't be told why...

bdragon · 2022-10-27T15:22:47Z

npm_and_yarn/lib/dependabot/npm_and_yarn/file_fetcher.rb

@@ -32,6 +32,26 @@ def self.required_files_message

      private

+      # Overridden to pull any yarn data or plugins which may be stored with Git LFS.
+      def _clone_repo_contents(target_directory:)


At least at one time in history, methods like this one in Dependabot::FileFetchers::Base were not meant to be overridden:

dependabot-core/common/lib/dependabot/file_fetchers/base.rb

Lines 176 to 178 in 6a9c745

#################################################

# INTERNAL METHODS (not for use by sub-classes) #

#################################################

This would be the first instance of a file fetcher subclass overriding it. Could we override public method #clone_repo_contents instead?

jurre · 2022-10-28T07:57:11Z

I'll try to find a bit of time to make the suggestions @bdragon shared today and then hopefully we can get this merged. Thanks for your work and patience on this @jtbandes!

Co-authored-by: Roman Shtylman <shtylman@gmail.com>

…yarn_lock

@bdragon

As pointed out by @bdragon, `YAML.load_file` returns false if the file exists but is empty. This handles that, and wraps all the calls up in a little helper method to avoid duplicating the logic everywhere.

jurre · 2022-11-01T20:43:52Z

Thanks, this is now live

jtbandes · 2022-11-01T21:28:30Z

Thanks for all your help!

jtbandes requested a review from a team as a code owner October 5, 2022 00:06

defunctzombie reviewed Oct 5, 2022

View reviewed changes

Dockerfile Show resolved Hide resolved

jurre reviewed Oct 5, 2022

View reviewed changes

npm_and_yarn/lib/dependabot/npm_and_yarn/file_updater/yarn_lockfile_updater.rb Outdated Show resolved Hide resolved

jurre reviewed Oct 7, 2022

View reviewed changes

npm_and_yarn/lib/dependabot/npm_and_yarn/file_fetcher.rb Outdated Show resolved Hide resolved

jtbandes requested a review from jurre October 7, 2022 16:52

jurre force-pushed the yarn-git-lfs-pull branch from 6f4a274 to ae11a76 Compare October 18, 2022 07:55

jurre reviewed Oct 18, 2022

View reviewed changes

npm_and_yarn/spec/dependabot/npm_and_yarn/file_fetcher_spec.rb Outdated Show resolved Hide resolved

jurre reviewed Oct 18, 2022

View reviewed changes

npm_and_yarn/spec/dependabot/npm_and_yarn/file_fetcher_spec.rb Outdated Show resolved Hide resolved

jurre requested a review from pavera October 18, 2022 08:04

jurre reviewed Oct 18, 2022

View reviewed changes

npm_and_yarn/spec/dependabot/npm_and_yarn/file_fetcher_spec.rb Outdated Show resolved Hide resolved

jurre force-pushed the yarn-git-lfs-pull branch from d5e2e0a to 5e50c4e Compare October 19, 2022 15:53

jurre requested a review from a team October 27, 2022 13:54

bdragon reviewed Oct 27, 2022

View reviewed changes

jurre force-pushed the yarn-git-lfs-pull branch 3 times, most recently from 2aaabad to 1808542 Compare October 28, 2022 11:46

jurre requested a review from bdragon October 28, 2022 11:46

jurre force-pushed the yarn-git-lfs-pull branch from 1808542 to d9f4a86 Compare October 28, 2022 12:44

bdragon approved these changes Oct 31, 2022

View reviewed changes

jtbandes and others added 10 commits October 31, 2022 17:24

Yarn Berry: run git lfs pull before running yarn

35ea619

Update Dockerfile

a009d41

Co-authored-by: Roman Shtylman <shtylman@gmail.com>

override _clone_repo_contents instead of running lfs pull in updated_…

7bf4ffc

…yarn_lock

get cacheFolder from .yarnrc.yml, make method private

f4a58c0

add test

1a16579

lint

184a835

Update fixture repository to use dependabot-fixtures

b9c3af1

Use transferred lfs fixture repo

4b567a2

Ensure empty yarnrc.yml does not cause issues

968075e

As pointed out by @bdragon, `YAML.load_file` returns false if the file exists but is empty. This handles that, and wraps all the calls up in a little helper method to avoid duplicating the logic everywhere.

Override public clone_repo_contents instead of private helper

e2a9631

jurre force-pushed the yarn-git-lfs-pull branch from d9f4a86 to e2a9631 Compare October 31, 2022 16:24

Merge branch 'main' into yarn-git-lfs-pull

5876957

pavera approved these changes Nov 1, 2022

View reviewed changes

jurre merged commit 868ef7d into dependabot:main Nov 1, 2022

jtbandes deleted the yarn-git-lfs-pull branch November 1, 2022 21:03

jtbandes mentioned this pull request Nov 18, 2022

Support git LFS renovatebot/renovate#6842

Open

pavera mentioned this pull request Nov 30, 2022

v0.214.0 #6238

Merged

danielorbach mentioned this pull request Jun 25, 2024

Support Go modules with LFS committed files #10052

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Yarn Berry: run git lfs pull before running yarn #5833

Yarn Berry: run git lfs pull before running yarn #5833

jtbandes commented Oct 5, 2022 •

edited

Loading

defunctzombie commented Oct 5, 2022

jurre commented Oct 5, 2022

jtbandes commented Oct 5, 2022

defunctzombie commented Oct 5, 2022

jurre commented Oct 7, 2022 •

edited

Loading

jtbandes commented Oct 11, 2022 •

edited

Loading

jurre commented Oct 11, 2022

jtbandes commented Oct 12, 2022

jurre commented Oct 13, 2022

jtbandes commented Oct 13, 2022 •

edited

Loading

jurre commented Oct 13, 2022

jtbandes commented Oct 13, 2022 •

edited

Loading

jurre commented Oct 13, 2022

jtbandes commented Oct 13, 2022

jurre commented Oct 13, 2022

jtbandes commented Oct 14, 2022

jtbandes commented Oct 17, 2022

jtbandes commented Oct 18, 2022

jtbandes commented Oct 18, 2022

bdragon Oct 27, 2022

jtbandes Oct 27, 2022

bdragon Oct 27, 2022

jurre Oct 27, 2022 •

edited

Loading

jeffwidman Feb 3, 2023

bdragon Oct 27, 2022

jurre commented Oct 28, 2022

jurre commented Nov 1, 2022

jtbandes commented Nov 1, 2022

	#################################################
	# INTERNAL METHODS (not for use by sub-classes) #
	#################################################

Yarn Berry: run git lfs pull before running yarn #5833

Yarn Berry: run git lfs pull before running yarn #5833

Conversation

jtbandes commented Oct 5, 2022 • edited Loading

defunctzombie commented Oct 5, 2022

jurre commented Oct 5, 2022

jtbandes commented Oct 5, 2022

defunctzombie commented Oct 5, 2022

jurre commented Oct 7, 2022 • edited Loading

jtbandes commented Oct 11, 2022 • edited Loading

jurre commented Oct 11, 2022

jtbandes commented Oct 12, 2022

jurre commented Oct 13, 2022

jtbandes commented Oct 13, 2022 • edited Loading

jurre commented Oct 13, 2022

jtbandes commented Oct 13, 2022 • edited Loading

jurre commented Oct 13, 2022

jtbandes commented Oct 13, 2022

jurre commented Oct 13, 2022

jtbandes commented Oct 14, 2022

jtbandes commented Oct 17, 2022

jtbandes commented Oct 18, 2022

jtbandes commented Oct 18, 2022

bdragon Oct 27, 2022

Choose a reason for hiding this comment

jtbandes Oct 27, 2022

Choose a reason for hiding this comment

bdragon Oct 27, 2022

Choose a reason for hiding this comment

jurre Oct 27, 2022 • edited Loading

Choose a reason for hiding this comment

jeffwidman Feb 3, 2023

Choose a reason for hiding this comment

bdragon Oct 27, 2022

Choose a reason for hiding this comment

jurre commented Oct 28, 2022

jurre commented Nov 1, 2022

jtbandes commented Nov 1, 2022

jtbandes commented Oct 5, 2022 •

edited

Loading

jurre commented Oct 7, 2022 •

edited

Loading

jtbandes commented Oct 11, 2022 •

edited

Loading

jtbandes commented Oct 13, 2022 •

edited

Loading

jtbandes commented Oct 13, 2022 •

edited

Loading

jurre Oct 27, 2022 •

edited

Loading