feat: better cache support for JS workspaces #526

JGAntunes · 2021-02-22T16:47:20Z

Based on the discussion started in https://github.com/netlify/pod-workflow/issues/139 (which is in turn based on many of the open PRs and issues in this repo), this PR adds support for JS workspaces.

Workspaces were originally a yarn-only feature, but npm v7 is catching up, with ongoing discussions and RFCs (see npm/rfcs#117 and npm/rfcs#273) to add further functionality. The two still work differently, however: npm is unable to detect that it is inside a project with multiple workspaces if we cd into a sub-directory. For Netlify (if we use the base configuration property) our caching strategy therefore still holds up for npm, since we just install and cache a workspace package as if it were a regular JS project. Yarn installs in sub-directories, however, are able to infer they're part of a workspace, meaning dependencies will be hoisted and all the sibling packages will be installed as well. This means that, in order to achieve the same caching experience we have for other JS projects, we need to cache the root node_modules as well as all the workspace node_modules.
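
To make the caching requirement concrete, here is a minimal sketch of the set of directories involved: one node_modules per workspace location plus the root one. It is only an illustration under stated assumptions, not the code in this PR: `list_node_modules_dirs` is a hypothetical helper, the repo path is made up, and the locations are hard-coded instead of being read from yarn.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of run-build-functions.sh): prints every
# node_modules directory that would need caching for a workspace monorepo.
list_node_modules_dirs() {
  local root="$1"
  shift
  local location
  # One node_modules directory per workspace package...
  for location in "$@"; do
    printf '%s\n' "$root/$location/node_modules"
  done
  # ...plus the root one, where yarn hoists shared dependencies.
  printf '%s\n' "$root/node_modules"
}

# Illustrative inputs only: a made-up repo path and two workspace locations.
dirs_to_cache="$(list_node_modules_dirs "/opt/build/repo" "packages/blog-1" "packages/blog-2")"
printf '%s\n' "$dirs_to_cache"
```

Skipping any one of these directories would mean yarn has to reinstall that part of the tree on every build.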

This solution works for both yarn v1 and yarn v2 projects (the latter as long as they rely on the node_modules nodeLinker config).

I've put in place an env variable - NETLIFY_YARN_WORKSPACES - that we can use as a feature flag by hooking it up on the buildbot side with LaunchDarkly.

Addresses #399 #432 #196 #479

Tests

I've run this locally on a monorepo using:

An example output of a cached execution with yarn:

Finished restoring cached yarn cache
NETLIFY_YARN_WORKSPACES feature flag set
yarn workspaces v1.22.4
{
  "gatsby-starter-blog-1": {
    "location": "packages/blog-1",
    "workspaceDependencies": [],
    "mismatchedWorkspaceDependencies": []
  },
  "gatsby-starter-blog-2": {
    "location": "packages/blog-2",
    "workspaceDependencies": [],
    "mismatchedWorkspaceDependencies": []
  }
}
Done in 0.12s.
Started restoring workspace packages/blog-1 node modules
Finished restoring workspace packages/blog-1 node modules
Started restoring workspace packages/blog-2 node modules
Finished restoring workspace packages/blog-2 node modules
Started restoring workspace root node modules
Finished restoring workspace root node modules
Installing NPM modules using Yarn version 1.22.4
mkdir: cannot create directory ‘/opt/buildhome/tmp’: File exists
yarn install v1.22.4
[1/4] Resolving packages...
success Already up-to-date.
Done in 1.25s.
NPM modules installed using Yarn

(...)

Caching artifacts
Started saving workspace packages/blog-1 node modules
Finished saving workspace packages/blog-1 node modules
Started saving workspace packages/blog-2 node modules
Finished saving workspace packages/blog-2 node modules
Started saving workspace root node modules
Finished saving workspace root node modules
Started saving build plugins
Finished saving build plugins
Started saving yarn cache
Finished saving yarn cache
Started saving go dependencies
Finished saving go dependencies

erezrokah

🚀

erezrokah · 2021-02-22T17:37:11Z

run-build-functions.sh

+  then
+    echo "NETLIFY_YARN_WORKSPACES feature flag set"
+    # Ignore path env var required in order to support yarn 2 projects
+    if YARN_IGNORE_PATH=1 yarn workspaces info


[sand] Can we clarify a bit what YARN_IGNORE_PATH does?

Tried to add a better description. Let me know what you think @erezrokah 👍 The naming of the variable vs its final purpose can be a bit confusing 😅

@JGAntunes Are these env vars intended to be user-editable in the long term? If so, we'll need to document them. (cc @rstavchansky)

Is there anything else that should be documented? Or is this automatic behavior that's considered more "under the hood"?

Thanks @JGAntunes - so we are forcing yarn@1 just for the workspaces command?
And this works since the configuration hasn't changed between v1 and v2?

Are these env vars intended to be user-editable in the long term?
Is there anything else that should be documented? Or is this automatic behavior that's considered more "under the hood"?

The only new user-editable env var we're introducing is NETLIFY_YARN_WORKSPACES, and it shouldn't be editable in the long term, hence no need to document it IMO CC @rstavchansky. Theoretically speaking, users could set an environment variable NETLIFY_YARN_WORKSPACES=true that would trigger this behaviour (instead of relying on a feature flag on our side to do it - which is our plan to roll this out) but I don't really see that as a major concern. The plan is for this behaviour to eventually become the default without the need for any env vars (we infer the presence of a workspace monorepo on our end).

That being said, later on (maybe once this has been rolled out accordingly), I think it would be a good idea to produce some examples of monorepo usages with Netlify as it is not a trivial setup to do and, to my understanding, we don't have a write up on this (correct me if I'm wrong though). I think there's a lot of workarounds that originated in the issues above that are no longer applicable, so documenting that could probably be beneficial. Happy to move this discussion back to - https://github.com/netlify/pod-workflow/issues/139 - if you prefer 👍 CC @rstavchansky @verythorough

Thanks @JGAntunes - so we are forcing yarn@1 just for the workspaces command?
And this works since the configuration hasn't changed between v1 and v2?

Exactly @erezrokah! The configuration is still extracted from the workspaces entry in the package.json and that hasn't changed (from what I managed to look into), that way we are able to validate the presence of workspaces without the need of a separate branch for the yarn berry case.

Is that clear enough from the comment and usage? I'm happy to provide further information in there if you think it makes sense 👍

erezrokah · 2021-02-22T17:37:55Z

(that rely on node_modules nodeLinker config).

Do users need to manually configure that?

JGAntunes · 2021-02-22T18:13:35Z

(that rely on node_modules nodeLinker config).

Do users need to manually configure that?

@erezrokah yes, but this option is meant for users who are on berry (yarn v2) and want to opt out of the default pnp strategy - or at least that's my understanding of it. For projects relying on the default yarn pnp strategy, I believe further considerations and changes to our build-image are needed before we can effectively cache them - https://github.com/netlify/build-image/pull/465/files#diff-6ad81dc23c4c92a6e6193db3b05e306293aa75ee25d48ae952c6259c601e8b23R117 - However, I need to look into it further 👍

JGAntunes · 2021-02-22T19:46:55Z

run-build-functions.sh

+  if [ "$NETLIFY_YARN_WORKSPACES" == "true" ]
+  then
+    cache_node_modules
+  else
+    cache_cwd_directory "node_modules" "node modules"
+  fi
+


The idea is that, once we can drop the feature flag, we keep only the cache_node_modules function, which contains all the logic related to node_modules caching.

JGAntunes · 2021-02-22T19:56:25Z

run-build-functions.sh

+      #   (...)
+      # }
+      # We need to cache all the node_module dirs, or we'll always be installing them on each run
+      local package_locations=($(YARN_IGNORE_PATH=1 yarn --json workspaces info | jq -r '.data | fromjson | to_entries | .[].value.location | @sh'| tr -d \'))


The idea here is to rely on the yarn workspaces info --json to give us the detailed list of locations where packages are located. That way we can cache all the relevant node_modules without needing to guess and run random searches.
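
For readers unfamiliar with the shape of that output, here is a small sketch of the jq filter from the diff applied to a hand-written sample. The sample is an assumption: it imitates what `yarn --json workspaces info` prints on yarn 1.x (a JSON envelope whose `data` field is itself a JSON-encoded string) with made-up package names, and it requires jq to be installed.

```shell
#!/usr/bin/env bash
# Hand-written sample (assumption) shaped like `yarn --json workspaces info`
# output on yarn 1.x: the "data" field holds JSON encoded as a string.
sample_output='{"type":"log","data":"{\"blog-1\":{\"location\":\"packages/blog-1\"},\"blog-2\":{\"location\":\"packages/my blog\"}}"}'

# Same jq filter as in the diff: decode the inner JSON, then print one
# workspace location per line.
locations="$(printf '%s\n' "$sample_output" | jq -r '.data | fromjson | to_entries | .[].value.location')"
printf '%s\n' "$locations"
```

Because each location ends up on its own line, the list can be consumed without guessing where node_modules directories live.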

🤔 I wonder if it would be useful to implement a similar check in the new site flow and/or framework-info library. I'm not sure if that's feasible, but it could be helpful for monorepo site setup.

I was about to write that it should be relatively simple to achieve since we could rely directly on the package.json, but completely forgot that the workspaces entry will live in the root package.json, which I guess won't be accessible out of the box for framework-info... 😐

We can maybe open an issue in the repo and discuss it over there? CC @verythorough

ehmicky · 2021-02-26T14:21:35Z

run-build-functions.sh

+      #   (...)
+      # }
+      # We need to cache all the node_module dirs, or we'll always be installing them on each run
+      mapfile -t package_locations <<< "$(YARN_IGNORE_PATH=1 yarn --json workspaces info | jq -r '.data | fromjson | to_entries | .[].value.location')"


YARN_IGNORE_PATH=1 yarn workspaces info is called in the test above.
Would it make sense to make the test use --json and keep the output, to avoid calling yarn workspaces info twice? The test would then check exit code using $?. As a tiny performance optimization.

Addressed via 073a3db 👍

Now, actually fixed via 8ab88ae 😅

Btw, I'm keeping the exit code in a local variable as a way to avoid hitting - https://github.com/koalaman/shellcheck/wiki/SC2181 - let me know what you think @ehmicky

Sounds good!
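
A minimal sketch of the pattern discussed in this thread, under stated assumptions: `fake_workspace_locations` is a stand-in for the real `yarn ... | jq ...` pipeline so the example runs without yarn, and `restore_workspaces_sketch` is a hypothetical function name, not one from run-build-functions.sh. The command runs once, its exit code is kept in a local variable, and the output is reused to build the array.

```shell
#!/usr/bin/env bash
# Stand-in (assumption) for the yarn + jq pipeline; prints one location per line.
fake_workspace_locations() {
  printf '%s\n' "packages/blog-1" "packages/my blog"
}

restore_workspaces_sketch() {
  local output
  local exit_code
  # Run the command once, keeping both its output and its exit code.
  output="$(fake_workspace_locations)"
  exit_code=$?

  # Comparing a stored value rather than $? itself avoids SC2181.
  if [ "$exit_code" -ne 0 ]; then
    echo "No workspace detected"
    return 0
  fi

  # mapfile splits on newlines only, so a location containing spaces
  # stays a single array element (requires bash 4+).
  local locations
  mapfile -t locations <<< "$output"
  printf 'workspace: %s\n' "${locations[@]}"
}

result="$(restore_workspaces_sketch)"
printf '%s\n' "$result"
```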

ehmicky · 2021-02-26T14:26:59Z

run-build-functions.sh

+      #   (...)
+      # }
+      # We need to cache all the node_module dirs, or we'll always be installing them on each run
+      mapfile -t package_locations <<< "$(YARN_IGNORE_PATH=1 yarn --json workspaces info | jq -r '.data | fromjson | to_entries | .[].value.location')"


Would it work to directly set NETLIFY_JS_WORKSPACE_LOCATIONS instead of package_locations? This might require declaring NETLIFY_JS_WORKSPACE_LOCATIONS as a file-level array variable. This would remove the need for the intermediary variable package_locations.

Right now the cache logic and the respective usage of NETLIFY_JS_WORKSPACE_LOCATIONS is kind of isolated from the run_yarn function, and I would say there's some use in keeping it that way 🤔 That way we keep all the convoluted logic required to cache workspaces in the respective restore_js_workspaces_cache, cache_js_workspaces and cache_node_modules functions. The only thing run_yarn is doing right now is passing the locations to the underlying functions, which keeps it decoupled from that logic. My concern is that if we end up spreading the NETLIFY_JS_WORKSPACE_LOCATIONS variable around we might end up with a tighter spaghetti bowl 😅

Makes sense 👍

ehmicky · 2021-03-01T17:14:10Z

run-build-functions.sh

+    # YARN_IGNORE_PATH will ignore the presence of a local yarn executable (i.e. yarn 2) and default
+    # to using the global one (which, for now, is alwasy yarn 1.x). See https://yarnpkg.com/configuration/yarnrc#ignorePath
+    workspace_output="$(YARN_IGNORE_PATH=1 yarn workspaces --json info )"
+    workspace_exit_code=$?


[sand!]

Suggested change

-    workspace_exit_code=$?
+    local workspace_exit_code=$?

(And removing the declaration above)
(Same for workspace_output)

@ehmicky happy to do so for workspace_exit_code. For the previous one (workspace_output), however, declaring it as local on the same line as the assignment would actually mask the exit code of the command, which we need to use afterwards - https://github.com/koalaman/shellcheck/wiki/SC2155

TIL.
Good catch 👍

I owe it all to shellcheck - https://github.com/koalaman/shellcheck/ - I would be doomed writing bash/sh without it 😂
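
The SC2155 behaviour mentioned above can be reproduced with a short, self-contained sketch. All function names here are hypothetical and exist only for the demonstration.

```shell
#!/usr/bin/env bash
# A command that prints something and fails with a recognisable exit code.
failing_command() {
  echo "some output"
  return 3
}

masked() {
  # `local` is itself a command that succeeds, so $? reflects `local`,
  # not failing_command.
  local output="$(failing_command)"
  echo "$?"
}

preserved() {
  # Declaring first and assigning separately keeps the command's exit code.
  local output
  output="$(failing_command)"
  echo "$?"
}

masked_code="$(masked)"
preserved_code="$(preserved)"
echo "masked: $masked_code, preserved: $preserved_code"
# prints "masked: 0, preserved: 3"
```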

ehmicky

🎉

Co-authored-by: ehmicky <ehmicky@users.noreply.github.com>

JGAntunes added 2 commits February 19, 2021 19:40

feat: initial work to add support for js workspaces

5e24377

fix(ws-cache): addressing some typos and moving the caching call before the regular node_modules cache

620c1ce

JGAntunes requested review from vbrown608, ehmicky and erezrokah February 22, 2021 16:47

JGAntunes added the type: feature label Feb 22, 2021

erezrokah previously approved these changes Feb 22, 2021

fix(ws-cache): using a tmp dir for workspace locations file & better handling of node_modules cache move out

4a61af7

JGAntunes dismissed erezrokah’s stale review via 4a61af7 February 22, 2021 19:06

JGAntunes commented Feb 22, 2021

JGAntunes commented Feb 22, 2021

JGAntunes marked this pull request as ready for review February 22, 2021 20:00

JGAntunes requested a review from a team as a code owner February 22, 2021 20:00

ehmicky reviewed Feb 23, 2021

JGAntunes added init/monorepo-support theme/monorepos labels Feb 24, 2021

This was referenced Feb 24, 2021

Yarn workspaces support improvements #399

Closed

Support for monorepos, including lerna or yarn workspaces #196

Closed

JGAntunes added 2 commits February 24, 2021 19:47

fix(ws-cache): correctly handle dirs with spaces

ef73c84

fix(ws-cache): use a variable instead of tmp dir to keep ws locations

70c8a36

ehmicky reviewed Feb 25, 2021

ehmicky reviewed Feb 25, 2021

JGAntunes added 2 commits February 26, 2021 13:32

fix(ws-cache): no need for intermediate variables for ws locations

32f7b0d

chore(ws-cache): move cache_dir to the upper scope

14b1ea8

ehmicky reviewed Feb 26, 2021

JGAntunes added 2 commits February 26, 2021 15:59

chore(ws-cache): don't call yarn workspaces twice

073a3db

fix(ws-cache): ended up messing the jq command 🤦

8ab88ae

ehmicky reviewed Mar 1, 2021

ehmicky reviewed Mar 1, 2021

ehmicky previously approved these changes Mar 1, 2021

JGAntunes dismissed ehmicky’s stale review via c4e3641 March 1, 2021 17:43

chore: typo in comment run-build-functions.sh

c4e3641

Co-authored-by: ehmicky <ehmicky@users.noreply.github.com>

ehmicky approved these changes Mar 1, 2021

JGAntunes merged commit a21ba04 into xenial Mar 8, 2021

JGAntunes deleted the feat/js-workspaces branch March 8, 2021 14:40

JGAntunes mentioned this pull request Mar 29, 2021

Add JS workspaces information netlify/build-info#8

Closed

erquhart removed the init/monorepo-support label Mar 31, 2021

This was referenced Apr 29, 2021

Remove yarn workspaces cache feature flag #554

Closed

Support for caching yarn workspaces' node_modules #479

Closed

Yarn workspaces are not properly cached #432

Closed
