Skip to content
This repository has been archived by the owner on Jan 25, 2023. It is now read-only.

feat: better cache support for JS workspaces #526

Merged
merged 10 commits into from
Mar 8, 2021
Merged

Conversation

JGAntunes
Copy link
Contributor

@JGAntunes JGAntunes commented Feb 22, 2021

Based on the discussion started in https://github.com/netlify/pod-workflow/issues/139 (which is in turn based on a lot of the open PRs and issues in this repo), this PR adds support for js workspaces.

Despite being a feature originally supported by yarn only, npm v7 is catching up, with on-going discussions and RFCs (see npm/rfcs#117 and npm/rfcs#273) to add further functionality. These however still work differently, npm is unable to detect it is within a project with multiple workspaces if we cd into a sub-directory. For netlify (if we use the base configuration property) our caching strategy still holds up, since we just install and cache a workspace package as it was a regular js project. For yarn however, installs in sub directories are able to infer they're part of a workspace, meaning dependencies will be hoisted and all the sibling packages will be installed also. This means that, in order to achieve the same caching experience we have for other JS projects, we need to cache the root node_modules as well as all the workspace node_modules.

This solution works for both yarn v1 and yarn v2 (that rely on node_modules nodeLinker config).

I've put in place an env variable - NETLIFY_YARN_WORKSPACES - that we can rely on to use as feature flag by hooking this up in the buildbot side with launch darkly.

Addresses #399 #432 #196 #479 #196

Tests

I've run this locally on a monorepo using:

An example output of a cached execution with yarn:

Finished restoring cached yarn cache
NETLIFY_YARN_WORKSPACES feature flag set
yarn workspaces v1.22.4
{
  "gatsby-starter-blog-1": {
    "location": "packages/blog-1",
    "workspaceDependencies": [],
    "mismatchedWorkspaceDependencies": []
  },
  "gatsby-starter-blog-2": {
    "location": "packages/blog-2",
    "workspaceDependencies": [],
    "mismatchedWorkspaceDependencies": []
  }
}
Done in 0.12s.
Started restoring workspace packages/blog-1 node modules
Finished restoring workspace packages/blog-1 node modules
Started restoring workspace packages/blog-2 node modules
Finished restoring workspace packages/blog-2 node modules
Started restoring workspace root node modules
Finished restoring workspace root node modules
Installing NPM modules using Yarn version 1.22.4
mkdir: cannot create directory ‘/opt/buildhome/tmp’: File exists
yarn install v1.22.4
[1/4] Resolving packages...
success Already up-to-date.
Done in 1.25s.
NPM modules installed using Yarn

(...)

Caching artifacts
Started saving workspace packages/blog-1 node modules
Finished saving workspace packages/blog-1 node modules
Started saving workspace packages/blog-2 node modules
Finished saving workspace packages/blog-2 node modules
Started saving workspace root node modules
Finished saving workspace root node modules
Started saving build plugins
Finished saving build plugins
Started saving yarn cache
Finished saving yarn cache
Started saving go dependencies
Finished saving go dependencies

@JGAntunes JGAntunes added the type: feature code contributing to the implementation of a feature and/or user facing functionality label Feb 22, 2021
erezrokah
erezrokah previously approved these changes Feb 22, 2021
Copy link
Contributor

@erezrokah erezrokah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

then
echo "NETLIFY_YARN_WORKSPACES feature flag set"
# Ignore path env var required in order to support yarn 2 projects
if YARN_IGNORE_PATH=1 yarn workspaces info
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[sand] Can we clarify a bit what YARN_IGNORE_PATH does?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to had a better description. Let me know what you think @erezrokah 👍 The naming for the variable vs its final purpose can be a bit confusing 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JGAntunes Are these env vars intended to be user-editable in the long term? If so, we'll need to document them. (cc @rstavchansky)

Is there anything else that should be documented? Or is this automatic behavior that's considered more "under the hood"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JGAntunes - so we are forcing yarn@1 just for the workspaces command?
And this works since the configuration hasn't changed between v1 and v2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these env vars intended to be user-editable in the long term?
Is there anything else that should be documented? Or is this automatic behavior that's considered more "under the hood"?

The only new user-editable env var we're introducing is NETLIFY_YARN_WORKSPACES, and it shouldn't be editable in the long term, hence no need to document it IMO CC @rstavchansky. Theoretically speaking, users could set an environment variable NETLIFY_YARN_WORKSPACES=true that would trigger this behaviour (instead of relying on a feature flag on our side to do it - which is our plan to roll this out) but I don't really see that as major concern. The plan is for this behaviour to eventually become default without the need of any env vars (we infer the presence of a workspace monorepo on our end).

That being said, later on (maybe once this has been rolled out accordingly), I think it would be a good idea to produce some examples of monorepo usages with Netlify as it is not a trivial setup to do and, to my understanding, we don't have a write up on this (correct me if I'm wrong though). I think there's a lot of workarounds that originated in the issues above that are no longer applicable, so documenting that could probably be beneficial. Happy to move this discussion back to - https://github.com/netlify/pod-workflow/issues/139 - if you prefer 👍 CC @rstavchansky @verythorough

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JGAntunes - so we are forcing yarn@1 just for the workspaces command?
And this works since the configuration hasn't changed between v1 and v2?

Exactly @erezrokah! The configuration is still extracted from the workspaces entry in the package.json and that hasn't changed (from what I managed to look into), that way we are able to validate the presence of workspaces without the need of a separate branch for the yarn berry case.

Is that clear enough from the comment and usage? I'm happy to provide further information in there if you think like it makes sense 👍

@erezrokah
Copy link
Contributor

(that rely on node_modules nodeLinker config).

Do users need to manually configure that?

@JGAntunes
Copy link
Contributor Author

(that rely on node_modules nodeLinker config).

Do users need to manually configure that?

@erezrokah yes, but the purpose of this option is for users who are using berry (yarn v2) and want to opt-out of the default pnp strategy - or at least that's my understanding of it. For projects relying on the default yarn pnp strategy I believe there are further considerations and changes that need to happen to our build-image to be able to effectively cache those projects - https://github.com/netlify/build-image/pull/465/files#diff-6ad81dc23c4c92a6e6193db3b05e306293aa75ee25d48ae952c6259c601e8b23R117 - However I need to better look into it 👍

run-build-functions.sh Outdated Show resolved Hide resolved
Comment on lines +709 to +715
if [ "$NETLIFY_YARN_WORKSPACES" == "true" ]
then
cache_node_modules
else
cache_cwd_directory "node_modules" "node modules"
fi

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to, once we can drop the feature flag, we keep only the cache_node_modules function containing all the logic relative to node_modules caching.

# (...)
# }
# We need to cache all the node_module dirs, or we'll always be installing them on each run
local package_locations=($(YARN_IGNORE_PATH=1 yarn --json workspaces info | jq -r '.data | fromjson | to_entries | .[].value.location | @sh'| tr -d \'))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here is to rely on the yarn workspaces info --json to give us the detailed list of locations where packages are located. That way we can cache all the relevant node_modules without needing to guess and run random searches.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I wonder if it would be useful to implement a similar check in the new site flow and/or framework-info library. I'm not sure if that's feasible, but it could be helpful for monorepo site setup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was about to write that it should be relatively simple to achieve it since we could rely directly in the package.json, but completely forgot that the workspaces entry will live in the root package.json which I guess won't be accessible out of the box for the framework-info... 😐

We can maybe open an issue in the repo and discuss it over there? CC @verythorough

run-build-functions.sh Outdated Show resolved Hide resolved
@JGAntunes JGAntunes marked this pull request as ready for review February 22, 2021 20:00
@JGAntunes JGAntunes requested a review from a team as a code owner February 22, 2021 20:00
run-build-functions.sh Outdated Show resolved Hide resolved
run-build-functions.sh Outdated Show resolved Hide resolved
run-build-functions.sh Outdated Show resolved Hide resolved
# (...)
# }
# We need to cache all the node_module dirs, or we'll always be installing them on each run
mapfile -t package_locations <<< "$(YARN_IGNORE_PATH=1 yarn --json workspaces info | jq -r '.data | fromjson | to_entries | .[].value.location')"
Copy link
Contributor

@ehmicky ehmicky Feb 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YARN_IGNORE_PATH=1 yarn workspaces info is called in the test above.
Would it make sense to make the test use --json and keep the output, to avoid calling yarn workspaces info twice? The test would then check exit code using $?. As a tiny performance optimization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed via 073a3db 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, actually fixed via 8ab88ae 😅

Btw, I'm keeping the exit code in a local variable as a way to avoid hitting - https://github.com/koalaman/shellcheck/wiki/SC2181 - let me know what you think @ehmicky

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

# (...)
# }
# We need to cache all the node_module dirs, or we'll always be installing them on each run
mapfile -t package_locations <<< "$(YARN_IGNORE_PATH=1 yarn --json workspaces info | jq -r '.data | fromjson | to_entries | .[].value.location')"
Copy link
Contributor

@ehmicky ehmicky Feb 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it work to directly set NETLIFY_JS_WORKSPACE_LOCATIONS instead of package_locations? This might require declaring NETLIFY_JS_WORKSPACE_LOCATIONS as a file-level array variable. This would remove the need for the intermediary variable package_locations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now the cache logic and the respective usage of NETLIFY_JS_WORKSPACE_LOCATIONS is kind of isolated from the run_yarn function, I would say there's some use in keeping it that way? 🤔 That way we're keeping all the convoluted logic required to cache workspaces in the respective restore_js_workspaces_cache, cache_js_workspaces and cache_node_modules. The only thing the run_yarn is doing right is to pass the locations to the underlying functions ensuring it's decoupled from that. My concern is that if we end up spreading the NETLIFY_JS_WORKSPACE_LOCATIONS variable around we might end up with a tighter spaghetti bowl 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense 👍

run-build-functions.sh Outdated Show resolved Hide resolved
# YARN_IGNORE_PATH will ignore the presence of a local yarn executable (i.e. yarn 2) and default
# to using the global one (which, for now, is alwasy yarn 1.x). See https://yarnpkg.com/configuration/yarnrc#ignorePath
workspace_output="$(YARN_IGNORE_PATH=1 yarn workspaces --json info )"
workspace_exit_code=$?
Copy link
Contributor

@ehmicky ehmicky Mar 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[sand!]

Suggested change
workspace_exit_code=$?
local workspace_exit_code=$?

(And removing the declaration above)
(Same for workspace_output)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ehmicky happy to do so for the workspace_exit_code however, for the previous one, declaring a local variable will actually override the exit code from the command we need to use afterwards - https://github.com/koalaman/shellcheck/wiki/SC2155

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL.
Good catch 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I owe it all to shellcheck - https://github.com/koalaman/shellcheck/ - I would be doomed writing bash/sh without it 😂

ehmicky
ehmicky previously approved these changes Mar 1, 2021
Copy link
Contributor

@ehmicky ehmicky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

Co-authored-by: ehmicky <ehmicky@users.noreply.github.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
theme/monorepos type: feature code contributing to the implementation of a feature and/or user facing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants