feat: better cache support for JS workspaces #526
Conversation
…re the regular node_modules cache
🚀
run-build-functions.sh
Outdated
then
  echo "NETLIFY_YARN_WORKSPACES feature flag set"
  # Ignore path env var required in order to support yarn 2 projects
  if YARN_IGNORE_PATH=1 yarn workspaces info
[sand] Can we clarify a bit what `YARN_IGNORE_PATH` does?
Tried to add a better description. Let me know what you think @erezrokah 👍 The naming of the variable vs its final purpose can be a bit confusing 😅
@JGAntunes Are these env vars intended to be user-editable in the long term? If so, we'll need to document them. (cc @rstavchansky)
Is there anything else that should be documented? Or is this automatic behavior that's considered more "under the hood"?
Thanks @JGAntunes - so we are forcing yarn@1 just for the workspaces command?
And this works since the configuration hasn't changed between v1 and v2?
> Are these env vars intended to be user-editable in the long term?
> Is there anything else that should be documented? Or is this automatic behavior that's considered more "under the hood"?
The only new user-editable env var we're introducing is `NETLIFY_YARN_WORKSPACES`, and it shouldn't be editable in the long term, hence no need to document it IMO CC @rstavchansky. Theoretically speaking, users could set an environment variable `NETLIFY_YARN_WORKSPACES=true` that would trigger this behaviour (instead of relying on a feature flag on our side to do it - which is our plan to roll this out), but I don't really see that as a major concern. The plan is for this behaviour to eventually become the default without the need for any env vars (we'd infer the presence of a workspace monorepo on our end).
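For illustration, manually opting in would just mean setting the flag in the build environment (hypothetical sketch; only the flag name comes from this PR):

# Hypothetical manual opt-in: export the flag so the build script takes the
# workspaces-aware caching path instead of the plain node_modules cache.
export NETLIFY_YARN_WORKSPACES=true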
That being said, later on (maybe once this has been rolled out accordingly), I think it would be a good idea to produce some examples of monorepo usages with Netlify as it is not a trivial setup to do and, to my understanding, we don't have a write up on this (correct me if I'm wrong though). I think there's a lot of workarounds that originated in the issues above that are no longer applicable, so documenting that could probably be beneficial. Happy to move this discussion back to - https://github.com/netlify/pod-workflow/issues/139 - if you prefer 👍 CC @rstavchansky @verythorough
> Thanks @JGAntunes - so we are forcing yarn@1 just for the workspaces command?
> And this works since the configuration hasn't changed between v1 and v2?
Exactly @erezrokah! The configuration is still extracted from the `workspaces` entry in the `package.json` and that hasn't changed (from what I managed to look into); that way we are able to validate the presence of `workspaces` without the need for a separate branch for the `yarn berry` case.
Is that clear enough from the comment and usage? I'm happy to provide further information in there if you think it makes sense 👍
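As a quick illustration (a minimal assumed setup, not taken from the PR), the `workspaces` entry lives in the root `package.json` and is read the same way by both major versions:

# Root package.json of a minimal workspaces monorepo (assumed example):
#   {
#     "private": true,
#     "workspaces": ["packages/*"]
#   }
# Forcing the globally installed yarn 1 binary still detects them:
YARN_IGNORE_PATH=1 yarn workspaces info   # exits non-zero when no workspaces are configured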
Do users need to manually configure that?
@erezrokah yes, but the purpose of this option is for users who are using …
…handling of node_modules cache move out
if [ "$NETLIFY_YARN_WORKSPACES" == "true" ] | ||
then | ||
cache_node_modules | ||
else | ||
cache_cwd_directory "node_modules" "node modules" | ||
fi | ||
|
The idea is that, once we can drop the feature flag, we keep only the `cache_node_modules` function, containing all the logic related to `node_modules` caching.
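A sketch of that end state (function names come from this PR; the body is illustrative, not the PR's code):

# cache_node_modules owns all node_modules caching and decides internally
# whether we're in a workspaces monorepo:
cache_node_modules() {
  if [ -n "${NETLIFY_JS_WORKSPACE_LOCATIONS:-}" ]
  then
    cache_js_workspaces
  else
    cache_cwd_directory "node_modules" "node modules"
  fi
}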
run-build-functions.sh
Outdated
# (...)
# }
# We need to cache all the node_module dirs, or we'll always be installing them on each run
local package_locations=($(YARN_IGNORE_PATH=1 yarn --json workspaces info | jq -r '.data | fromjson | to_entries | .[].value.location | @sh' | tr -d \'))
The idea here is to rely on `yarn workspaces info --json` to give us the detailed list of locations where packages live. That way we can cache all the relevant `node_modules` without needing to guess and run random searches.
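For reference, this is roughly what that pipeline operates on; the output shape below is an abbreviated assumption based on yarn 1's `--json` log format:

# `yarn --json workspaces info` emits a log object whose `data` field is itself
# a JSON-encoded string, hence the `fromjson` step (abbreviated example):
#   {"type":"log","data":"{\"pkg-a\":{\"location\":\"packages/a\"},\"pkg-b\":{\"location\":\"packages/b\"}}"}
YARN_IGNORE_PATH=1 yarn --json workspaces info | jq -r '.data | fromjson | to_entries | .[].value.location'
# packages/a
# packages/b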
🤔 I wonder if it would be useful to implement a similar check in the new site flow and/or framework-info library. I'm not sure if that's feasible, but it could be helpful for monorepo site setup.
I was about to write that it should be relatively simple to achieve, since we could rely directly on the `package.json`, but I completely forgot that the `workspaces` entry will live in the root `package.json`, which I guess won't be accessible out of the box for `framework-info`... 😐
We can maybe open an issue in the repo and discuss it over there? CC @verythorough
run-build-functions.sh
Outdated
# (...)
# }
# We need to cache all the node_module dirs, or we'll always be installing them on each run
mapfile -t package_locations <<< "$(YARN_IGNORE_PATH=1 yarn --json workspaces info | jq -r '.data | fromjson | to_entries | .[].value.location')"
`YARN_IGNORE_PATH=1 yarn workspaces info` is called in the test above.
Would it make sense to make the test use `--json` and keep the output, to avoid calling `yarn workspaces info` twice? The test would then check the exit code using `$?`. As a tiny performance optimization.
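A minimal sketch of the suggested reuse (variable names borrowed from the later revision of the PR discussed below):

# Call the command once, keep both the output and the exit code, then reuse
# the captured output to build the locations array:
workspace_output="$(YARN_IGNORE_PATH=1 yarn --json workspaces info)"
workspace_exit_code=$?
if [ "$workspace_exit_code" -eq 0 ]
then
  mapfile -t package_locations <<< "$(echo "$workspace_output" | jq -r '.data | fromjson | to_entries | .[].value.location')"
fi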
Addressed via 073a3db 👍
Now, actually fixed via 8ab88ae 😅
Btw, I'm keeping the exit code in a local variable as a way to avoid hitting - https://github.com/koalaman/shellcheck/wiki/SC2181 - let me know what you think @ehmicky
Sounds good!
run-build-functions.sh
Outdated
# (...)
# }
# We need to cache all the node_module dirs, or we'll always be installing them on each run
mapfile -t package_locations <<< "$(YARN_IGNORE_PATH=1 yarn --json workspaces info | jq -r '.data | fromjson | to_entries | .[].value.location')"
Would it work to directly set `NETLIFY_JS_WORKSPACE_LOCATIONS` instead of `package_locations`? This might require declaring `NETLIFY_JS_WORKSPACE_LOCATIONS` as a file-level array variable. This would remove the need for the intermediary variable `package_locations`.
Right now the cache logic and the respective usage of `NETLIFY_JS_WORKSPACE_LOCATIONS` is kind of isolated from the `run_yarn` function, and I would say there's some use in keeping it that way? 🤔 That way we're keeping all the convoluted logic required to cache `workspaces` in the respective `restore_js_workspaces_cache`, `cache_js_workspaces` and `cache_node_modules`. The only thing `run_yarn` is doing right now is passing the `locations` to the underlying functions, ensuring it's decoupled from that. My concern is that if we end up spreading the `NETLIFY_JS_WORKSPACE_LOCATIONS` variable around, we might end up with a tighter spaghetti bowl 😅
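Roughly the shape being described (the function names are from the PR; the body and call order are illustrative, not the actual implementation):

# run_yarn only forwards the locations; all the caching logic stays inside
# the dedicated cache functions:
run_yarn() {
  local locations=("$@")
  restore_js_workspaces_cache "${locations[@]}"
  yarn install
  cache_js_workspaces "${locations[@]}"
}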
Makes sense 👍
# YARN_IGNORE_PATH will ignore the presence of a local yarn executable (i.e. yarn 2) and default
# to using the global one (which, for now, is always yarn 1.x). See https://yarnpkg.com/configuration/yarnrc#ignorePath
workspace_output="$(YARN_IGNORE_PATH=1 yarn workspaces --json info)"
workspace_exit_code=$?
[sand!] Suggested change:

- workspace_exit_code=$?
+ local workspace_exit_code=$?

(And removing the declaration above)
(Same for `workspace_output`)
@ehmicky happy to do so for the `workspace_exit_code`; however, for the previous one, declaring a `local` variable would actually mask the exit code of the command we need to use afterwards - https://github.com/koalaman/shellcheck/wiki/SC2155
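A quick illustration of the SC2155 pitfall (assumed example, not from the PR):

# `local var="$(cmd)"` returns local's own (successful) status, masking cmd's:
demo() {
  local out="$(false)"
  echo "combined declaration: $?"   # prints 0 -- the failure of `false` is hidden
  local out2
  out2="$(false)"
  echo "separate assignment: $?"    # prints 1 -- the real exit code survives
}
demo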
TIL.
Good catch 👍
I owe it all to shellcheck - https://github.com/koalaman/shellcheck/ - I would be doomed writing bash/sh without it 😂
🎉
Co-authored-by: ehmicky <ehmicky@users.noreply.github.com>
Based on the discussion started in https://github.com/netlify/pod-workflow/issues/139 (which is in turn based on a lot of the open PRs and issues in this repo), this PR adds support for js workspaces.
Despite being a feature originally supported by `yarn` only, `npm v7` is catching up, with ongoing discussions and RFCs (see npm/rfcs#117 and npm/rfcs#273) to add further functionality. These still work differently, however: `npm` is unable to detect that it is within a project with multiple workspaces if we `cd` into a sub-directory. For `netlify` (if we use the `base` configuration property) our caching strategy still holds up, since we just install and cache a workspace package as if it were a regular JS project. For `yarn`, however, installs in sub-directories are able to infer they're part of a `workspace`, meaning dependencies will be hoisted and all the sibling packages will be installed as well. This means that, in order to achieve the same caching experience we have for other JS projects, we need to cache the root `node_modules` as well as all the workspace `node_modules`.

This solution works for both `yarn v1` and `yarn v2` (when relying on the `node_modules` `nodeLinker` config).

I've put in place an env variable - `NETLIFY_YARN_WORKSPACES` - that we can rely on as a feature flag, by hooking it up on the `buildbot` side with LaunchDarkly.

Addresses #399 #432 #196 #479
Tests
I've run this locally on a monorepo using:
An example output of a cached execution with `yarn`: