Skip to content
This repository has been archived by the owner on Jan 25, 2023. It is now read-only.

feat: better cache support for JS workspaces #526

Merged
merged 10 commits into from
Mar 8, 2021
100 changes: 94 additions & 6 deletions run-build-functions.sh
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@ mkdir -p $NETLIFY_CACHE_DIR/swift_version

# pwd caches
mkdir -p $NETLIFY_CACHE_DIR/node_modules
mkdir -p $NETLIFY_CACHE_DIR/js-workspaces
mkdir -p $NETLIFY_CACHE_DIR/.bundle
mkdir -p $NETLIFY_CACHE_DIR/bower_components
mkdir -p $NETLIFY_CACHE_DIR/.venv
Expand Down Expand Up @@ -102,6 +103,34 @@ run_yarn() {
fi


if [ "$NETLIFY_YARN_WORKSPACES" = "true" ]
then
echo "NETLIFY_YARN_WORKSPACES feature flag set"
# YARN_IGNORE_PATH will ignore the presence of a local yarn executable (i.e. yarn 2) and default
# to using the global one (which, for now, is alwasy yarn 1.x). See https://yarnpkg.com/configuration/yarnrc#ignorePath
JGAntunes marked this conversation as resolved.
Show resolved Hide resolved
if YARN_IGNORE_PATH=1 yarn workspaces info
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[sand] Can we clarify a bit what YARN_IGNORE_PATH does?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to had a better description. Let me know what you think @erezrokah 👍 The naming for the variable vs its final purpose can be a bit confusing 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JGAntunes Are these env vars intended to be user-editable in the long term? If so, we'll need to document them. (cc @rstavchansky)

Is there anything else that should be documented? Or is this automatic behavior that's considered more "under the hood"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JGAntunes - so we are forcing yarn@1 just for the workspaces command?
And this works since the configuration hasn't changed between v1 and v2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these env vars intended to be user-editable in the long term?
Is there anything else that should be documented? Or is this automatic behavior that's considered more "under the hood"?

The only new user-editable env var we're introducing is NETLIFY_YARN_WORKSPACES, and it shouldn't be editable in the long term, hence no need to document it IMO CC @rstavchansky. Theoretically speaking, users could set an environment variable NETLIFY_YARN_WORKSPACES=true that would trigger this behaviour (instead of relying on a feature flag on our side to do it - which is our plan to roll this out) but I don't really see that as major concern. The plan is for this behaviour to eventually become default without the need of any env vars (we infer the presence of a workspace monorepo on our end).

That being said, later on (maybe once this has been rolled out accordingly), I think it would be a good idea to produce some examples of monorepo usages with Netlify as it is not a trivial setup to do and, to my understanding, we don't have a write up on this (correct me if I'm wrong though). I think there's a lot of workarounds that originated in the issues above that are no longer applicable, so documenting that could probably be beneficial. Happy to move this discussion back to - https://github.com/netlify/pod-workflow/issues/139 - if you prefer 👍 CC @rstavchansky @verythorough

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JGAntunes - so we are forcing yarn@1 just for the workspaces command?
And this works since the configuration hasn't changed between v1 and v2?

Exactly @erezrokah! The configuration is still extracted from the workspaces entry in the package.json and that hasn't changed (from what I managed to look into), that way we are able to validate the presence of workspaces without the need of a separate branch for the yarn berry case.

Is that clear enough from the comment and usage? I'm happy to provide further information in there if you think like it makes sense 👍

then
local package_locations
# Extract all the packages and respective locations. .data will be a JSON object like
# {
# "my-package-1": {
# "location": "packages/blog-1",
# "workspaceDependencies": [],
# "mismatchedWorkspaceDependencies": []
# },
# (...)
# }
# We need to cache all the node_module dirs, or we'll always be installing them on each run
mapfile -t package_locations <<< "$(YARN_IGNORE_PATH=1 yarn --json workspaces info | jq -r '.data | fromjson | to_entries | .[].value.location')"
Copy link
Contributor

@ehmicky ehmicky Feb 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YARN_IGNORE_PATH=1 yarn workspaces info is called in the test above.
Would it make sense to make the test use --json and keep the output, to avoid calling yarn workspaces info twice? The test would then check exit code using $?. As a tiny performance optimization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed via 073a3db 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, actually fixed via 8ab88ae 😅

Btw, I'm keeping the exit code in a local variable as a way to avoid hitting - https://github.com/koalaman/shellcheck/wiki/SC2181 - let me know what you think @ehmicky

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

Copy link
Contributor

@ehmicky ehmicky Feb 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it work to directly set NETLIFY_JS_WORKSPACE_LOCATIONS instead of package_locations? This might require declaring NETLIFY_JS_WORKSPACE_LOCATIONS as a file-level array variable. This would remove the need for the intermediary variable package_locations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now the cache logic and the respective usage of NETLIFY_JS_WORKSPACE_LOCATIONS is kind of isolated from the run_yarn function, I would say there's some use in keeping it that way? 🤔 That way we're keeping all the convoluted logic required to cache workspaces in the respective restore_js_workspaces_cache, cache_js_workspaces and cache_node_modules. The only thing the run_yarn is doing right is to pass the locations to the underlying functions ensuring it's decoupled from that. My concern is that if we end up spreading the NETLIFY_JS_WORKSPACE_LOCATIONS variable around we might end up with a tighter spaghetti bowl 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense 👍

restore_js_workspaces_cache "${package_locations[@]}"
else
echo "No workspace detected"
restore_cwd_cache node_modules "node modules"
fi
else
restore_cwd_cache node_modules "node modules"
fi

echo "Installing NPM modules using Yarn version $(yarn --version)"
run_npm_set_temp

Expand Down Expand Up @@ -130,6 +159,8 @@ run_npm_set_temp() {
}

run_npm() {
restore_cwd_cache node_modules "node modules"

if [ -n "$NPM_VERSION" ]
then
if [ "$(npm --version)" != "$NPM_VERSION" ]
Expand Down Expand Up @@ -475,7 +506,6 @@ install_dependencies() {

if [ -f package.json ]
then
restore_cwd_cache node_modules "node modules"
if [ "$NETLIFY_USE_YARN" = "true" ] || ([ "$NETLIFY_USE_YARN" != "false" ] && [ -f yarn.lock ])
then
run_yarn $YARN_VERSION
Expand Down Expand Up @@ -669,9 +699,17 @@ install_dependencies() {
#
cache_artifacts() {
echo "Caching artifacts"

cache_cwd_directory ".bundle" "ruby gems"
cache_cwd_directory "bower_components" "bower components"
cache_cwd_directory "node_modules" "node modules"

if [ "$NETLIFY_YARN_WORKSPACES" == "true" ]
then
cache_node_modules
else
cache_cwd_directory "node_modules" "node modules"
fi

Comment on lines +712 to +718
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to, once we can drop the feature flag, we keep only the cache_node_modules function containing all the logic relative to node_modules caching.

cache_cwd_directory ".venv" "python virtualenv"
cache_cwd_directory ".build" "swift build"
cache_cwd_directory ".netlify/plugins" "build plugins"
Expand All @@ -690,7 +728,7 @@ cache_artifacts() {
cache_home_directory ".composer" "composer dependencies"
cache_home_directory ".homebrew-cache", "homebrew cache"
cache_home_directory ".rustup" "rust rustup cache"

if [ -f Cargo.toml ] || [ -f Cargo.lock ]
then
cache_home_directory ".cargo/registry" "rust cargo registry cache"
Expand Down Expand Up @@ -747,11 +785,11 @@ cache_artifacts() {
move_cache() {
local src=$1
local dst=$2
if [ -d $src ]
if [ -d "$src" ]
then
echo "Started $3"
rm -rf $dst
mv $src $dst
rm -rf "$dst"
mv "$src" "$dst"
echo "Finished $3"
fi
}
Expand Down Expand Up @@ -779,6 +817,56 @@ restore_cwd_cache() {
move_cache "$NETLIFY_CACHE_DIR/$1" "$PWD/$1" "restoring cached $2"
}

#
# Restores node_modules dirs cached for js workspaces
# See https://github.com/netlify/pod-workflow/issues/139/ for more context
#
# Expects:
# $@ each argument should be a package location relative to the repo's root
restore_js_workspaces_cache() {
local locations=("$@")
local cache_dir="$NETLIFY_CACHE_DIR/js-workspaces"
ehmicky marked this conversation as resolved.
Show resolved Hide resolved

# Retrieve each workspace node_modules
for location in "${locations[@]}"; do
move_cache "$cache_dir/$location/node_modules" "$NETLIFY_REPO_DIR/$location/node_modules" "restoring workspace $location node modules"
done
# Retrieve hoisted node_modules
move_cache "$cache_dir/node_modules" "$NETLIFY_REPO_DIR/node_modules" "restoring workspace root node modules"
# Keep a record of the workspaces in the project in order to cache them later
NETLIFY_JS_WORKSPACE_LOCATIONS=$(printf '%s\n' "${locations[@]}")
ehmicky marked this conversation as resolved.
Show resolved Hide resolved
}

#
# Caches node_modules dirs for a js project. Either detects the presence of js workspaces
# via the `NETLIFY_JS_WORKSPACE_LOCATIONS` variable, or looks at the node_modules in the cwd.
#
cache_node_modules() {
if [ -z "$NETLIFY_JS_WORKSPACE_LOCATIONS" ]
then
cache_cwd_directory "node_modules" "node modules"
else
cache_js_workspaces
fi
}

#
# Caches node_modules dirs from js workspaces. It acts based on the presence of a
# `NETLIFY_JS_WORKSPACE_LOCATIONS` variable previously set in `restore_js_workspaces_cache()`
#
cache_js_workspaces() {
local cache_dir="$NETLIFY_CACHE_DIR/js-workspaces"
local locations
mapfile -t locations <<< "$NETLIFY_JS_WORKSPACE_LOCATIONS"

for location in "${locations[@]}"; do
mkdir -p "$cache_dir/$location"
move_cache "$NETLIFY_REPO_DIR/$location/node_modules" "$cache_dir/$location/node_modules" "saving workspace $location node modules"
done
# Retrieve hoisted node_modules
move_cache "$NETLIFY_REPO_DIR/node_modules" "$cache_dir/node_modules" "saving workspace root node modules"
}

cache_cwd_directory() {
move_cache "$PWD/$1" "$NETLIFY_CACHE_DIR/$1" "saving $2"
}
Expand Down