Reimplement NPM caching to fix concurrency #2151
Merged
The existing implementation of NPM module caching used marker files and had some flaws that we (at HubSpot) found at scale, notably:
I changed the implementation to first copy into a temporary sibling folder and then atomically rename that folder to the expected cache key, ignoring the failure if a different thread or process creates the cache entry first. This eliminates the race conditions above. If the system does not support atomic moves, we log a warning that caching could not be completed safely.
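For readers who want the shape of the approach without opening the diff, here is a minimal sketch of the copy-then-rename idea. It is not the code in this PR: the class name `AtomicCacheWriter`, the method `publish`, and the `.tmp` prefix are hypothetical, and the warning is printed to stderr rather than sent through the plugin's logger. It also assumes that an atomic move onto an existing, non-empty directory fails rather than replaces it, which the JDK documents as implementation specific.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper illustrating the technique; names are not from the PR.
public final class AtomicCacheWriter {

    /**
     * Copies source into the cache under cacheEntry.
     * Returns true if this call created the entry, false if the entry
     * already existed, another writer won the race, or atomic moves are
     * unsupported.
     */
    public static boolean publish(Path source, Path cacheEntry) throws IOException {
        if (Files.isDirectory(cacheEntry)) {
            return false; // already cached; an existing entry is never updated
        }
        Path parent = cacheEntry.toAbsolutePath().getParent();
        Files.createDirectories(parent);
        // A sibling temp folder lives on the same file system as the target,
        // which is what makes an atomic rename possible.
        Path temp = Files.createTempDirectory(parent, cacheEntry.getFileName() + ".tmp");
        try {
            copyTree(source, temp);
            Files.move(temp, cacheEntry, StandardCopyOption.ATOMIC_MOVE);
            return true;
        } catch (AtomicMoveNotSupportedException e) {
            System.err.println("WARN: atomic move not supported; entry not cached: " + cacheEntry);
            return false;
        } catch (IOException e) {
            if (Files.isDirectory(cacheEntry)) {
                return false; // another thread or process created the entry first
            }
            throw e;
        } finally {
            if (Files.exists(temp)) {
                deleteTree(temp); // clean up when the rename did not happen
            }
        }
    }

    private static void copyTree(Path from, Path to) throws IOException {
        try (Stream<Path> paths = Files.walk(from)) {
            // Files.walk visits a directory before its children.
            for (Path p : paths.collect(Collectors.toList())) {
                Path dest = to.resolve(from.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(dest);
                } else {
                    Files.copy(p, dest, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    private static void deleteTree(Path root) throws IOException {
        List<Path> ordered;
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order puts children before their parent directories.
            ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : ordered) {
            Files.delete(p);
        }
    }
}
```

Readers only ever see a complete entry or no entry, because the entry's final name appears in a single rename. A writer that loses the race discards its temp folder and uses the winner's entry.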
As part of this, shadowcopy no longer updates the cache once an entry is cached. This seems to me the safest and most desirable behavior, but it contradicts a test that I removed, so I'm open to hearing otherwise.
This code has been running on all Java builds at HubSpot for about a week. During that time, we ran 2,540,327 builds in our CI system and 15,901 builds on local developer machines and have not seen issues.