Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reimplement NPM caching to fix concurrency #2151

Merged

Conversation

Xcelled
Copy link
Contributor

@Xcelled Xcelled commented May 31, 2024

The existing implementation of NPM module caching used marker files and had some flaws that we (at HubSpot) found at scale, notably:

  • Entries were available in the cache before they were finished being written into the cache (ie the cache retrieval code assumes storing a cache entry is atomic, but it's not)
  • In the case of contention, each process would wait for the marker file to be deleted once and then assume it had exclusive access without verifying it now actually held the lock; this led to race conditions and exceptions like NoSuchFileException when using [npmInstallCache()] and running clean build [gradlew clean build] #1984
  • Cache failures cause the build to fail instead of being a warning

I changed the implementation to instead first copy into a temporary sibling folder and then atomically rename the temp folder to the expected cache key, ignoring issues if a different thread/process manages to create the cache entry first. This eliminates the race conditions above. If the system does not support atomic moves, we log a warning that caching could not be completed safely.

As part of this, the behavior of shadowcopy changed to not update the cache once an entry is cached; this seems to me to be the safest and most desirable but it went against a test I removed, so I'm open to hearing otherwise.

This code has been running on all java builds at HubSpot for about a week. During that time, we ran 2,540,327 builds in our CI system and 15,901 builds on local developer machines and have not seen issues.

@nedtwigg nedtwigg changed the base branch from main to hubspot-cantpush-fixup June 4, 2024 07:03
@nedtwigg
Copy link
Member

nedtwigg commented Jun 4, 2024

Thanks for the fix! The failure is just spotbugs, but I can't push to your branch. I'm working around by merging to hubspot-cantpush-fixup, and then I'll fix it there.

@nedtwigg nedtwigg merged commit 6858c18 into diffplug:hubspot-cantpush-fixup Jun 4, 2024
10 of 14 checks passed
@nedtwigg
Copy link
Member

nedtwigg commented Jun 4, 2024

You can see the little tweak I made here: abf887d

@Xcelled
Copy link
Contributor Author

Xcelled commented Jun 5, 2024

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants