provider cache becomes ineffective with 1.4.0-alpha release #32205
Comments
Just want to extremely strongly 👍 this. We also have a ton of parallelization in our automation, and from a purely timing perspective we are looking at 30-60+ seconds added to every single run. This change will have a HUGE performance hit for us, which is extremely painful when we are already struggling with Terraform's performance and have a number of open issues to improve it, like #29503 and #31964. It seems extremely disproportionate to cause this much of a performance hit out of a desire to solve an issue that someone ran into for the first time more than 2 years after the relevant code/behavior was released in 0.14.
Hi @tjstansell. Thanks for reporting this. The output here suggests that you are working in a repository without a dependency lock file, because I see it announcing that it is querying for the latest version instead of selecting a version from the lock file. If you run …
Hey @apparentlymart, we deliberately do not use a dependency lock file since we have over 5k root modules, so having to update 5k+ dependency lock files on a regular basis is either a TON of toil, or we would have to build a new set of automation tooling and allow it to push directly to master, which is not something our security and compliance teams are OK with given our required branch protection rules. The only scenario in which a dependency lock file would become useful to us is if we could set it in a top-level directory or a global spot on the machine and have it inherited by all of the child directories; at least that way we would be talking about keeping 1-30 spots updated instead of 5k+. We have previously discussed this in other issues and with Omar/Korinne, and were given assurances that we could safely ignore the lock file if we found it to be an impediment. We are also not the only people who do this, as evidenced by #31856 asking to disable the warning when Terraform generates the lock file.
Can't the cache include some form of metadata so Terraform can detect the proper hash values (from the original zips it was installed from), so the cache could still be used during an initial run as well? Even if it's just a …

As with @joe-a-t, we deliberately do not check in lock files for our repos. Maintaining those across all of our repos would be a full-time job.
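To make that suggestion concrete, here is a purely hypothetical sidecar file the cache could keep next to each cached provider, recording the checksums of the release zips it was originally installed from. Nothing like this exists in Terraform today; the path and field names are invented for illustration:

```
# Hypothetical: inspect cache metadata recording the original zip checksums.
$ cat "$TF_PLUGIN_CACHE_DIR/registry.terraform.io/hashicorp/aws/4.67.0/.zip-checksums.json"
{
  "source": "registry.terraform.io/hashicorp/aws",
  "version": "4.67.0",
  "zip_checksums": {
    "linux_amd64": "zh:0a1b...",
    "darwin_arm64": "zh:2c3d..."
  }
}
```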
This change effectively makes terragrunt expensive to use, possibly prohibitively so. We only have about ~380 terragrunt stacks (not 5k+! 😨), and a full run-all init now takes over half an hour:

```
$ time TERRAGRUNT_TFPATH=/Users/joe/git/hashicorp/terraform/terraform terragrunt run-all init --terragrunt-parallelism=8 --terragrunt-ignore-dependency-order --terragrunt-ignore-dependency-errors -lock=false
4580.15s user / 1923.02s system / 329% cpu 32:52.57 total / 0 amem / 145684 maxmem

$ du -shx .
196G    .

$ git ls-tree -r --name-only head | grep terragrunt.hcl | wc -l
382
```
WAD? Seriously? I guess so, since the whole dependency lockfile design is definitely not a "one size fits all" solution and makes Terraform decidedly more difficult to use for many of us. Please add a way to just disable that feature, as has been asked for in issues like #29760 and #31533. Please stop forcing us to endure spurious warnings and workarounds.
This is very disappointing. I'm glad there's at least some kind of temporary workaround available, but all the comments in the docs about silently discarding that flag in the future mean this is really just kicking the can down the road. I know it's been said that Terraform "needs" the lockfile, but since it can initialize without one, it clearly doesn't. Clearly there are plenty of people who, for various reasons, find checking the lockfile into VCS to be a neutral or a negative; I still haven't seen a good explanation of why it's impossible to come up with a solution that more gracefully accommodates these ways of working. Correct me if I'm wrong, but I also haven't really seen any of the accommodations/improvements mentioned in #31533 and similar items happen yet.
We also do not check in the lockfile and do not plan to, since we find locking providers in a …

Please do not force the requirement of checking in the lockfile.
This has already been addressed in #32494. Those who use a non-standard workflow without checking in the lock file can opt in to the previous broken behavior using a CLI configuration setting, which will then behave the same as 1.3. There will be more information on this in the v1.4 upgrade guide once v1.4.0 final is released.

This new behavior is better for those who use Terraform as recommended, checking their lock file into version control, so the opt-out is for the minority of folks who use a non-standard workflow (including those who spoke up here). If you don't wish to adopt the recommended workflow, set the new CLI configuration setting once you upgrade.
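For reference, a minimal sketch of that opt-in, assuming the setting lands in v1.4 under the name introduced by #32494 (verify against the final v1.4 docs) and a Unix CLI configuration file at ~/.terraformrc:

```sh
# Append the opt-out to the CLI configuration (not a .tf file).
# plugin_cache_may_break_dependency_lock_file restores the 1.3-style
# behavior of always linking from the cache, at the cost of a lock file
# that may lack the full set of checksums.
cat >> "$HOME/.terraformrc" <<'EOF'
plugin_cache_dir                            = "$HOME/.terraform.d/plugin-cache"
plugin_cache_may_break_dependency_lock_file = true
EOF
```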
Thanks @apparentlymart. Is there a proposed discussion or issue that we can follow that is meant to work through the reasons why we currently do not check in a lock file and therefore do not follow your recommended workflow? If you intend to eventually remove this workaround, I'd like to be involved in the discussions to work through the requirements of that.
@tjstansell some discussions that may be of interest to you:
Thanks @wyardley. Yeah, I've been looking at those, but was kinda hoping that HashiCorp was trying to have a more consolidated effort somewhere, rather than spread across various issues...
Just my 2 cents: I would strongly dismiss all comments saying people face problems because they don't use lock files as invalid use cases. I am sorry, but in a day and age when supply chain attacks have become the new normal and SBOM initiatives are finally gaining traction left and right, there can't be an excuse to even mention this as something someone would seriously consider doing on a real production system. It is bizarre to me that there are still people so stuck in the past that they are still adding lock files to …

Regardless, the "fix" in 1.4 is still bizarre to me too. If I have the cache locally, why would Terraform even need to reach out remotely for TOFU? What is wrong with trusting the local cache? I am not aware of any other dependency management system that does something like this, so why is HashiCorp failing to copy well-established best practices? That is a rhetorical question; I am trying to highlight the fact that the "lock" implementation is messed up in its entirety, not just the lock generation.

There shouldn't be a separation of lock metadata per architecture to begin with. A lock file is part of the configuration, and the current implementation leaks system implementation details into user space; it is missing an abstraction layer. Terraform could, for instance, cache the list of checksums per architecture along with the arch-specific binary, and use a checksum of checksums as the value stored in the lock file in git. It could also use the cryptographic signatures it already has (just not for this purpose) and store a fingerprint in the lock file. Even a dumb approach of caching all architectures in the local cache would be more efficient and less wasteful than the current implementation. There are multiple ways to achieve this and remove the need to reach out externally for lock generation, which is the actual root cause of all of these associated problems.
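A rough sketch of that "checksum of checksums" idea, assuming the per-platform release zips are on disk; the provider name and version are illustrative:

```sh
# Hash each per-platform zip, then hash the sorted list of those hashes to
# derive a single platform-independent value that a lock file could store.
sha256sum terraform-provider-aws_4.67.0_*.zip \
  | sort \
  | sha256sum \
  | awk '{print $1}'
```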
@dee-kryvenko I don't want to start a big sidetrack here, but there are multiple ways to prevent supply chain attacks; a lockfile is not the only one, and in fact it interferes with some other methods that are arguably superior, like local caches pre-loaded with the blessed versions of providers instead of pulling them from the internet. Not using a lockfile does not in and of itself mean that you are vulnerable to the attacks or downsides your post described.
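To sketch what that pre-loaded, blessed-providers approach can look like in practice, here is Terraform's documented provider_installation mechanism configured to install only from a local directory; the mirror path is an assumption:

```sh
# Force provider installation from a pre-populated local mirror and
# block direct registry downloads for the mirrored namespaces.
cat >> "$HOME/.terraformrc" <<'EOF'
provider_installation {
  filesystem_mirror {
    path    = "/opt/terraform/providers"   # pre-loaded with blessed versions
    include = ["registry.terraform.io/*/*"]
  }
  direct {
    exclude = ["registry.terraform.io/*/*"]
  }
}
EOF
```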
OMG... somehow this conversation is starting to remind me of the MPLS vs. SD-WAN debates back in the day. And how do you rebuild that cache? Is that what you do with everything else, like Go, pip, gems, npm, etc.? And what guarantees the integrity of that local cache?
Terraform Version

1.4.0-alpha
Terraform Configuration Files

Enable the global provider cache by setting `TF_PLUGIN_CACHE_DIR`.
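For example, on a Unix system (the cache path is arbitrary):

```sh
# Terraform uses the cache directory only if it already exists;
# it will not create the directory itself.
export TF_PLUGIN_CACHE_DIR="$HOME/.terraform.d/plugin-cache"
mkdir -p "$TF_PLUGIN_CACHE_DIR"
```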
Debug Output
Not really applicable.
Expected Behavior
The global provider cache should be used when initializing a Terraform repository. We use terragrunt to initialize and run hundreds of Terraform repositories simultaneously and rely on the global cache to optimize this process. Because cache population is not concurrency-safe, we ensure the provider cache is fully populated first; once it has been populated, we rely on Terraform being able to initialize by simply reading from the cache.
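A sketch of that warm-then-parallelize pattern, with an invented stacks/ layout standing in for our terragrunt tree:

```sh
export TF_PLUGIN_CACHE_DIR="$HOME/.terraform.d/plugin-cache"

# Serial warm-up: one init populates the shared cache.
terraform -chdir=stacks/warmup init -input=false

# Fan out: the remaining stacks should only need to read from the cache.
ls stacks | grep -v '^warmup$' \
  | xargs -P 8 -I{} terraform -chdir=stacks/{} init -input=false
```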
This is what 1.3.x shows and is what is expected:
Actual Behavior
Terraform tries to re-populate the provider in the global cache with the exact same file. Because this process is not concurrency-safe, it causes problems when we use terragrunt with multiple Terraform initializations running simultaneously: it leads to checksum errors and failures.
With 1.4.0-alpha, initialization of every repository shows:
Steps to Reproduce
Initialize two separate Terraform repositories with the global provider cache enabled. Both will install the same providers into the cache rather than reusing what is already there.
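A minimal reproduction sketch, assuming two sibling directories repo-a/ and repo-b/ whose configurations require the same provider:

```sh
export TF_PLUGIN_CACHE_DIR="$HOME/.terraform.d/plugin-cache"
mkdir -p "$TF_PLUGIN_CACHE_DIR"

terraform -chdir=repo-a init   # downloads the provider and populates the cache
terraform -chdir=repo-b init   # 1.3.x links from the cache; 1.4.0-alpha downloads again
```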
Additional Context
No response