providercache: Ignore lock-mismatching global cache entries #32129
I can see that this has broken an assumption being made by one of the end-to-end tests that didn't run in my dev environment because I was only running the "offline" subset of the tests. I'll see about updating that tomorrow so that it properly models the two cases of this new behavior, both with and without an existing lock file.
When we originally introduced the trust-on-first-use checksum locking mechanism in v0.14, we had to make some tricky decisions about how it should interact with the pre-existing optional read-through global cache of provider packages:
The global cache essentially conflicts with the checksum locking because if the needed provider is already in the cache then Terraform skips installing the provider from upstream and therefore misses the opportunity to capture the signed checksums published by the provider developer.
We can't use the signed checksums to verify a cache entry because the origin registry protocol is still using the legacy ziphash scheme, and that is only usable for the original zipped provider packages and not for the unpacked-layout cache directory.
Therefore we decided to prioritize the existing cache directory behavior at the expense of the lock file behavior, making Terraform produce an incomplete lock file in that case.
Now that we've had some real-world experience with the lock file mechanism, we can see that the chosen compromise was not ideal because it causes "terraform init" to behave significantly differently in its lock file update behavior depending on whether or not a particular provider is already cached. Because a cache hit robs Terraform of its opportunity to fetch the official checksums, Terraform must generate a lock file that is inherently non-portable, which is problematic for any team which works with the same Terraform configuration on multiple different platforms.
This change addresses that problem by essentially flipping the decision so that we'll prioritize the lock file behavior over the provider cache behavior. Now a global cache entry is eligible for use if and only if the lock file already contains a checksum that matches the cache entry.
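
The eligibility rule described above can be sketched roughly as follows. This is only an illustration, not the code from this pull request: the function name `cacheEntryEligible` and the placeholder hash strings are hypothetical, and the sketch assumes lock file hashes carry a scheme prefix (`zh:` for the legacy zip hash, `h1:` for a hash that can be computed over an unpacked directory), so that only `h1:` entries can be compared against a cache directory.

```go
package main

import (
	"fmt"
	"strings"
)

// cacheEntryEligible is a hypothetical helper illustrating the rule:
// a global cache entry may be used only if the lock file already holds
// a checksum that matches it.
//
// lockedHashes are the hashes recorded in the lock file for the selected
// provider version; cacheDirHash is the hash computed over the unpacked
// cache directory (assumed here to use the "h1:" scheme).
func cacheEntryEligible(lockedHashes []string, cacheDirHash string) bool {
	for _, h := range lockedHashes {
		// Assumption: "zh:" hashes describe the original .zip archive,
		// so they cannot verify an unpacked-layout cache directory.
		if !strings.HasPrefix(h, "h1:") {
			continue
		}
		if h == cacheDirHash {
			return true
		}
	}
	return false
}

func main() {
	cacheHash := "h1:example-directory-hash" // placeholder value

	noLock := []string(nil)
	zipOnly := []string{"zh:example-zip-hash"}
	locked := []string{"zh:example-zip-hash", "h1:example-directory-hash"}

	fmt.Println("no lock entry:", cacheEntryEligible(noLock, cacheHash))
	fmt.Println("zip hash only:", cacheEntryEligible(zipOnly, cacheHash))
	fmt.Println("matching hash:", cacheEntryEligible(locked, cacheHash))
}
```

Under these assumptions, a configuration with no lock entry (or with only zip hashes) never qualifies for a cache hit, which is why the first install has to go to the configured installation source.
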
This means that the first time a particular configuration sees a new provider, it will always be fetched from the configured installation source (typically the origin registry), and Terraform will record the checksums from that source. On subsequent installs of the same provider version already locked, Terraform will then consider the cache entry to be eligible and skip re-downloading the same package.
This intentionally makes the global cache mechanism subordinate to the lock file mechanism: the lock file must be populated in order for the global cache to be effective. Those who have many separate configurations which all refer to the same provider version will need to re-download the provider once for each configuration in order to gather the information needed to populate the lock file, whereas before they would have only downloaded it for the _first_ configuration using that provider.
This should therefore remove the most significant cause of folks ending up with incomplete lock files that don't work for colleagues using other platforms, at the expense of bypassing the cache for the first use of each new package with each new configuration. This tradeoff seems reasonable because otherwise such users would inevitably need to run "terraform providers lock" separately anyway, and that command _always_ bypasses the cache. Although this change does decrease the hit rate of the cache, if we subtract the never-cached downloads caused by "terraform providers lock" then this is a net benefit overall, and does the right thing by default without the need to run a separate command.
Reminder for the merging maintainer: if this is a user-visible change, please update the changelog on the appropriate release branch.
I am extremely concerned about this change. It sounds like you are saying that this will result in the global provider cache being entirely ignored if people don't use checksum lockfiles in all of their root modules. We deliberately do not use those lockfiles so it sounds like we will take a huge performance hit as a result of this change since the cache would always be ignored for us. Is that understanding correct?
Also, there are some custom providers we use that do not have an upstream to pull from, so we have to pre-populate them in the cache. If I am reading this correctly, all of those will break if this change gets merged. I strongly believe that this would wreak so much havoc on our workflow (which I don't think is unique) that it would be entirely inappropriate to do without a major version release.
Yes, it is correct that with this change using lock files will be mandatory if you want to use the global plugin cache. We have merged this early in the v1.4 development period to allow plenty of time to refine it based on feedback. This should not break local (non-registry) modules if you use them in the documented way, by placing them into a local filesystem mirror directory instead of in the global cache. Terraform itself is the only process allowed to write into the global cache directory. The alpha release today is one vehicle for that feedback; I'd love if you'd try the alpha and share your experiences in a new issue. Thanks!
So, this completely breaks our use of the cache in a terragrunt world where we're initializing hundreds of terraform repositories. We pre-populate the provider cache once and then in the terraform 1.3 world, every repo we initialize shows it's
This effectively negates the use of the cache for us.
I'll add that even though all of this is running inside a custom docker image with a local mirror directory the providers come from (note the
Sorry, I'll open a new ticket with this feedback.
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.