terraform lock file committed on arm, linux amd deploy, init command throws error with tf 0.14 #1408

Open
ghostsquad opened this issue Feb 12, 2021 · 13 comments
Labels
docs Documentation Stale

Comments

@ghostsquad

I ran into the following issue:

running "/atlantis/data/bin/terraform0.14.6 init -input=false -no-color -upgrade" in "/atlantis/data/repos/tunein/atlantis/16/default/deploy/environments/production": exit status 1

Initializing the backend...

Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.

Initializing provider plugins...
- Finding hashicorp/aws versions matching "3.28.0"...
- Using hashicorp/aws v3.28.0 from the shared cache directory

Error: Failed to install provider from shared cache

Error while importing hashicorp/aws v3.28.0 from the shared cache directory:
the provider cache at .terraform/providers has a copy of
registry.terraform.io/hashicorp/aws 3.28.0 that doesn't match any of the
checksums recorded in the dependency lock file.

and after looking up the error (https://www.terraform.io/docs/cli/commands/providers/lock.html)

I think that -upgrade is the problem here, but I can't be sure.

@ghostsquad
Author

ghostsquad commented Feb 12, 2021

changing the workflow to look like this:

+      "workflows": {
+        "default": {
+          "apply": {
+            "steps": [
+              "apply"
+            ]
+          },
+          "plan": {
+            "steps": [
+              {
+                "run": "terraform init -input=false -no-color"
+              },
+              "plan"
+            ]
+          }
+        },

I now get this:

exit status 1: running "terraform init -input=false -no-color" in "/atlantis/data/repos/tunein/atlantis/16/default/deploy/environments/production": 

Error: Unsupported Terraform Core version

  on main.tf line 15, in terraform:
  15:   required_version = "0.14.6"

This configuration does not support Terraform version 0.13.0. To proceed,
either choose another supported Terraform version or update this version
constraint. Version constraints are normally set for good reason, so updating
the constraint may lead to other errors or unexpected behavior.

despite having my .atlantis.yaml set as:

projects:
  - name: production
    dir: ./deploy/environments/production
    terraform_version: 0.14.6

and main.tf with:

terraform {
  ...

  required_providers {
    aws = {
      source = "hashicorp/aws"
      version = "3.28.0"
    }
  }

  required_version = "0.14.6"
}

@ghostsquad
Author

I was able to fix the version issue by changing the workflow to terraform${ATLANTIS_TERRAFORM_VERSION} init -input=false -no-color. I think this needs to be called out better in the documentation. Right now, it makes it seem that simply using terraform in a custom workflow will do the right thing, but it won't.
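
For anyone scripting this, here is a minimal sketch of picking the versioned binary from inside a custom run step. It assumes Atlantis names its downloaded binaries terraform<version> (as in the /atlantis/data/bin/terraform0.14.6 path in the log above) and that ATLANTIS_TERRAFORM_VERSION is set for the step; resolve_tf_bin is a hypothetical helper name, not something Atlantis ships.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the Terraform binary a custom
# `run` step should call.
resolve_tf_bin() {
  # ATLANTIS_TERRAFORM_VERSION is expected to be set by Atlantis for run steps.
  # When it is unset (e.g. when testing locally) this falls back to "terraform".
  printf 'terraform%s\n' "${ATLANTIS_TERRAFORM_VERSION:-}"
}

# Usage inside a run step (shown as a comment so the sketch runs without Terraform):
#   "$(resolve_tf_bin)" init -input=false -no-color
```

The fallback to plain terraform is a design choice for local testing only; inside Atlantis it would bring back the default-version problem described above if the variable were ever missing.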

@nishkrishnan
Contributor

If you use extra_args, Atlantis will use the version you specify in your atlantis.yaml.

Referencing the terraform binary directly in a custom run command doesn't work if you're using a non-default version. We can make this clearer in our docs.

@nishkrishnan nishkrishnan added the docs Documentation label Feb 25, 2021
@bryankaraffa

bryankaraffa commented Mar 17, 2021

+1 on this issue as we encountered it with Terraform v0.14. Seems like the two workarounds are:

  • Do not commit .terraform.lock.hcl file to repo. This will cause atlantis to always pull down the latest version of providers during init
  • Custom workflow that provides extra_args: ["-upgrade", "false"] to the init step. This will cause atlantis to respect the .terraform.lock.hcl file if it exists.

I am going with custom workflow method for now..

Edit: This comment #1408 (comment) identified the issue/fix

@ghostsquad
Author

ghostsquad commented Mar 17, 2021

If you use extra_args, Atlantis will use the version you specify in your atlantis.yaml.

Referencing the terraform binary directly in a custom run command doesn't work if you're using a non-default version. We can make this clearer in our docs.

Are extra args deduplicated? Such that if I specify an argument that is already a default (but with a different value), are they both passed to terraform? Or does last arg win?

@davidmontoyago

Are extra args deduplicated? Such that if I specify an argument that is already a default (but with a different value), are they both passed to terraform? Or does last arg win?

They don't seem to get deduplicated. Adding extra_args: ["-upgrade", "false"] duplicates the -upgrade flag.

"/atlantis/bin/terraform0.14.7 init -input=false -no-color -upgrade -upgrade=false"

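To make the deduplication question concrete, here is a rough sketch of "an extra arg replaces the default flag with the same name" in plain sh. It is only an illustration of the behaviour being asked about, not Atlantis's implementation (Atlantis is written in Go); merge_args is a hypothetical name and the sketch does not handle arguments containing spaces.

```shell
#!/bin/sh
# Sketch only: NOT how Atlantis does it internally.
# Prints the default args followed by the extra args, dropping any default
# flag whose name (the part before "=") also appears among the extra args.
# Usage: merge_args "<defaults>" "<extras>"   (space-separated words)
merge_args() {
  defaults=$1
  extras=$2
  result=""
  for d in $defaults; do
    d_name=${d%%=*}            # "-upgrade=false" -> "-upgrade"
    overridden=no
    for e in $extras; do
      if [ "${e%%=*}" = "$d_name" ]; then
        overridden=yes
        break
      fi
    done
    if [ "$overridden" = no ]; then
      result="$result $d"
    fi
  done
  # Word-split the combined string and re-join with single spaces.
  set -- $result $extras
  printf '%s\n' "$*"
}

merge_args "-input=false -no-color -upgrade" "-upgrade=false"
# prints: -input=false -no-color -upgrade=false
```

With this kind of merge, the init command line above would carry a single -upgrade=false instead of both -upgrade and -upgrade=false.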
@davidmontoyago

To follow up on this one... with fix #1651 the -upgrade flag is deduped, however, atlantis will still fail with the error below (that is, when the .terraform.lock.hcl is committed):

Error: Failed to install provider from shared cache

@Pluies

Pluies commented Jan 19, 2022

For what it's worth, I ran into the same issue, and it appears the root cause of the issue is that the terraform lock file was generated in OS X but Atlantis was running in linux_amd64?

Running the following line added extra checksums for the linux_amd64 version of the providers:

terraform providers lock -platform=linux_amd64

After committing and pushing this change to the lockfile, Atlantis is happy to use the cached version of the provider and runs without issues.

(I discovered this thanks to https://zenn.dev/shonansurvivors/scraps/7dd3ab1188c956 – I assume this is the same issue based on error messages and the step to fix it, even though I don't read Japanese 😄 )

@tomharrisonjr
Contributor

tomharrisonjr commented May 6, 2022

Thanks @Pluies -- that was our issue. And it was the sole reason we were using custom workflows for all of our root modules ... and custom workflows don't work with the new streaming output in the Atlantis UI. So now, we can have our 🍰 and 😮‍💨 it too 😄

It's possible to generate the checksums for multiple platforms in a single go, so that lockfiles will work with old and new Macs and with both x86_64 (Intel/AMD) and ARM (Graviton) instances. I added a script terraform_lockfile.sh to our repo like this:

#!/usr/bin/env bash
#
# Generates .terraform.lock.hcl file having hashes for each architecture we run on
# https://www.terraform.io/cli/commands/providers/lock

terraform providers lock -platform=darwin_arm64 -platform=darwin_amd64 -platform=linux_amd64 -platform=linux_arm64
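
If the platform list changes often, a small variation on the script above keeps the list in one variable and builds the repeated -platform= flags from it. This is only a sketch: platform_flags and PLATFORMS are hypothetical names, and the final line echoes the command instead of running it so the sketch works without Terraform installed (drop the echo to actually run it).

```shell
#!/bin/sh
# Hypothetical helper: turn a list of platforms into repeated
# -platform=<os>_<arch> flags for `terraform providers lock`.
platform_flags() {
  flags=""
  for p in "$@"; do
    flags="$flags -platform=$p"
  done
  # Trim the leading space left by the loop.
  printf '%s\n' "${flags# }"
}

# Assumed variable name; edit the list to match where Terraform actually runs.
PLATFORMS="darwin_arm64 darwin_amd64 linux_amd64 linux_arm64"

# Word splitting of $PLATFORMS and of the helper's output is intentional here.
echo terraform providers lock $(platform_flags $PLATFORMS)
```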

@nitrocode nitrocode changed the title default init command includes -upgrade which is not desirable starting with terraform 0.14 terraform lock file committed from arm, linux amd deploy, init command includes -upgrade which throws an error after tf 0.14 Jan 16, 2023
@nitrocode
Member

nitrocode commented Jan 16, 2023

Sounds like the workaround is to either

  • do not commit the lock file
  • if it is committed, lock it for the platform that Atlantis is deployed to and the platforms terraform workflows are run on locally (e.g. local m1 laptops)

Thanks for everyone investigating this and coming up with a solution that works.

It would be nice to create a new doc to mention how to commit this file properly.

@nitrocode nitrocode reopened this Jan 16, 2023
@nitrocode nitrocode changed the title terraform lock file committed from arm, linux amd deploy, init command includes -upgrade which throws an error after tf 0.14 terraform lock file committed on arm, linux amd deploy, init command throws error with tf 0.14 Jan 16, 2023
@cilindrox
Contributor

Just chiming in - we're not vendoring/committing the lockfiles and we're still running into this.

Workaround is to delete the plugin cache dir or vendor/commit the lockfile with the platform atlantis is running on (+ any local envs etc)

@vincentgna
Contributor

vincentgna commented Sep 19, 2023

is there a regression on this workaround for v0.25.0?

Ref:

I tried upgrading (listing all changes to highlight the issue seems related to v0.25.0):

          Atlantis   Terraform   TF provider AWS
from      v0.24.4    v1.5.4      ~> v4
to        v0.25.0    v1.5.7      ~> v5
revert    v0.24.4    v1.5.7      ~> v5

I am using this in my atlantis.env snippet

# Atlantis issues with TF 1.4+
# https://github.com/runatlantis/atlantis/issues/3201
TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE=true
# ...

Note: I run atlantis in a systemd unit on an EC2 instance, no container / no k8s configmaps or secrets and everything works in v0.24.4

I am considering finding a way to make sure the Terraform lock files are committed. We run across Windows/Linux/Mac and amd64/arm64 machines, so we're not committing lock files yet, but if anyone has some type of pre-commit check that helps validate the lock file, I'll make sure the lock files are added to resolve this issue instead.
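
One possible shape for such a pre-commit check, as a sketch rather than an existing tool: count the "h1:" hashes recorded for each provider in .terraform.lock.hcl and fail if any provider has fewer than expected. This assumes each locked platform contributes one distinct h1: hash, which is what I would expect from terraform providers lock but have not verified for every provider; check_lock_platforms is a hypothetical name.

```shell
#!/bin/sh
# Sketch of a pre-commit style check (hypothetical helper).
# Exits 0 only if every provider block in the lock file lists at least
# <min> "h1:" hashes, used here as a rough proxy for "locked for <min> platforms".
# Usage: check_lock_platforms <lockfile> <min>
check_lock_platforms() {
  lockfile=$1
  min=$2
  awk -v min="$min" '
    # A new provider block starts: judge the previous one, then reset the count.
    /^provider / { if (name != "" && n < min) bad = 1; name = $2; n = 0 }
    # Count directory-style hashes inside the current block.
    /"h1:/       { n++ }
    # Judge the last block; a file with no providers passes.
    END          { if (name != "" && n < min) bad = 1; exit bad + 0 }
  ' "$lockfile"
}

# Example (as a comment): require hashes for 4 platforms before committing.
#   check_lock_platforms .terraform.lock.hcl 4 || echo "lock file is missing platforms" >&2
```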

The only changelog entries mentioning lock files for the v0.25.0 release seem to be:

@vincentgna
Contributor

I was storing the plugin-cache on an EBS volume and while doing provider upgrades, there would be issues with the versions in there.

So perhaps there's no regression and I just had to rm -rf the plugin-cache and force a new copy running terraform init
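
A slightly more careful version of the rm -rf step, as a sketch: it empties the cache directory but keeps the directory itself, which matters if the cache is a mount point such as an EBS volume. clear_plugin_cache is a hypothetical helper, the cache path is whatever your setup uses (for example the value of TF_PLUGIN_CACHE_DIR), and find's -mindepth and -delete are GNU/BSD extensions rather than strict POSIX.

```shell
#!/bin/sh
# Hypothetical helper: empty a Terraform plugin cache directory so the next
# `terraform init` has to fetch providers again.
# Usage: clear_plugin_cache <cache_dir>
clear_plugin_cache() {
  cache_dir=$1
  # Guard against wiping the wrong thing when the path is empty or "/".
  case "$cache_dir" in
    ""|"/") echo "refusing to clear '$cache_dir'" >&2; return 1 ;;
  esac
  # Nothing to do if the directory does not exist.
  [ -d "$cache_dir" ] || return 0
  # Delete the contents but keep the directory (and its mount, if any).
  find "$cache_dir" -mindepth 1 -delete
}

# Example (as a comment):
#   clear_plugin_cache "${TF_PLUGIN_CACHE_DIR:-}" && terraform init
```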

@dosubot dosubot bot added the Stale label Oct 12, 2024
9 participants