
S3 bucket rewind error when running apply #34528

Closed
DazKins opened this issue Jan 16, 2024 · 20 comments · Fixed by #34796
Labels: backend/s3, bug, new (new issue not yet triaged), v1.6 (issues, primarily bugs, reported against v1.6 releases)

Comments

@DazKins

DazKins commented Jan 16, 2024

Terraform Version

Terraform v1.6.5
on darwin_arm64

Terraform Configuration Files

// providers.tf

terraform {
  required_providers {
    ...
  }
  required_version = "~> 1.6.5"

  backend "s3" {
    region  = "eu-west-1"
    bucket  = "..."
    key     = "terraform.tfstate"
    encrypt = true
  }
}

Debug Output

I can't reproduce the issue again, so I can't get debug output, but here are the default output logs we saw:

2024-01-16T07:15:31.2911852Z ##[group]Run terraform apply -no-color -auto-approve -input=false
2024-01-16T07:15:31.2912737Z terraform apply -no-color -auto-approve -input=false
2024-01-16T07:15:31.2956434Z shell: /usr/bin/bash -e {0}
2024-01-16T07:15:31.2957094Z env:
2024-01-16T07:15:31.2958080Z   TF_AWS_ROLE: arn:aws:iam::***:role/***
2024-01-16T07:15:31.2958921Z   AWS_DEFAULT_REGION: eu-west-1
2024-01-16T07:15:31.2959498Z   AWS_REGION: eu-west-1
2024-01-16T07:15:31.2960162Z   AWS_ACCESS_KEY_ID: ***
2024-01-16T07:15:31.2960951Z   AWS_SECRET_ACCESS_KEY: ***
2024-01-16T07:15:31.2970497Z   AWS_SESSION_TOKEN: ***
2024-01-16T07:15:31.2971311Z   TERRAFORM_CLI_PATH: /home/runner/work/_temp/8093aa56-402a-471c-af37-829ad1f1dc96
2024-01-16T07:15:31.2972191Z ##[endgroup]
2024-01-16T07:15:39.7033998Z module.*****: Refreshing state... [id=14821739]
2024-01-16T07:15:39.7035304Z 
2024-01-16T07:15:39.7035795Z Error: Failed to save state
2024-01-16T07:15:39.7036852Z 
2024-01-16T07:15:39.7037415Z Error saving state: failed to upload state: operation error S3: PutObject,
2024-01-16T07:15:39.7038426Z failed to rewind transport stream for retry, request stream is not seekable
2024-01-16T07:15:39.7039030Z 
2024-01-16T07:15:39.7039240Z Error: Failed to persist state to backend
2024-01-16T07:15:39.7039628Z 
2024-01-16T07:15:39.7040057Z The error shown above has prevented Terraform from writing the updated state
2024-01-16T07:15:39.7041088Z to the configured backend. To allow for recovery, the state has been written
2024-01-16T07:15:39.7042033Z to the file "errored.tfstate" in the current working directory.
2024-01-16T07:15:39.7042577Z 
2024-01-16T07:15:39.7043108Z Running "terraform apply" again at this point will create a forked state,
2024-01-16T07:15:39.7043870Z making it harder to recover.
2024-01-16T07:15:39.7044266Z 
2024-01-16T07:15:39.7045109Z To retry writing this state, use the following command:
2024-01-16T07:15:39.7045794Z     terraform state push errored.tfstate

Expected Behavior

Terraform apply should run successfully. If an error occurs during state sync, I would also expect the run to end immediately to prevent any divergence in state.

Actual Behavior

Terraform apply continued running but appeared to be operating on bad state, causing it to create/delete resources incorrectly.

Steps to Reproduce

  1. terraform init -no-color
  2. terraform apply -no-color -auto-approve -input=false

As I mentioned earlier, though, we can't reliably reproduce this.

Additional Context

This was running inside a GitHub Actions workflow using hashicorp/setup-terraform@v3.

We were also using AWS OIDC to assume a role automatically in the runner: https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/configuring-openid-connect-in-amazon-web-services

References

No response

@DazKins DazKins added bug new new issue not yet triaged labels Jan 16, 2024
@crw crw added the backend/s3 label Jan 16, 2024
@simonblake-mp

simonblake-mp commented Jan 17, 2024

FWIW, I'm seeing this same issue in v1.6.6, both from my laptop and in pipelines. After all the resource changes have happened, Terraform fails with:

  | │ Error: Failed to save state
  | │
  | │ Error saving state: failed to upload state: operation error S3: PutObject, failed to rewind transport stream for retry, request stream is not seekable

It seems to be completely random, always in a tree with multiple workspaces. Nothing fancy in our permissions: standard AWS SSO users on laptops and EC2 instance IAM roles in CI.

Having it fail in pipelines is particularly problematic, as our pipelines don't currently save errored.tfstate if it exists, so cleaning up requires nasty manual state manipulation.

@ewastempel

We see the same issue in v1.6.6. Please see the trace for more insights:

2024-01-16T14:27:51.0906051Z 2024-01-16T14:27:50.918Z [DEBUG] states/remote: state read lineage is: e29388d3-e6cf-9115-e169-5f1d9f976c58; lineage is: e29388d3-e6cf-9115-e169-5f1d9f976c58
2024-01-16T14:27:51.0907551Z 2024-01-16T14:27:50.934Z [INFO]  backend-s3: Uploading remote state: tf_backend.operation=Put tf_backend.req_id=23d0d7ed-07da-d344-e6e7-c8ba9a9e84cd tf_backend.s3.bucket=*** tf_backend.s3.path=***
2024-01-16T14:27:51.0918815Z 2024-01-16T14:27:50.941Z [DEBUG] backend-s3: HTTP Request Sent: aws.region=eu-west-2 aws.s3.bucket=*** aws.s3.key=*** rpc.method=PutObject rpc.service=S3 rpc.system=aws-api tf_aws.sdk=aws-sdk-go-v2 tf_aws.signing_region="" tf_backend.operation=Put tf_backend.req_id=23d0d7ed-07da-d344-e6e7-c8ba9a9e84cd tf_backend.s3.bucket=*** tf_backend.s3.path=*** http.request.header.x_amz_decoded_content_length=129597 http.request.header.authorization="AWS4-HMAC-SHA256 Credential=ASIA************KEUA/20240116/eu-west-2/s3/aws4_request, SignedHeaders=accept-encoding;amz-sdk-invocation-id;content-encoding;content-length;content-type;host;x-amz-acl;x-amz-content-sha256;x-amz-date;x-amz-decoded-content-length;x-amz-sdk-checksum-algorithm;x-amz-security-token;x-amz-server-side-encryption;x-amz-trailer, Signature=*****" http.request.header.x_amz_date=20240116T142750Z http.request.header.x_amz_content_sha256=STREAMING-UNSIGNED-PAYLOAD-TRAILER http.request.header.x_amz_trailer=x-amz-checksum-sha256 net.peer.name=***.s3.eu-west-2.amazonaws.com http.request_content_length=129679 http.request.body="[Redacted: 126.6 KB (129,679 bytes), Type: application/json]" http.url=https://***.s3.eu-west-2.amazonaws.com/***?x-id=PutObject http.request.header.x_amz_security_token="*****" http.request.header.x_amz_acl=bucket-owner-full-control http.request.header.amz_sdk_request="attempt=1; max=5" http.request.header.accept_encoding=identity http.request.header.x_amz_server_side_encryption=AES256 http.request.header.content_encoding=aws-chunked http.request.header.x_amz_sdk_checksum_algorithm=SHA256 http.method=PUT http.user_agent="APN/1.0 HashiCorp/1.0 Terraform/1.6.6 (+https://www.terraform.io) aws-sdk-go-v2/1.24.0 os/linux lang/go#1.21.5 md/GOOS#linux md/GOARCH#amd64 api/s3#1.47.5 ft/s3-transfer" http.request.header.amz_sdk_invocation_id=1fae9720-5924-45a9-832c-2bce77cca664 http.request.header.content_type=application/json
2024-01-16T14:27:51.5915142Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: reading initial snapshot from errored.tfstate
2024-01-16T14:27:51.5916619Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: snapshot file has nil snapshot, but that's okay
2024-01-16T14:27:51.5917812Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: read nil snapshot
2024-01-16T14:27:51.5919595Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: Importing snapshot with lineage "e29388d3-e6cf-9115-e169-5f1d9f976c58" serial 251 as the initial state snapshot at errored.tfstate
2024-01-16T14:27:51.5921167Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: preparing to manage state snapshots at errored.tfstate
2024-01-16T14:27:51.5922242Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: no previously-stored snapshot exists
2024-01-16T14:27:51.5923195Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: state file backups are disabled
2024-01-16T14:27:51.5924429Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: forcing lineage "e29388d3-e6cf-9115-e169-5f1d9f976c58" serial 251 for migration/import
2024-01-16T14:27:51.5925663Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: writing snapshot at errored.tfstate
2024-01-16T14:27:51.5940360Z 
2024-01-16T14:27:51.5940616Z Error: Failed to save state
2024-01-16T14:27:51.5940990Z 
2024-01-16T14:27:51.5941454Z Error saving state: failed to upload state: operation error S3: PutObject,
2024-01-16T14:27:51.5942382Z failed to rewind transport stream for retry, request stream is not seekable
2024-01-16T14:27:51.5942793Z 
2024-01-16T14:27:51.5942939Z Error: Failed to persist state to backend
2024-01-16T14:27:51.5943194Z 
2024-01-16T14:27:51.5943479Z The error shown above has prevented Terraform from writing the updated state
2024-01-16T14:27:51.5944159Z to the configured backend. To allow for recovery, the state has been written
2024-01-16T14:27:51.5944977Z to the file "errored.tfstate" in the current working directory.
2024-01-16T14:27:51.5945328Z 
2024-01-16T14:27:51.5945594Z Running "terraform apply" again at this point will create a forked state,
2024-01-16T14:27:51.5946094Z making it harder to recover.
2024-01-16T14:27:51.5946293Z 
2024-01-16T14:27:51.5946481Z To retry writing this state, use the following command:
2024-01-16T14:27:51.5946923Z     terraform state push errored.tfstate
2024-01-16T14:27:51.5947279Z 
2024-01-16T14:27:51.6009041Z data.aws_iam_policy_document.assume_role_policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6010040Z data.aws_iam_roles.github_actions_role: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6010940Z module.member-access[0].data.aws_iam_policy_document.assume-role-policy: Reading...
2024-01-16T14:27:51.6011803Z module.member-access-us-east[0].data.aws_iam_policy_document.assume-role-policy: Reading...
2024-01-16T14:27:51.6012988Z module.member-access-eu-central[0].data.aws_iam_policy_document.assume-role-policy: Reading...
2024-01-16T14:27:51.6013994Z module.member-access[0].data.aws_iam_policy_document.assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6015084Z module.member-access-us-east[0].data.aws_iam_policy_document.assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6016230Z module.member-access-eu-central[0].data.aws_iam_policy_document.assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6017237Z module.member-access[0].data.aws_iam_policy_document.combined-assume-role-policy: Reading...
2024-01-16T14:27:51.6018331Z module.member-access-us-east[0].data.aws_iam_policy_document.combined-assume-role-policy: Reading...
2024-01-16T14:27:51.6019555Z module.member-access-us-east[0].data.aws_iam_policy_document.combined-assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6021455Z module.member-access-eu-central[0].data.aws_iam_policy_document.combined-assume-role-policy: Reading...
2024-01-16T14:27:51.6022749Z module.member-access[0].data.aws_iam_policy_document.combined-assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6023985Z module.member-access-eu-central[0].data.aws_iam_policy_document.combined-assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6025019Z module.member-access[0].aws_iam_role.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6025868Z module.member-access-eu-central[0].aws_iam_role.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6026729Z module.member-access-us-east[0].aws_iam_role.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6027463Z data.aws_iam_session_context.whoami: Read complete after 1s [id=<REDACTED>]
2024-01-16T14:27:51.6028171Z data.aws_organizations_organization.root_account: Read complete after 1s [id=<REDACTED>]
2024-01-16T14:27:51.6029100Z module.ssm-cross-account-access.aws_iam_role_policy_attachment.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6030158Z module.instance-scheduler-access[0].aws_iam_role_policy_attachment.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6031179Z module.member-access-us-east[0].aws_iam_role_policy_attachment.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6032185Z module.member-access-eu-central[0].aws_iam_role_policy_attachment.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6033154Z module.member-access[0].aws_iam_role_policy_attachment.default: Refreshing state... [id=<REDACTED>]

Note: S3 bucket and Terraform backend values were further redacted with ***.

@DorisEckel8412

DorisEckel8412 commented Jan 18, 2024

Hi,
we have also seen this error since 1.6.5 in our pipelines. Cleanup is a nasty effort, because we don't have the errored.tfstate in the pipeline. The error itself seems to come from changed types in aws-sdk-go-v2.
It should be possible to reproduce by triggering connection issues while the state is being uploaded.

@apparentlymart apparentlymart added the v1.6 Issues (primarily bugs) reported against v1.6 releases label Jan 19, 2024
@Alex-Waring

Given the timeline, this seems to be a direct result of #34313. The changes look very innocent, so this is possibly an upstream issue.

@scottmchargue-eb

We are seeing these errors sporadically with version 1.6.6 as well

@dms1981

dms1981 commented Jan 30, 2024

We're also seeing this with Terraform 1.7.1.

@karfau

karfau commented Feb 3, 2024

For us this started happening from time to time, and even happens when there are no changes to apply:

[...]
bm-mifro-stack No changes. Your infrastructure matches the configuration. 

Terraform has compared your real infrastructure against your configuration and 

found no differences, so no changes are needed. 

bm-mifro-stack
 │ Error: Failed to save state 
 │ Error saving state: failed to upload state: operation error S3: PutObject, 
 │ failed to rewind transport stream for retry, request stream is not seekable 
 │ Error: Failed to persist state to backend 
 │ The error shown above has prevented Terraform from writing the updated state 
 │ to the configured backend. To allow for recovery, the state has been written 
 │ to the file "errored.tfstate" in the current working directory. 
 │ Running "terraform apply" again at this point will create a forked state, 
 │ making it harder to recover. 
 | To retry writing this state, use the following command: 
 │ terraform state push errored.tfstate 
}

0 Stacks deploying 1 Stack done 0 Stacks waiting 

Invoking Terraform CLI failed with exit code 1 

error Command failed with exit code 1. 

Error: Process completed with exit code 1. 

Not sure why any state is being saved if no changes are being applied.

But for me, a re-run of the failed job usually resolves the issue.

@sleerssen

Just updated to Terraform 1.6.6 and am also seeing the issue.

@brenix

brenix commented Feb 7, 2024

As a workaround, we set max_retries on the s3 backend to a value of 1, which appears to resolve the issue for us at the moment. This was based on the conversation in this issue: aws/aws-sdk-go-v2#2038
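
For anyone wanting to try the same mitigation, here is a minimal sketch of that setting applied to the backend block from the original report (region, bucket, and key are the reporter's placeholders; max_retries is the S3 backend's documented retry-count argument):

terraform {
  backend "s3" {
    region      = "eu-west-1"
    bucket      = "..."
    key         = "terraform.tfstate"
    encrypt     = true
    max_retries = 1 # workaround from the comment above: limit retries so the failing rewind-on-retry path is not exercised
  }
}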

@dmitry-sherlok

The max_retries workaround unfortunately did not work for us (I was partly expecting this):

│ Error: Failed to save state
│ 
│ Error saving state: failed to upload state: operation error S3: PutObject,
│ exceeded maximum number of attempts, 1, https response error StatusCode: 0,
│ RequestID: , HostID: , request send failed, Put
│ "https://<my_bucket>.s3.ap-southeast-2.amazonaws.com/<my_tf_state_key>?x-id=PutObject":
│ write tcp 10.1.0.19:40800->52.95.130.58:443: use of closed network
│ connection

@delib-michael

I'm seeing this issue in 1.7.3.

@mattthaber

Saw this randomly on 1.7.1.

@petrosmelachrinos

petrosmelachrinos commented Mar 4, 2024

Have the same issue on 1.7.2. What is the latest version in which this issue does not occur?

@sleerssen

Have the same issue on 1.7.2. What is the latest version in which this issue does not occur?

1.6.4 is where we settled

@MihalisW

MihalisW commented Mar 4, 2024

Seeing the same issue using the GitHub Action hashicorp/setup-terraform@v2.

@muawiakh

muawiakh commented Mar 4, 2024

Seeing this with 1.6.6.

@chrisdenton-ct

I saw this ticket was closed and just upgraded to the latest available version, 1.7.4, to pick up the fix.

... then I got the error again while deploying the upgrade.

@dlaudams

@chrisdenton-ct 1.7.4 would not have this fix.

The fix was only merged 4 days ago, on 2024-03-09.

1.7.4 was released prior to that, on 2024-02-21.

@alekc

alekc commented Mar 14, 2024

1.7.5 (March 13, 2024)
BUG FIXES:

backend/s3: When using s3 backend and encountering a network issue, the retry code would fail with "failed to rewind transport stream for retry". Now the retry should be successful. (#34796)
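
For reference, a minimal sketch of a version constraint that picks up the fixed release, assuming you pin the CLI version in the terraform block as the original report does (adjust the bound to your own upgrade policy):

terraform {
  required_version = ">= 1.7.5" # 1.7.5 is the first release containing the backend/s3 rewind fix (#34796)
}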

@github-actions (Contributor)

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 14, 2024