S3 bucket rewind error when running apply #34528

DazKins · 2024-01-16T09:50:52Z

Terraform Version

Terraform v1.6.5
on darwin_arm64

Terraform Configuration Files

// providers.tf

terraform {
  required_providers {
    ...
  }
  required_version = "~> 1.6.5"

  backend "s3" {
    region  = "eu-west-1"
    bucket  = "..."
    key     = "terraform.tfstate"
    encrypt = true
  }
}

Debug Output

I can't reproduce the issue again so can't get a debug output, but here is the default output logs we saw:

2024-01-16T07:15:31.2911852Z ##[group]Run terraform apply -no-color -auto-approve -input=false
2024-01-16T07:15:31.2912737Z �[36;1mterraform apply -no-color -auto-approve -input=false�[0m
2024-01-16T07:15:31.2956434Z shell: /usr/bin/bash -e {0}
2024-01-16T07:15:31.2957094Z env:
2024-01-16T07:15:31.2958080Z   TF_AWS_ROLE: arn:aws:iam::***:role/***
2024-01-16T07:15:31.2958921Z   AWS_DEFAULT_REGION: eu-west-1
2024-01-16T07:15:31.2959498Z   AWS_REGION: eu-west-1
2024-01-16T07:15:31.2960162Z   AWS_ACCESS_KEY_ID: ***
2024-01-16T07:15:31.2960951Z   AWS_SECRET_ACCESS_KEY: ***
2024-01-16T07:15:31.2970497Z   AWS_SESSION_TOKEN: ***
2024-01-16T07:15:31.2971311Z   TERRAFORM_CLI_PATH: /home/runner/work/_temp/8093aa56-402a-471c-af37-829ad1f1dc96
2024-01-16T07:15:31.2972191Z ##[endgroup]
2024-01-16T07:15:39.7033998Z module.*****: Refreshing state... [id=14821739]
2024-01-16T07:15:39.7035304Z 
2024-01-16T07:15:39.7035795Z Error: Failed to save state
2024-01-16T07:15:39.7036852Z 
2024-01-16T07:15:39.7037415Z Error saving state: failed to upload state: operation error S3: PutObject,
2024-01-16T07:15:39.7038426Z failed to rewind transport stream for retry, request stream is not seekable
2024-01-16T07:15:39.7039030Z 
2024-01-16T07:15:39.7039240Z Error: Failed to persist state to backend
2024-01-16T07:15:39.7039628Z 
2024-01-16T07:15:39.7040057Z The error shown above has prevented Terraform from writing the updated state
2024-01-16T07:15:39.7041088Z to the configured backend. To allow for recovery, the state has been written
2024-01-16T07:15:39.7042033Z to the file "errored.tfstate" in the current working directory.
2024-01-16T07:15:39.7042577Z 
2024-01-16T07:15:39.7043108Z Running "terraform apply" again at this point will create a forked state,
2024-01-16T07:15:39.7043870Z making it harder to recover.
2024-01-16T07:15:39.7044266Z 
2024-01-16T07:15:39.7045109Z To retry writing this state, use the following command:
2024-01-16T07:15:39.7045794Z     terraform state push errored.tfstate

Expected Behavior

Terraform apply should run succesfully. If an error occurs during state sync I would also expect the run to end immediately to prevent any divergence in state.

Actual Behavior

Terraform apply continued running but appeared to have some bad state causing it to create/delete resources incorrectly.

Steps to Reproduce

terraform init -no-color
terraform apply -no-color -auto-approve -input=false

As I mentioned earlier, we can't reliably re-produce though.

Additional Context

This was running inside a GHA using hashicorp/setup-terraform@v3

We were also using AWS OIDC to assume a role automatically in the runner: https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/configuring-openid-connect-in-amazon-web-services

References

No response

The text was updated successfully, but these errors were encountered:

simonblake-mp · 2024-01-17T05:06:07Z

FWIW, I'm seeing this same issue in v1.6.6 both from laptop and in pipelines - after all the resource changes have happened, terraform fails with

  | │ Error: Failed to save state
  | │
  | │ Error saving state: failed to upload state: operation error S3: PutObject, failed to rewind transport stream for retry, request stream is not seekable

seems to be completely random, always in a tree with multiple workspaces. Nothing fancy in our perms, standard AWS SSO users on laptop and EC2 instance IAM roles in CI.

Having it fail in pipelines is particularly problematic, as our pipelines don't currently save errored.tfstate if it exists - so cleaning up is nasty manual state manipulation.

ewastempel · 2024-01-17T11:35:28Z

We see the same issue in v1.6.6. Please see the trace for more insights:

2024-01-16T14:27:51.0906051Z 2024-01-16T14:27:50.918Z [DEBUG] states/remote: state read lineage is: e29388d3-e6cf-9115-e169-5f1d9f976c58; lineage is: e29388d3-e6cf-9115-e169-5f1d9f976c58
2024-01-16T14:27:51.0907551Z 2024-01-16T14:27:50.934Z [INFO]  backend-s3: Uploading remote state: tf_backend.operation=Put tf_backend.req_id=23d0d7ed-07da-d344-e6e7-c8ba9a9e84cd tf_backend.s3.bucket=*** tf_backend.s3.path=***
2024-01-16T14:27:51.0918815Z 2024-01-16T14:27:50.941Z [DEBUG] backend-s3: HTTP Request Sent: aws.region=eu-west-2 aws.s3.bucket=*** aws.s3.key=*** rpc.method=PutObject rpc.service=S3 rpc.system=aws-api tf_aws.sdk=aws-sdk-go-v2 tf_aws.signing_region="" tf_backend.operation=Put tf_backend.req_id=23d0d7ed-07da-d344-e6e7-c8ba9a9e84cd tf_backend.s3.bucket=*** tf_backend.s3.path=*** http.request.header.x_amz_decoded_content_length=129597 http.request.header.authorization="AWS4-HMAC-SHA256 Credential=ASIA************KEUA/20240116/eu-west-2/s3/aws4_request, SignedHeaders=accept-encoding;amz-sdk-invocation-id;content-encoding;content-length;content-type;host;x-amz-acl;x-amz-content-sha256;x-amz-date;x-amz-decoded-content-length;x-amz-sdk-checksum-algorithm;x-amz-security-token;x-amz-server-side-encryption;x-amz-trailer, Signature=*****" http.request.header.x_amz_date=20240116T142750Z http.request.header.x_amz_content_sha256=STREAMING-UNSIGNED-PAYLOAD-TRAILER http.request.header.x_amz_trailer=x-amz-checksum-sha256 net.peer.name=***.s3.eu-west-2.amazonaws.com http.request_content_length=129679 http.request.body="[Redacted: 126.6 KB (129,679 bytes), Type: application/json]" http.url=https://***.s3.eu-west-2.amazonaws.com/***?x-id=PutObject http.request.header.x_amz_security_token="*****" http.request.header.x_amz_acl=bucket-owner-full-control http.request.header.amz_sdk_request="attempt=1; max=5" http.request.header.accept_encoding=identity http.request.header.x_amz_server_side_encryption=AES256 http.request.header.content_encoding=aws-chunked http.request.header.x_amz_sdk_checksum_algorithm=SHA256 http.method=PUT http.user_agent="APN/1.0 HashiCorp/1.0 Terraform/1.6.6 (+https://www.terraform.io) aws-sdk-go-v2/1.24.0 os/linux lang/go#1.21.5 md/GOOS#linux md/GOARCH#amd64 api/s3#1.47.5 ft/s3-transfer" http.request.header.amz_sdk_invocation_id=1fae9720-5924-45a9-832c-2bce77cca664 http.request.header.content_type=application/json
2024-01-16T14:27:51.5915142Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: reading initial snapshot from errored.tfstate
2024-01-16T14:27:51.5916619Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: snapshot file has nil snapshot, but that's okay
2024-01-16T14:27:51.5917812Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: read nil snapshot
2024-01-16T14:27:51.5919595Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: Importing snapshot with lineage "e29388d3-e6cf-9115-e169-5f1d9f976c58" serial 251 as the initial state snapshot at errored.tfstate
2024-01-16T14:27:51.5921167Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: preparing to manage state snapshots at errored.tfstate
2024-01-16T14:27:51.5922242Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: no previously-stored snapshot exists
2024-01-16T14:27:51.5923195Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: state file backups are disabled
2024-01-16T14:27:51.5924429Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: forcing lineage "e29388d3-e6cf-9115-e169-5f1d9f976c58" serial 251 for migration/import
2024-01-16T14:27:51.5925663Z 2024-01-16T14:27:51.591Z [TRACE] statemgr.Filesystem: writing snapshot at errored.tfstate
2024-01-16T14:27:51.5940360Z 
2024-01-16T14:27:51.5940616Z Error: Failed to save state
2024-01-16T14:27:51.5940990Z 
2024-01-16T14:27:51.5941454Z Error saving state: failed to upload state: operation error S3: PutObject,
2024-01-16T14:27:51.5942382Z failed to rewind transport stream for retry, request stream is not seekable
2024-01-16T14:27:51.5942793Z 
2024-01-16T14:27:51.5942939Z Error: Failed to persist state to backend
2024-01-16T14:27:51.5943194Z 
2024-01-16T14:27:51.5943479Z The error shown above has prevented Terraform from writing the updated state
2024-01-16T14:27:51.5944159Z to the configured backend. To allow for recovery, the state has been written
2024-01-16T14:27:51.5944977Z to the file "errored.tfstate" in the current working directory.
2024-01-16T14:27:51.5945328Z 
2024-01-16T14:27:51.5945594Z Running "terraform apply" again at this point will create a forked state,
2024-01-16T14:27:51.5946094Z making it harder to recover.
2024-01-16T14:27:51.5946293Z 
2024-01-16T14:27:51.5946481Z To retry writing this state, use the following command:
2024-01-16T14:27:51.5946923Z     terraform state push errored.tfstate
2024-01-16T14:27:51.5947279Z 
2024-01-16T14:27:51.6009041Z data.aws_iam_policy_document.assume_role_policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6010040Z data.aws_iam_roles.github_actions_role: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6010940Z module.member-access[0].data.aws_iam_policy_document.assume-role-policy: Reading...
2024-01-16T14:27:51.6011803Z module.member-access-us-east[0].data.aws_iam_policy_document.assume-role-policy: Reading...
2024-01-16T14:27:51.6012988Z module.member-access-eu-central[0].data.aws_iam_policy_document.assume-role-policy: Reading...
2024-01-16T14:27:51.6013994Z module.member-access[0].data.aws_iam_policy_document.assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6015084Z module.member-access-us-east[0].data.aws_iam_policy_document.assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6016230Z module.member-access-eu-central[0].data.aws_iam_policy_document.assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6017237Z module.member-access[0].data.aws_iam_policy_document.combined-assume-role-policy: Reading...
2024-01-16T14:27:51.6018331Z module.member-access-us-east[0].data.aws_iam_policy_document.combined-assume-role-policy: Reading...
2024-01-16T14:27:51.6019555Z module.member-access-us-east[0].data.aws_iam_policy_document.combined-assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6021455Z module.member-access-eu-central[0].data.aws_iam_policy_document.combined-assume-role-policy: Reading...
2024-01-16T14:27:51.6022749Z module.member-access[0].data.aws_iam_policy_document.combined-assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6023985Z module.member-access-eu-central[0].data.aws_iam_policy_document.combined-assume-role-policy: Read complete after 0s [id=<REDACTED>]
2024-01-16T14:27:51.6025019Z module.member-access[0].aws_iam_role.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6025868Z module.member-access-eu-central[0].aws_iam_role.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6026729Z module.member-access-us-east[0].aws_iam_role.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6027463Z data.aws_iam_session_context.whoami: Read complete after 1s [id=<REDACTED>]
2024-01-16T14:27:51.6028171Z data.aws_organizations_organization.root_account: Read complete after 1s [id=<REDACTED>]
2024-01-16T14:27:51.6029100Z module.ssm-cross-account-access.aws_iam_role_policy_attachment.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6030158Z module.instance-scheduler-access[0].aws_iam_role_policy_attachment.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6031179Z module.member-access-us-east[0].aws_iam_role_policy_attachment.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6032185Z module.member-access-eu-central[0].aws_iam_role_policy_attachment.default: Refreshing state... [id=<REDACTED>]
2024-01-16T14:27:51.6033154Z module.member-access[0].aws_iam_role_policy_attachment.default: Refreshing state... [id=<REDACTED>]

NOTE, s3 bucket and tf backend values were further redacted with ***.

DorisEckel8412 · 2024-01-18T07:29:04Z

Hi,
saw also this error since 1.6.5 in our pipelines. It is a nasty cleanup effort, because we have not the errored.tfstate in the pipeline. The error itselfs seems to come from changed types in aws-go-sdk.
It should be possible to reproduce with trying to upload with connection issues.

Alex-Waring · 2024-01-22T09:26:16Z

Given the timeline, this seems a direct result of #34313. The changes look very innocent so possible an upstream issue.

scottmchargue-eb · 2024-01-26T17:53:53Z

We are seeing these errors sporadically with version 1.6.6 as well

dms1981 · 2024-01-30T09:57:59Z

We're also seeing this with Terraform 1.7.1.

karfau · 2024-02-03T11:15:57Z

For us this started happening from time to time, and even happens when there are no changes to apply:

[...]
bm-mifro-stack No changes. Your infrastructure matches the configuration. 

Terraform has compared your real infrastructure against your configuration and 

found no differences, so no changes are needed. 

bm-mifro-stack
 │ Error: Failed to save state 
 │ Error saving state: failed to upload state: operation error S3: PutObject, 
 │ failed to rewind transport stream for retry, request stream is not seekable 
 │ Error: Failed to persist state to backend 
 │ The error shown above has prevented Terraform from writing the updated state 
 │ to the configured backend. To allow for recovery, the state has been written 
 │ to the file "errored.tfstate" in the current working directory. 
 │ Running "terraform apply" again at this point will create a forked state, 
 │ making it harder to recover. 
 | To retry writing this state, use the following command: 
 │ terraform state push errored.tfstate 
}

0 Stacks deploying 1 Stack done 0 Stacks waiting 

Invoking Terraform CLI failed with exit code 1 

error Command failed with exit code 1. 

Error: Process completed with exit code 1.

Not sure why any state is being saved if no changes are being applied.

But for me a rerun of the failed job usually solves the issue.

sleerssen · 2024-02-07T15:28:23Z

just updated to terraform 1.6.6 and also seeing the issue.

brenix · 2024-02-07T19:25:30Z

As a workaround, we set max_retries on the s3 backend to a value of 1, which appears to resolve the issue for us at the moment. This was based on conversation in this issue: aws/aws-sdk-go-v2#2038

dmitry-sherlok · 2024-02-09T00:08:25Z

The max_retries workaround unfortunately did not work for us (I was in part expecting this)

│ Error: Failed to save state
│ 
│ Error saving state: failed to upload state: operation error S3: PutObject,
│ exceeded maximum number of attempts, 1, https response error StatusCode: 0,
│ RequestID: , HostID: , request send failed, Put
│ "https://<my_bucket>.s3.ap-southeast-2.amazonaws.com/<my_tf_state_key>?x-id=PutObject":
│ write tcp 10.1.0.19:40800->52.95.130.58:443: use of closed network
│ connection

delib-michael · 2024-02-21T09:19:37Z

I'm seeing this issue in 1.7.3.

mattthaber · 2024-02-21T20:37:38Z

saw this randomy on 1.7.1

petrosmelachrinos · 2024-03-04T15:21:45Z

Have the same issue on 1.7.2. Which is the latest version that this issue does not occur?

sleerssen · 2024-03-04T15:23:26Z

Have the same issue on 1.7.2. Which is the latest version that this issue does not occur?

1.6.4 is where we settled

MihalisW · 2024-03-04T15:41:23Z

Seeing the same issue using the github action: hashicorp/setup-terraform@v2

muawiakh · 2024-03-04T15:57:46Z

Seeing this with 1.6.6.

chrisdenton-ct · 2024-03-12T16:30:35Z

I saw this ticket was closed and just upgraded to the latest available version, 1.7.4, to pick up the fix.

... then I got the error again while deploying the upgrade.

dlaudams · 2024-03-13T01:15:35Z

@chrisdenton-ct 1.7.4 would not have this fix.

The fix was only merged 4 days ago. 2024-03-09.

1.7.4 was released prior to that on 2024-02-21.

alekc · 2024-03-14T10:44:07Z

1.7.5 (March 13, 2024)
BUG FIXES:

backend/s3: When using s3 backend and encountering a network issue, the retry code would fail with "failed to rewind transport stream for retry". Now the retry should be successful. (#34796)

github-actions · 2024-04-14T02:25:57Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

DazKins added bug new new issue not yet triaged labels Jan 16, 2024

crw added the backend/s3 label Jan 16, 2024

ewastempel mentioned this issue Jan 17, 2024

Bug - Error: Failed to persist state to backend on modernisation-platform ministryofjustice/modernisation-platform#5859

Closed

apparentlymart added the v1.6 Issues (primarily bugs) reported against v1.6 releases label Jan 19, 2024

ewastempel mentioned this issue Jan 23, 2024

Fix terraform state persistence issue in all repos ministryofjustice/modernisation-platform#6038

Closed

20 tasks

project0 mentioned this issue Feb 9, 2024

S3 upload manager retry broken with seekable body aws/aws-sdk-go-v2#2485

Closed

kojoru mentioned this issue Mar 8, 2024

Update AWS SDK versions to the latest #34796

Merged

gdavison closed this as completed in #34796 Mar 8, 2024

This was referenced Mar 8, 2024

Backport of Update AWS SDK versions to the latest into v1.7 #34804

Merged

Backport of Update AWS SDK versions to the latest into v1.8 #34805

Merged

richgreen-moj mentioned this issue Mar 18, 2024

Roll back Slack failure notification logic for persisting errored TF state ministryofjustice/modernisation-platform#6498

Closed

4 tasks

github-actions bot locked as resolved and limited conversation to collaborators Apr 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

S3 bucket rewind error when running apply #34528

S3 bucket rewind error when running apply #34528

DazKins commented Jan 16, 2024 •

edited

Loading

simonblake-mp commented Jan 17, 2024 •

edited

Loading

ewastempel commented Jan 17, 2024

DorisEckel8412 commented Jan 18, 2024 •

edited

Loading

Alex-Waring commented Jan 22, 2024

scottmchargue-eb commented Jan 26, 2024

dms1981 commented Jan 30, 2024

karfau commented Feb 3, 2024 •

edited

Loading

sleerssen commented Feb 7, 2024

brenix commented Feb 7, 2024 •

edited

Loading

dmitry-sherlok commented Feb 9, 2024

delib-michael commented Feb 21, 2024

mattthaber commented Feb 21, 2024

petrosmelachrinos commented Mar 4, 2024 •

edited

Loading

sleerssen commented Mar 4, 2024

MihalisW commented Mar 4, 2024

muawiakh commented Mar 4, 2024

chrisdenton-ct commented Mar 12, 2024

dlaudams commented Mar 13, 2024

alekc commented Mar 14, 2024

github-actions bot commented Apr 14, 2024

S3 bucket rewind error when running apply #34528

S3 bucket rewind error when running apply #34528

Comments

DazKins commented Jan 16, 2024 • edited Loading

Terraform Version

Terraform Configuration Files

Debug Output

Expected Behavior

Actual Behavior

Steps to Reproduce

Additional Context

References

simonblake-mp commented Jan 17, 2024 • edited Loading

ewastempel commented Jan 17, 2024

DorisEckel8412 commented Jan 18, 2024 • edited Loading

Alex-Waring commented Jan 22, 2024

scottmchargue-eb commented Jan 26, 2024

dms1981 commented Jan 30, 2024

karfau commented Feb 3, 2024 • edited Loading

sleerssen commented Feb 7, 2024

brenix commented Feb 7, 2024 • edited Loading

dmitry-sherlok commented Feb 9, 2024

delib-michael commented Feb 21, 2024

mattthaber commented Feb 21, 2024

petrosmelachrinos commented Mar 4, 2024 • edited Loading

sleerssen commented Mar 4, 2024

MihalisW commented Mar 4, 2024

muawiakh commented Mar 4, 2024

chrisdenton-ct commented Mar 12, 2024

dlaudams commented Mar 13, 2024

alekc commented Mar 14, 2024

github-actions bot commented Apr 14, 2024

DazKins commented Jan 16, 2024 •

edited

Loading

simonblake-mp commented Jan 17, 2024 •

edited

Loading

DorisEckel8412 commented Jan 18, 2024 •

edited

Loading

karfau commented Feb 3, 2024 •

edited

Loading

brenix commented Feb 7, 2024 •

edited

Loading

petrosmelachrinos commented Mar 4, 2024 •

edited

Loading