Store outputs separately from rest of tfstate #18603

tammersaleh · 2018-08-04T20:13:57Z

Problem

To allow a downstream project to read the state of an upstream one, Upstream must expose its entire terraform.tfstate file (via S3, TFE, etc.). This file contains not only the outputs that Downstream relies on, but also sensitive information and secrets. It's not possible to expose outputs with remote state sharing without also exposing this unnecessary and sensitive information.

Solution

Outputs should be stored separately from the tfstate file. This can be opt-in, to preserve backward compatibility for downstream consumers.

I'm imagining a configuration where Upstream says:

terraform {
  backend "s3" {
    bucket      = "bucket"
    key         = "upstream.tfstate"
    # Proposed argument (does not exist today): where to write the outputs only.
    outputs_key = "upstream-outputs.tfstate"
  }
}

Then in Downstream:

data "terraform_remote_state" "upstream" {
  backend = "s3"
  config {
    bucket = "bucket"
    # Points at the outputs-only file, not at upstream.tfstate.
    key    = "upstream-outputs.tfstate"
  }
}

The administrator for Upstream can then expose the upstream-outputs.tfstate file, while keeping the upstream.tfstate file private.

Note that outputs_key is entirely optional, and the file it writes uses the same tfstate format. This is to ensure that Downstream doesn't need to be running the same version of Terraform (one that supports output splitting) as Upstream. If this downstream backward compatibility isn't needed, then we could use this opportunity to store outputs in a cleaner JSON format, to allow for easier automation.

apparentlymart · 2018-08-13T16:48:34Z

Hi @tammersaleh! Thanks for this feature request.

The terraform_remote_state data source is offered as a convenient way to easily connect Terraform configurations together using data that is already present. However, that's not the only way to share data between Terraform configurations.

For example, I know of several teams that run a Consul cluster and then use the consul_keys and/or consul_key_prefix resource type and data sources to publish a more "curated" set of data that can be consumed downstream. This requires a little more work but has the nice benefit of also making that information available to consumers that aren't Terraform.
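As a rough illustration of the Consul approach described above, the following sketch publishes one value from an upstream configuration and reads it back downstream. It is written in Terraform 0.12+ expression syntax and assumes a Consul provider is already configured in both configurations. The key path, the aws_vpc.main resource, and the CIDR block are hypothetical placeholders, not something taken from this thread.

```hcl
# --- Upstream configuration: publish a curated value to Consul ---
# Assumption: aws_vpc.main is a resource managed by the upstream configuration.
resource "consul_keys" "published" {
  key {
    path  = "terraform/upstream/vpc_id" # hypothetical key path
    value = aws_vpc.main.id
  }
}

# --- Downstream configuration (a separate root module) ---
# Reads only the published key; it needs no access to the upstream state file.
data "consul_keys" "upstream" {
  key {
    name = "vpc_id"                     # name used to look the value up in "var"
    path = "terraform/upstream/vpc_id"
  }
}

resource "aws_subnet" "app" {
  vpc_id     = data.consul_keys.upstream.var.vpc_id
  cidr_block = "10.0.1.0/24"            # placeholder value
}
```

Access to the published keys can then be controlled with Consul's own ACLs, independently of who may read the upstream state.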

Funnily enough, I proposed a similar thing in #3164 some time ago, before I joined HashiCorp to work on Terraform full time. After that discussion, old me successfully moved from terraform_remote_state to directly using Consul. That reduced the tight coupling between our Terraform configurations, which in turn made later refactoring easier: the same keys could be written to Consul by different Terraform configurations, rather than forcing everything to be grouped by the configuration that generated the information.

With that said, we are interested in supporting more configuration stores like this because we think this pattern is a good one for larger Terraform usage where the problems of sharing state start to outweigh the convenience of that setup. At the time of writing, there are a few different permutations of this pattern possible with current provider support:

As noted above, Consul's key/value store can be used via the Consul provider.
The aws_s3_bucket_object resource type and associated data source allow using S3 for sharing data. S3 is not really optimized for sharing lots of small objects, so this has some friction today but should get better in Terraform 0.12 where there will be a jsondecode function that could then unpack multiple items stored in a single object.
If you have private DNS in your environment, such as with AWS Route53, you can create DNS records for relevant data and then use the DNS provider to read that data. Although this is most ergonomic for IP addresses and hostnames (A, AAAA, and CNAME records), it can also be used for small free-form data in TXT records. (This one has the advantage that all of your systems probably already have working DNS anyway, in which case there's nothing new to set up.)
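To make the S3 option from the list above more concrete, here is a minimal sketch that packs several values into one JSON object and unpacks them downstream with jsondecode, so it requires Terraform 0.12 or later. The bucket name, object key, and aws_vpc.main resource are assumptions made for illustration. Newer versions of the AWS provider offer the same functionality under the name aws_s3_object.

```hcl
# --- Upstream configuration: publish several values as one JSON object ---
# Assumption: aws_vpc.main is managed by the upstream configuration.
resource "aws_s3_bucket_object" "published" {
  bucket       = "shared-config-bucket"   # hypothetical bucket
  key          = "upstream/outputs.json"  # hypothetical object key
  content_type = "application/json"       # needed so the data source exposes "body"

  content = jsonencode({
    vpc_id   = aws_vpc.main.id
    vpc_cidr = aws_vpc.main.cidr_block
  })
}

# --- Downstream configuration (a separate root module) ---
data "aws_s3_bucket_object" "upstream" {
  bucket = "shared-config-bucket"
  key    = "upstream/outputs.json"
}

locals {
  # Unpack the single object into a map of values.
  upstream = jsondecode(data.aws_s3_bucket_object.upstream.body)
}

resource "aws_subnet" "app" {
  vpc_id     = local.upstream.vpc_id
  cidr_block = "10.0.1.0/24"              # placeholder value
}
```

Because the published object lives under its own key, read access to it can be granted through a bucket policy without granting access to the key that holds the state file.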

We may also add additional features for native remote state in future, but at this time we're recommending the above because in practice it seems to have some nice beneficial side-effects. I expect that we'll continue to add resource type and data source pairs to various providers to support other systems that you may already have in your environment for configuration storage.

tammersaleh · 2019-02-14T16:35:34Z

That workaround works, but I think it might be good to shed some light on this issue through documentation. I expect a good number of Terraform users don't realize how much they're sharing when they use remote state.

(unless it's in the docs and I just missed it, in which case my apologies)

apparentlymart · 2019-02-14T17:44:59Z

That's a great point, @tammersaleh. We'll use this issue to represent reviewing the documentation to make sure we're mentioning the above in a suitable place. (We do have it in the docs somewhere, but I'm not sure if it's well connected to the terraform_remote_state data source documentation.)

bbrouwer · 2019-05-23T14:01:34Z

I just had the same thought and found this ticket. What I was thinking was to have a separate backend just for output state. That might be easier to implement as there might not be as much need to change backend provider code.

terraform {
  backend "gcs" {
    bucket  = "some-bucket"
    prefix  = "some-path"
    project = "some-project"
  }
  output "gcs" {
    bucket  = "different-bucket"
    prefix  = "different-path"
    project = "different-project"
  }
}

tesharp · 2019-09-19T23:09:27Z

Had this problem as well and ended up with a solution to copy the output from one state to another. With 3 different backends we wanted to share some outputs from backend 1 to backend 2 and 3.

From backend 1 it runs a script post deploy (https://github.com/avinor/tau/blob/master/hack/az_copy_output_from_state.sh) that does a terraform state pull and then just removes all resources, so it is only left with the output variables. It then copies that filtered state file to backend 2 and 3.

Backend 2 and 3 can then read outputs without having access to backend 1. Outputs are always in sync as well since it is run post deploy.

Tried to describe it a bit here: https://github.com/avinor/tau/blob/master/docs/multiple_backends.md

Wouldn't mind a cleaner solution, though.

duckpuppy · 2020-11-16T20:06:18Z

I think the problem with suggesting Consul as a way to share data between Terraform projects is that you now have to manage a Consul cluster, along with ACL controls for the various outputs, just to provide functionality that is present in Terraform out of the box. That is significantly more effort for a team. Managing DNS TXT records for data sharing just seems like an incredible hack, and counter to what DNS is for.

Terraform state is currently serving two purposes - tracking resource state to facilitate running plans and detecting changes as well as publishing data for remote consumption. These two things can conflict if the sharing is not internal to a given team - I want to allow an external team to retrieve outputs from my Terraform state, but I don't want that team to have access to the potential secrets that ended up in the state file as part of, say, a provider config. I also want to make sure that the sensitive state is encrypted, but the outputs are things I've specifically chosen to expose and should not be encrypted. Ideally the two functions should be separated.

apparentlymart · 2020-11-16T22:38:27Z

Publishing data explicitly to some other location remains the recommended way to do this if you want to keep the data separate from the Terraform state. As I noted above, the use of remote state is just a convenience for making use of the data already published to avoid configuring an extra location, and splitting the publishing to some other location would defeat that advantage and make it not significantly different than publishing in any other place.

I gave Consul, S3, and DNS as examples, but my general point is that there are numerous places you can publish data and read it back later. Several new options have emerged since my comment in 2018, including various configuration stores that come built in to the various cloud platforms so that you don't need to run anything new.

In the intervening years since my earlier comment we've seen a lot more people successfully employ the pattern of explicitly publishing configuration data to a specific location so that both Terraform and other systems can retrieve it, and we documented a pattern for doing so as Data-only Modules in the Module Composition guide, as a way to encapsulate the reading of the data so that consumers don't need to directly configure the details.
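A data-only module in this sense could look roughly like the sketch below: a small module that contains only a data source and outputs, so that consumers call the module instead of configuring the storage details themselves. It uses Consul as the store, but any of the other stores mentioned in this thread would work the same way. The module path, variable, key layout, and downstream resource are all hypothetical.

```hcl
# --- modules/upstream-network/main.tf (hypothetical data-only module) ---
variable "environment" {
  type = string
}

# The module knows where and how the data is stored.
data "consul_keys" "network" {
  key {
    name = "vpc_id"
    path = "terraform/${var.environment}/network/vpc_id" # hypothetical layout
  }
}

output "vpc_id" {
  value = data.consul_keys.network.var.vpc_id
}

# --- Downstream root module (a separate configuration) ---
# The consumer only sees the module's outputs, not the storage details.
module "upstream_network" {
  source      = "./modules/upstream-network"
  environment = "production"
}

resource "aws_subnet" "app" {
  vpc_id     = module.upstream_network.vpc_id
  cidr_block = "10.0.1.0/24" # placeholder value
}
```

If the data later moves to a different store, only the module's internals need to change; callers keep using the same output names.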

Looking back at the conversation history here, it seems like we were using this issue to represent the task of updating the terraform_remote_state data source documentation to be more explicit about the implications of using it and to recommend the other approaches we've discussed here, so in order to finally close this out I'm going to work on a small documentation update to reflect that.

github-actions · 2021-05-15T02:14:40Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

mildwonkey added the enhancement label Aug 6, 2018

apparentlymart added documentation provider/terraform hashibot/ignore labels Feb 14, 2019

jantman mentioned this issue Nov 16, 2020

we can read all TF_state data via terraform_remote_state , not only outputs #25284

Closed

apparentlymart self-assigned this Nov 16, 2020

apparentlymart mentioned this issue Nov 17, 2020

website: Document alternatives to terraform_remote_state #26941

Merged

apparentlymart closed this as completed in #26941 Nov 17, 2020

teamterraform mentioned this issue Nov 17, 2020

Backport of website: Document alternatives to terraform_remote_state into v0.14 #26946

Merged

github-actions bot locked as resolved and limited conversation to collaborators May 15, 2021
