
Store outputs separately from rest of tfstate #18603

Closed
tammersaleh opened this issue Aug 4, 2018 · 8 comments · Fixed by #26941

Comments

tammersaleh commented Aug 4, 2018

Problem

To allow a downstream project to read the state of an upstream one, Upstream must expose its entire terraform.tfstate file (via S3, TFE, etc.). This file contains not only the outputs that Downstream relies on, but also sensitive information and secrets. It's not possible to share outputs via remote state without also exposing this unnecessary and sensitive information.

Solution

Outputs should be stored separately from the tfstate file. This could be opt-in, to preserve backward compatibility for downstream consumers.

I'm imagining a configuration where Upstream says:

terraform {
  backend "s3" {
    bucket = "bucket"
    key = "upstream.tfstate"
    outputs_key = "upstream-outputs.tfstate"
  }
}

Then in Downstream:

data "terraform_remote_state" "upstream" {
  backend = "s3"
  config {
    bucket = "bucket"
    key = "upstream-outputs.tfstate"
  }
}

The administrator for Upstream can then expose the upstream-outputs.tfstate file, while keeping the upstream.tfstate file private.

Note that outputs_key is entirely optional, and the outputs file uses the same tfstate format. This ensures that Downstream doesn't need to be running the same version of Terraform (one that supports output splitting) as Upstream. If this downstream backward compatibility isn't needed, then we could use this opportunity to store outputs in a cleaner JSON format, to allow for easier automation.

@apparentlymart (Contributor)

Hi @tammersaleh! Thanks for this feature request.

The terraform_remote_state data source is offered as a convenient way to easily connect Terraform configurations together using data that is already present. However, that's not the only way to share data between Terraform configurations.

For example, I know of several teams that run a Consul cluster and then use the consul_keys and/or consul_key_prefix resource type and data sources to publish a more "curated" set of data that can be consumed downstream. This requires a little more work but has the nice benefit of also making that information available to consumers that aren't Terraform.
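As a concrete sketch of that Consul pattern (the paths and values here are hypothetical, not from the thread), Upstream could publish only its curated values with the consul_keys resource, and Downstream could read them back with the matching data source:

```hcl
# Upstream: publish only the values that are meant to be shared.
resource "consul_keys" "published" {
  key {
    path  = "terraform/upstream/vpc_id"   # hypothetical path
    value = aws_vpc.main.id
  }
}

# Downstream: read the curated values without any access to Upstream's state.
data "consul_keys" "upstream" {
  key {
    name = "vpc_id"
    path = "terraform/upstream/vpc_id"
  }
}

# Referenced elsewhere as: data.consul_keys.upstream.var.vpc_id
```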

Funnily enough, I proposed a similar thing in #3164 some time ago, before I joined HashiCorp to work on Terraform full time. After that discussion, past me successfully moved from terraform_remote_state to using Consul directly. It helped us reduce the tight coupling between our Terraform configurations, which allowed us to refactor more easily later on, because the same keys could be written to Consul by different configurations rather than forcing everything to be grouped by the configuration that generated the information.

With that said, we are interested in supporting more configuration stores like this because we think this pattern is a good one for larger Terraform usage where the problems of sharing state start to outweigh the convenience of that setup. At the time of writing, there are a few different permutations of this pattern possible with current provider support:

  • As noted above, Consul's key/value store can be used via the Consul provider.
  • The aws_s3_bucket_object resource type and associated data source allow using S3 for sharing data. S3 is not really optimized for sharing lots of small objects, so this has some friction today but should get better in Terraform 0.12 where there will be a jsondecode function that could then unpack multiple items stored in a single object.
  • If you have private DNS in your environment, such as with AWS Route53, you can create DNS records for relevant data and then use the DNS provider to read that data. Although this is most ergonomic for IP addresses and hostnames (A, AAAA, and CNAME records), it can also be used for small free-form data in TXT records. (This one has the advantage that all of your systems probably already have working DNS anyway, in which case there's nothing new to set up.)
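The S3 option above can be sketched roughly as follows (the bucket and key names are hypothetical, and the jsondecode call assumes Terraform 0.12+):

```hcl
# Upstream: publish a curated set of values as a single JSON object.
resource "aws_s3_bucket_object" "outputs" {
  bucket       = "shared-config-bucket"   # hypothetical bucket
  key          = "upstream/outputs.json"
  content_type = "application/json"
  content = jsonencode({
    vpc_id     = aws_vpc.main.id
    subnet_ids = aws_subnet.private[*].id
  })
}

# Downstream: read the object back and unpack it.
data "aws_s3_bucket_object" "upstream" {
  bucket = "shared-config-bucket"
  key    = "upstream/outputs.json"
}

locals {
  upstream = jsondecode(data.aws_s3_bucket_object.upstream.body)
  # e.g. local.upstream.vpc_id
}
```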

We may also add additional features for native remote state in future, but at this time we're recommending the above because in practice it seems to have some nice beneficial side-effects. I expect that we'll continue to add resource type and data source pairs to various providers to support other systems that you may already have in your environment for configuration storage.

tammersaleh commented Feb 14, 2019

That workaround works, but I think it might be good to shed some light on this issue through documentation. I expect a good number of Terraform users don't realize how much they're sharing when they use remote state.

(unless it's in the docs and I just missed it, in which case my apologies)

@apparentlymart (Contributor)

That's a great point, @tammersaleh. We'll use this issue to represent reviewing the documentation to make sure we're mentioning the above in a suitable place. (We do have it in the docs somewhere, but I'm not sure if it's well connected to the terraform_remote_state data source documentation.)

@bbrouwer

I just had the same thought and found this ticket. What I was thinking was to have a separate backend just for output state. That might be easier to implement as there might not be as much need to change backend provider code.

terraform {
  backend "gcs" {
    bucket  = "some-bucket"
    prefix  = "some-path"
    project = "some-project"
  }
  output "gcs" {
    bucket  = "different-bucket"
    prefix  = "different-path"
    project = "different-project"
  }
}

tesharp commented Sep 19, 2019

Had this problem as well and ended up with a solution to copy the output from one state to another. With 3 different backends we wanted to share some outputs from backend 1 to backend 2 and 3.

From backend 1 it runs a script post deploy (https://github.com/avinor/tau/blob/master/hack/az_copy_output_from_state.sh) that does a terraform state pull and then just removes all resources, so it is only left with the output variables. It then copies that filtered state file to backend 2 and 3.

Backend 2 and 3 can then read outputs without having access to backend 1. Outputs are always in sync as well since it is run post deploy.

Tried to describe it a bit here: https://github.com/avinor/tau/blob/master/docs/multiple_backends.md
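The core of that filtering step can be sketched in a few lines (a hypothetical simplification, not the actual contents of the linked script): after terraform state pull, empty the resources list so that only the outputs survive, then push the result to the shared backend.

```python
import json

def strip_resources(raw_state: str) -> str:
    """Keep only the outputs of a pulled tfstate by dropping every resource."""
    state = json.loads(raw_state)
    state["resources"] = []  # outputs, version, serial, etc. are preserved
    return json.dumps(state, indent=2)

# Example with a minimal v4-style state document:
pulled = json.dumps({
    "version": 4,
    "serial": 7,
    "outputs": {"vpc_id": {"value": "vpc-123", "type": "string"}},
    "resources": [{"mode": "managed", "type": "aws_vpc", "name": "main"}],
})
filtered = json.loads(strip_resources(pulled))
print(filtered["resources"])      # []
print(list(filtered["outputs"]))  # ['vpc_id']
```

The filtered file could then be written to the shared backends with terraform state push, as the linked script does.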

Wouldn't mind a cleaner solution, though.

@duckpuppy (Contributor)

I think the problem with suggesting Consul as a way to share data between Terraform projects is that you now have to manage a Consul cluster, along with ACL controls for the various outputs, to provide functionality that is present in Terraform out of the box, which is significantly more effort on the part of a team. Managing DNS TXT records for data sharing just seems like an incredible hack, and counter to what DNS is for.

Terraform state currently serves two purposes: tracking resource state to facilitate running plans and detecting changes, and publishing data for remote consumption. These two things can conflict if the sharing is not internal to a given team: I want to allow an external team to retrieve outputs from my Terraform state, but I don't want that team to have access to the potential secrets that ended up in the state file as part of, say, a provider config. I also want to make sure that the sensitive state is encrypted, while the outputs are things I've specifically chosen to expose and should not need to be. Ideally the two functions should be separated.

@apparentlymart (Contributor)

Publishing data explicitly to some other location remains the recommended way to do this if you want to keep the data separate from the Terraform state. As I noted above, the use of remote state is just a convenience for making use of data that is already published, avoiding the need to configure an extra location; splitting the outputs into some other location would defeat that advantage and make the approach not significantly different from publishing anywhere else.

I gave Consul, S3, and DNS as examples, but my general point is that there are numerous places you can publish data and read it back later. Several new options have emerged since my comment in 2018, including various configuration stores that come built in to the various cloud platforms so that you don't need to run anything new.

In the intervening years since my earlier comment we've seen a lot more people successfully employ the pattern of explicitly publishing configuration data to a specific location so that both Terraform and other systems can retrieve it, and we documented a pattern for doing so as Data-only Modules in the Module Composition guide, as a way to encapsulate the reading of the data so that consumers don't need to directly configure the details.
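A data-only module along those lines might look like the sketch below. The module layout and parameter names are hypothetical, using AWS SSM Parameter Store as one of the built-in cloud configuration stores mentioned above; the point is that consumers call the module without knowing where the data lives.

```hcl
# modules/upstream-config/main.tf - a "data-only" module: no resources,
# only data sources and outputs (parameter names are hypothetical).
data "aws_ssm_parameter" "vpc_id" {
  name = "/shared/upstream/vpc_id"
}

output "vpc_id" {
  value = data.aws_ssm_parameter.vpc_id.value
}

# A consumer then writes only:
#
#   module "upstream" {
#     source = "../modules/upstream-config"
#   }
#
# and references module.upstream.vpc_id, leaving the storage details
# encapsulated inside the module.
```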

Looking back at the conversation history here, it seems like we were using this issue to represent the task of updating the terraform_remote_state data source documentation to be more explicit about the implications of using it and to recommend the other approaches we've discussed here, so in order to finally close this out I'm going to work on a small documentation update to reflect that.

@github-actions (bot)

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 15, 2021