Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure proper order for obtaining credentials, assuming roles, using profiles #5

Merged
merged 2 commits into from
Oct 3, 2019

Conversation

YakDriver
Copy link
Member

@YakDriver YakDriver commented May 3, 2019

Fixes #4

Cleans up credential obtaining logic.

NOTE: I contributed the credential process provider to the underlying AWS SDK Go.

Proposal:

  • Ensure creds obtained (e.g., session-derived) before attempting to assume role
  • Update GetCredentials() contract so that creds are retrieved and validated - simplifying logic in GetSession() and GetSessionOptions().

Currently, the bug prevents fall-back creds (e.g., session-derived) from being used when assuming a role since session-derived creds are not obtained until after the assume role logic. As a result, these circumstances will always result in failure to use credentials even when they are valid.

This PR fixes the bug.

In this PR, GetCredentials() has a new contract - it provides and validates credentials. Before, GetCredentials() would sometimes return unvalidated creds and sometimes validated creds. This meant that more error handling logic needed to be included in GetSession() and GetSessionOptions(). As part of validating creds, GetCredentials() now gets session-derived creds, if necessary, prior to assuming a role.

Related/possibly also fixed:

@YakDriver
Copy link
Member Author

YakDriver commented May 3, 2019

In contrast to execution graphs in issue #4, this is an updated graph.

With this PR Session-derived creds with or without profile are obtained before assume role logic:

GetSession()
    |
    |---> GetSessionOptions()
    |       |
    |       |---> GetCredentials()
    |       |       |
    |       |       |---> GetCredentialsFromSession() (call at awsauth.go:294)
    |       |       |
    |       |       |---> Assume role (starting awsauth.go:306)

@lorengordon
Copy link

This would be very handy, allowing us to use credential_process in the profile to retrieve a credential from a federated identity provider, and the assume_role block in the terraform provider config to switch role. Interpolation in the assume_role block reduces the number of profiles to manage on the system.

@max-rocket-internet
Copy link

@bflad can we get this reviewed and merged? It's a major problem for us also (hashicorp/terraform-provider-aws#8052)

@bsAdvanon
Copy link

I'm blocked by the same issues at the moment. Any ideas about when this could get merged?

@bsuv
Copy link

bsuv commented May 20, 2019

@radeksimko @grubernaut @bflad ^^^

@kinghuang
Copy link

kinghuang commented Jun 12, 2019

Is there any word on when this will be reviewed and merged? There hasn't been any commit activity at all on this repo for months. I'm also impacted by hashicorp/terraform-provider-aws#8052.

@YakDriver
Copy link
Member Author

YakDriver commented Jun 12, 2019

¯\(ツ)
I wish I knew.

@bflad
Copy link
Contributor

bflad commented Jun 13, 2019

Hi folks and thanks so much @YakDriver for taking this on. 👋 Sorry for the delays in getting this pull request reviewed. This one is on me for not leaving a comment here sooner. 🤕

One of the major challenges that we have here is that this code has been inherited without much documentation and testing from when it progressed from Terraform Core (when providers lived in there) to the Terraform AWS Provider to this library (see also #2). This means that while the changes here might potentially work as expected, we have no basis for potential regressions either other than waiting for bug reports. 😖 Since my tenure in helping maintain the Terraform AWS Provider, changes to this logic were always problematic for someone's configuration. It would certainly be preferred to get this codebase out of its current rut though and for sure help to squash this bug as it is certainly creating a frustrating experience as noted here and in the provider issues.

Real briefly, the various ways to getting this out there given the current state of this library would be:

  • After an in depth review of the physical code changes for an estimate of potential regressions in current behavior, accepting this, and forcing a major Terraform AWS Provider release so it can be used easily out in the wild under the guise that it may have introduced breaking behavior. The maintainers of the Terraform AWS Providers have not fully decided what version 3.0.0 will or should contain, so in full disclosure, those discussions would likely bottleneck this process.
  • Using the issue this is closing (Error getting creds when assuming role and using fallback credentials #4) to capture user experience reports with custom build Terraform AWS Providers that use this update. To that end, providing a how-to guide to accomplish this might help everyone. I mention using the issue for this purpose because to a certain degree we do not want the pull request comments to get into non-code discussions if there are a lot of incoming reports.
  • While I haven't personally looked too much into this due to priorities being elsewhere, I have a feeling we might be able to piggyback the testing implementation in Implement End-To-End Testing for Session Handling #2 based on how the AWS Go SDK performs its testing. Other than determining the implementation of these tests, the code itself might not be well suited for the decided testing framework, which could present its own problems. This discussion should probably occur in Implement End-To-End Testing for Session Handling #2 rather than in here just to keep this particular code change's comments relevant.

We don't necessarily need to pick and choose one of these options. In fact, agreeing/working on all fronts would likely ensure the best outcome for everyone.

Apologies again for the frustrating experience. If we could get some help/opinions on these items, hopefully we can push this forward in a constructive manner.

@lorengordon
Copy link

lorengordon commented Jun 13, 2019

On the plus side, there are no (known) backwards breaking changes here, no syntax changes required in the user config. Doesn't really seem to warrant waiting on a major release. A bug is a bug, it gets reported then patched and released. An impacted user can easily pin their version in the meantime.

Totally agree tests around the credential handling would be helpful. Do you have any idea how you might want to see those tests setup?

@YakDriver
Copy link
Member Author

I know it's anecdotal and practically useless but I did integration tests with every possible scenario I could think of, backwards and forwards, and could not find anything breaking with these changes.

As for @bflad's points:

  1. I agree with @lorengordon that there are no known breaking changes with this PR. However, any users affected by potential breaking changes should be weighed against the many users known to be experiencing issues.
  2. I can provide guidance on building binaries using the patch. This would probably be doable in the shortish-term.
  3. My creating regression tests is probably not going to happen in the short-term. We'd all welcome help if someone is available. I agree that these would be very comforting to have.

@ianwsperber
Copy link

I'd like to chime in my support for the second point as an interim solution until a fix is released. If there were a simple-ish mechanism to run the provider with a custom binary I'd be happy to provide feedback after the fact on whether we've encountered any issues. I'm sure it'll be valuable to have in place long term, and it seems like a good baseline for getting feedback on the fix in lieu of regression tests.

@YakDriver
Copy link
Member Author

If you would like to build Terraform and the AWS provider with this patch, I've created instructions on how to do it. It's a bit beyond the scope of a GitHub comment so please check out the article on Medium. Thanks @ianwsperber for being willing to try it out. Let me know if you have trouble with the instructions.

Post user experience reports on #4!

Leave comments specific to this PR's code changes here.

@ianwsperber
Copy link

@YakDriver My apologies, I have been and continue to be too busy to validate this myself. I just posted in a related issue to see if anyone else might be willing to validate your solution: hashicorp/terraform-provider-aws#8052

@timoguin
Copy link

timoguin commented Aug 5, 2019

I gave it a good try, but I've not been able to get everything to build properly. :(

@YakDriver
Copy link
Member Author

@timoguin If you have specific problems, let me know in the article comments.

@mohrt
Copy link

mohrt commented Aug 20, 2019

I went around and around with a related problem today, hopefully this might help someone. I have my terraform state saved in an s3 bucket, and I was getting permission denied errors. Here are my config/credentials files:

[default]
aws_access_key_id = REMOVED
aws_secret_access_key = REMOVED

[prod]
aws_access_key_id = REMOVED
aws_secret_access_key = REMOVED

[dev]

(the empty [dev] profile uses the [default] keys)

[profile default]
region = us-west-2
output = json
role_arn = arn:aws:iam::REMOVED:role/CrossAccount
source_profile = prod

[profile prod]
region=us-east-1
output = json

[profile dev]
region = us-west-2
output = json
role_arn = arn:aws:iam::REMOVED:role/CrossAccount
source_profile = prod

My terraform setup uses the dev profile. The above works, however when I would copy the key/secret from the default profile and paste it into the (empty) dev profile, it would no longer assume the role and begin failing with "access denied" errors. Apparently this causes an internal ordering of events to incorrectly pick up the prod user instead of the dev CrossAccount role. Also if I remove the key/secret from [prod], it would fail in the same way.

Terraform version: 0.11.14
Go runtime version: go1.12.4

@mmack-innio
Copy link

Seeing the same issues, please merge 👍

@aeschright
Copy link
Contributor

aeschright commented Oct 2, 2019

Hi @YakDriver! I finally got enough unit test coverage together to go ahead with this PR and merged that to master. I checked the tests against your code and I think there's only one small change because the error type for missing credentials is different. Do you have time this week to rebase and make an update?

@YakDriver
Copy link
Member Author

@aeschright Absolutely. Where do you want an update?

Previously, a bug existed that prevented session-derived creds
from being used when assuming a role. This is because session-
derived creds would not be gathered until the very last moment.
Since all the assumerole logic was passed before this last moment,
assumerole could not work with session-derived creds.

Now, GetCredentials has a new contract - it provides and validates
credentials. Before, GetCredentials would sometimes return
unvalidated creds and sometimes validated creds. This meant that
more error handling logic needed to be included in GetSession and
GetSessionOptions. As part of validating creds, GetCredentials now
gets session-derived creds, if necessary, prior to assuming a role.
@aeschright
Copy link
Contributor

TestAWSGetCredentials_shouldErrorWhenBlank and TestAWSGetCredentials_shouldErrorWithInvalidEndpoint will need to check for the correct error.

@YakDriver
Copy link
Member Author

This is a work in progress. I'm headed out but this is the first shot. I will check it out tomorrow!

@aeschright aeschright self-assigned this Oct 2, 2019
@YakDriver YakDriver force-pushed the tracer branch 2 times, most recently from 9eaaa3d to 1b0db1c Compare October 3, 2019 12:08
@YakDriver
Copy link
Member Author

I had to change those two tests a little but I think they still capture what you're going for. Let me know if you want any other changes.

@aeschright
Copy link
Contributor

Alright, let's do it! I'm going to merge this and test against the provider before I do a release.

@aeschright aeschright merged commit f3803c6 into hashicorp:master Oct 3, 2019
@YakDriver
Copy link
Member Author

Outstanding! 🤞

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error getting creds when assuming role and using fallback credentials