
feat: allow toggling of state refresh #998

Closed · wants to merge 1 commit

Conversation

Apollorion

I'm part of a large organization that manages GitHub with this provider for everything.

Unfortunately, our organization is so large that the GITHUB_TOKEN used to manage our org gets rate-limited by GitHub in just the plan phase of Terraform. By the time we get to the actual apply phase (if we even can), we have to wait over an hour for the token to be usable again. Additionally, plans cannot fail fast because of the refresh period (we were sometimes seeing plans take up to 45 minutes).

This change forces the provider to "trust" the state in Terraform when detect_drift = false. This allows us to make changes quickly and keeps requests to the GitHub API low, thereby avoiding rate limiting. It also lets us turn the state refresh feature back on with detect_drift = true so we can, every so often, run a plan that will actually detect any drift within our configuration.
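
A minimal sketch of how the flag would be used, assuming the detect_drift attribute name proposed in this PR (the attribute only exists on this branch; the owner and token values are placeholders):

```hcl
provider "github" {
  owner = "my-org"           # placeholder organization
  token = var.github_token   # placeholder variable holding the token

  # Proposed in this PR: when false, trust the Terraform state and skip
  # per-resource reads during refresh; when true, refresh every resource
  # against the GitHub API as the provider does today.
  detect_drift = false
}
```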

@jcudit
Contributor

jcudit commented Jan 5, 2022

I'm unsure if this solution to rate limiting will cause correctness issues elsewhere. I do agree that the provider needs to do a better job of efficiently performing resource refreshes, though. This will be a tough one to ship as-is. I recommend getting more discussion and buy-in from the community as a next step if we are to proceed.

Other workarounds observed in the wild include breaking a state file into multiple files using terraform state mv ..., as well as targeted plans / applies that only allowlist certain resources to be evaluated and changed.

@morremeyer
Contributor

Thanks for opening this and thanks @jcudit for sharing your concern.

I am open to this solution, as in my organization we're using this provider to enforce the desired state, no matter what the current one is. However, given the current performance implications, ensuring that we don't hit API limits with read requests seems like a more stable approach to me.

As we're dealing with the same issue right now (rate limiting already in the plan phase) and speed of execution is not that important to us, I'll contribute a solution that adds a configurable read_delay_ms, similar to the write_delay_ms that already exists.
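
As a rough sketch, assuming the new attribute ends up named read_delay_ms to mirror the existing write_delay_ms (the values and owner below are placeholders):

```hcl
provider "github" {
  owner = "my-org"         # placeholder organization

  write_delay_ms = 1000    # existing option: pause between write requests
  read_delay_ms  = 100     # proposed option: pause between read requests
}
```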

I will link the PR here so that you can check it out, @Apollorion.

@Apollorion
Author

> I'm unsure if this solution to rate limiting will cause correctness issues elsewhere.

FWIW, our organization has been using this modified provider for a few months and we haven't noticed any issues.

> I recommend getting more discussion and buy-in from the community as a next step if we are to proceed.

Yeah, I knew this was going to be a tough sell 😁. Is there a place for us to do this discussion? Here? Discord? Slack?

> Other workarounds observed in the wild include breaking a state file into multiple files using terraform state mv ..., as well as targeted plans / applies that only allowlist certain resources to be evaluated and changed.

We had considered doing this.

The problem with breaking up the state file was that it felt technically wrong to us. The issue arises when working strictly within the same few resource types. For example, we have hundreds of repos, so to break the state apart we would need two workspaces managing the same kind of resource. Then questions come up like: does this repo go into workspace 1 or workspace 2? In reality it made far more sense to keep all the resources together in the same workspace.

The problem with targeting is that we are using Terraform Cloud, so we can't target our resources in an easily automatable way with our current Terraform setup.

@morremeyer
Contributor

@Apollorion I'm working on #1054 in the meantime which I think is going to be an easier sell. I'm testing it against our infra right now and will mark the PR ready for review once I'm happy.

@Apollorion
Author

> @Apollorion I'm working on #1054 in the meantime which I think is going to be an easier sell. I'm testing it against our infra right now and will mark the PR ready for review once I'm happy.

This isn't a great solution for us (though I have no doubt it is useful to others), only because we want to fail fast. With our current IaC, even with zero delay, the plan phase takes nearly an hour. I can't imagine what our plan time would look like if we added a delay to every read.

This proposed solution fits us better because it essentially skips the state refresh, which allows the plan phase to complete in seconds. Then we can simply fail during the few write operations if needed.

@morremeyer
Contributor

Good to know, thanks. I'm definitely looking forward to more discussion around this.

I would definitely love to have it as a configurable option, as I want to enforce the state in the module no matter what the current state is.

@Apollorion
Author

> Good to know, thanks. I'm definitely looking forward to more discussion around this.

> I would definitely love to have it as a configurable option, as I want to enforce the state in the module no matter what the current state is.

Honestly, I think this PR is very complementary to yours. We would like to run nightly automated state refreshes with this change. That way, during the day, our developers can fail fast, with the added benefit that, at night, we can verify any drift is caught.

In some automated fashion, we will flip the detect_drift flag this PR provides back to true and use your timeout flag to run a long-lived state refresh / apply.
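
A hypothetical sketch of that workflow, assuming the detect_drift attribute from this PR and a wrapper variable (both names are illustrative and not part of any released provider):

```hcl
variable "detect_drift" {
  type    = bool
  default = false    # fast, state-trusting plans during the day
}

provider "github" {
  owner        = "my-org"          # placeholder organization
  detect_drift = var.detect_drift  # flipped to true for the nightly run
}
```

The nightly job would then run with TF_VAR_detect_drift=true (or -var="detect_drift=true") to perform a full drift-detecting plan and apply.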

@jcudit
Contributor

jcudit commented Feb 11, 2022

Thanks, everyone. This is the kind of feedback I was hoping for when hesitating earlier.

Seems safe to ship with the right documentation. 👍🏾

@Apollorion
Author

@jcudit I just added documentation for the flag to website/docs/index.html.markdown. Let me know if you need anything else to get this approved / merged.

@Apollorion
Author

Do you need anything else from me, @jcudit?

@Apollorion
Author

Sorry to keep bugging you @jcudit, I know you're probably a busy person. Is there anything we can do to move this forward?

@anthonyangel

FWIW, independent of the provider, you can set Terraform to skip the refresh on plan / apply via https://www.terraform.io/cloud-docs/run/modes-and-options#skipping-automatic-state-refresh. This can also be set in TFC by setting the env var TF_CLI_ARGS_plan="-refresh=false".

@Apollorion
Author

> FWIW, independent of the provider, you can set Terraform to skip the refresh on plan / apply via https://www.terraform.io/cloud-docs/run/modes-and-options#skipping-automatic-state-refresh. This can also be set in TFC by setting the env var TF_CLI_ARGS_plan="-refresh=false".

Oh, wow, I did not know this. Thank you!

@Apollorion
Author

Going to close this PR, since we can just use the TF_CLI_ARGS env variable to do basically the same thing.

@Apollorion closed this May 5, 2022