Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Table-driven diff tests #1282

Closed
1 of 8 tasks
t0yv0 opened this issue Jul 7, 2023 · 6 comments
Closed
1 of 8 tasks

Table-driven diff tests #1282

t0yv0 opened this issue Jul 7, 2023 · 6 comments
Assignees
Labels
kind/engineering Work that is not visible to an external user resolution/fixed This issue was fixed
Milestone

Comments

@t0yv0
Copy link
Member

t0yv0 commented Jul 7, 2023

Motivated by a growing number of issues labelled bug/diff in both this repository and related bridged provider repositories, this is a sketch of a table-driven approach to saturate test coverage of how bridged providers perform resource planning and diffs.

Sample issues:

Start by observing that in absence of special flags such as ignoreChanges, the correct diff behavior for a bridged provider should match the behavior of the upstream TF provider under TF CLI. Specifically, they need to agree on:

  • whether the plan is to update, replace or leave a resource unchanged
  • the new suggested values of the resource, possibly unknown
  • which values are passed into hooks such as CustomizeDiff

To explore and tabulate the behavior of TF CLI, create a purpose-built TF test plugin P and exercise it on sample values. This will result in a dataset characterizing the expected behavior. Then, turn this dataset into a test suite that asserts that P behaves in the same way when bridged under Pulumi CLI, modulo translation.

An important consideration is what input combinations should the TF behavior be sampled on. Since TF CLI modifies diffs in ProposedNew as well as does some null/unknown post-processing in the providers itself, it is important to cover these dimensions:

  • every possible combination of schema, up to depth 3, including maps, sets and objects
  • within the schema, every possible combination of Attribute, Block, and nesting modes (SingleNested, ListNested, etc)
  • every possible combination of attribute kind, Computed, Optional, Computed+Optional, Required
  • sets, including several distinct custom set hashes
  • objects need to have up to 2 attributes so Set[Obj] combination cover variations of attribute kind
  • all combinations of (priorState?, config), covering both creates and updates
  • when generating values, ensure to cover null, unknown, and empty values and 2 regular values, say for strings [null, unknown, "", "a", "b" ]
  • every resource should have enough (priorState?, config) tuples to cover update, replace, unchanged

A dataset of this sort may generate millions of test cases, but can be exercised quickly in-memory without gRPC networking overhead.

The current expectation is that the first iteration of this work will find/confirm discrepancies between expected and actual behavior and inform further work on rule-based test case exclusions and/or product fixes; but when this test suite is complete and passing, it will give the team a lot of confidence to make changes to the diff behavior or bridged providers and refactor toward a maintainable simple implementation.

Tasks

  • 3d Built a test TF provider and associated bridged provider that exercises every feature above
  • 1d Tabulate and assert current bridged Pulumi diff behavior on this provider, use it to assert behavior remains as-is, as this is really helpful to make it easier to make changes to the bridge while getting quick feedback on when diff behavior changes, and accepting these changes when the change is intentional (an improvement)
  • 3d Tabulate TF Diff behavior of the provider - this is the desired behavior, analyze discrepancies, compute %
  • 3d Reconcile Set handling discrepancies fixing Regress AWS-2442 #1255
  • 5d Placeholder for reconciling other discrepancies
  • 1d Revisit PULUMI_DIFF_STRATEGY=PlanState flag currently rolled out to AWS, evaluate perf on the test suite
  • 1d Revisit shimv2.CtyInstanceState flag currently fixing aws_wafv2_web_acl issues in AWS, evaluate perf on the test suite
  • 3d Test bug/diff issues in priority order against improved diff logic to see which ones can be closed
@t0yv0
Copy link
Member Author

t0yv0 commented Jan 22, 2024

Since this is a bit out of date let me replace it with tickets that are scoped tightly and doable/needed

@t0yv0 t0yv0 self-assigned this Feb 21, 2024
@t0yv0 t0yv0 added this to the 0.101 milestone Feb 21, 2024
@t0yv0
Copy link
Member Author

t0yv0 commented Mar 11, 2024

Quick update on 2024-03-11. Making some progress to have a harness setup.

WIP PR: #1728

It can take an SDKv2 schema and resource inputs R1 and R2 for that schema and run it through TF and Pulumi to
specifically study the transition from the state that corresponds to R1 to the state that corresponds to R2.

Running tests against an in-memory copy of the bridged provider required some changes that landed in Pulumi per
pulumi/pulumi#15526 but now these changes need to be propagated in pulumi-yaml, released, and
folded back to pulumi. The propagation is done in pulumi/pulumi-yaml#553 now waiting for a round
of releases from the Platform team.

Another design problem is how should cross-tests specify resource inputs to feed into both Pulumi and TF. Current choice
is to use the TF json representation to specify the inputs and translate it to Pulumi. Started looking at extracting
convert module from PF to help with this process #1746 although
other options are possible as well (all with some trade-offs).

For some practical impact, started looking at set handling working backward from
pulumi/pulumi-aws#2442 - the framework helps debug the TF side to establish where canonical
ordering happens, but more work is needed. I have started on some notes
#1748 but there are next steps to confirm experimentally.

Remaining work:

  • good asserts for cross-tests
  • tabulation over schema space

@mjeffryes
Copy link
Member

Is this done now that #1728 has shipped? I assume we probably have more tests to add, but does that belong on this ticket or in a follow up?

@t0yv0
Copy link
Member Author

t0yv0 commented Mar 26, 2024

Skimming over this, the critical bit that is not done is that we don't have full confidence that we match TF behavior over diffs for all combinations - this is the kind of confidence we're really looking for from this effort.

The two most important missing pieces are:

#1790 Rapid generator of SDKv2 schema/value pairs
#1791 Cross-test matching Diff behavior (over rapid-generated schema space)

That is right now we can do point tests but we don't saturate testing the combinations which means there's still an open window for divergent behavior.

This is also still relevant and needs this test effort to reduce risk I think:

#1785

Reading the rest of the ticket, it's not relevant. Building a separate provider binary is not needed as we can now build on-the-fly; shimv2.CtyInstanceState is removed in favor of PlanResourceChange, tabulation is irrelevant if we can explore the same space with rapid.

@t0yv0
Copy link
Member Author

t0yv0 commented Mar 26, 2024

does that belong on this ticket or in a follow up

I'll defer to you and @iwahbe how to best track.

@iwahbe iwahbe added the resolution/fixed This issue was fixed label Mar 28, 2024
@iwahbe
Copy link
Member

iwahbe commented Mar 28, 2024

Table driven diff tests are done, so we can close. I'd like to prioritize #1790 and #1791, since I think those are what justify this work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/engineering Work that is not visible to an external user resolution/fixed This issue was fixed
Projects
None yet
Development

No branches or pull requests

3 participants