Allow the `Type` override to effect downstream types #1550

iwahbe · 2023-12-01T00:25:35Z

This PR changes how tfbridge.SchemaInfo.Type is interpreted when applied to object properties.

Previously, Type changed the schema.TypeSpec.{Type,Ref} fields on the generated resource. It did not affect any nested types.

With this PR, Type is interpreted as a command to rename the relevant object and its entire nested structure. This allows multiple fields or objects to share a type definition when specified, and will allow us to reduce the size of our SDKs.

This is technically a breaking change. Bridge users who have specified Type: someNewType on object types will see different behavior. In practice, setting Type on an object would generate an invalid schema without a careful fix-up. I suspect that no-one does this.

I have tested this change on pulumi-ns1, pulumi-aws, pulumi-gcp and pulumi-azure. None of these providers are affected by this change.

Implementers Notes

paths.TypePath is used to determine the module of derived types. Since we want subtypes of a moved type to themselves move to the module of their parent, we need to track that in the paths used. To allow inserting a type at an arbitrary module, it was necessary to create a new paths.TypePath; paths.RawTypePath projects a tokens.Type into paths.TypePath so it can be used for module discovery.

codecov · 2023-12-01T00:30:52Z

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (6385710) 58.19% compared to head (a0f1639) 58.27%.

Files	Patch %	Lines
pkg/tfgen/generate_schema.go	86.88%	6 Missing and 2 partials ⚠️
pkg/tfgen/renames.go	84.61%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1550      +/-   ##
==========================================
+ Coverage   58.19%   58.27%   +0.07%     
==========================================
  Files         288      288              
  Lines       40040    40096      +56     
==========================================
+ Hits        23301    23365      +64     
+ Misses      15403    15380      -23     
- Partials     1336     1351      +15

t0yv0 · 2023-12-04T17:23:39Z

I like the spirit of this change very much. To paraphrase what I am seeing, it looks like we allow the user to trigger deliberate collisions when allocating Pulumi type tokens to nested object paths. These collisions can be used to improve sharing: instead of allocating 2 Pulumi tokens (and TypeScript classes and so forth), the user can now have one.

The chief question I have is this: does this work for the pseudo-recursion scenario, such as the wafv2 types? The TL;DR is that the TF model does not support observable sharing recursion, so providers encode recursive types "up to depth N" with, say, N=5, which causes type explosion and bloat at many levels. This PR relies on a pre-written .equals() function to decide structural equality of "deliberately-colliding types". Will it work for the wafv2 scenario? Naively I'd think that genWafType(4) != genWafType(5) deliberately, and that this scenario will hit the case where the bridge thinks the user is making a mistake by assigning an identical token to these supplementary types.
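To make the concern concrete, here is a minimal sketch of why depth-bounded encodings do not compare equal structurally. The objType struct and the equals function are hypothetical stand-ins and not the bridge's own types; genWafType borrows its name from the comment above and is equally illustrative.

```go
package main

import "fmt"

// objType is a hypothetical stand-in for a generated object type:
// a map from property name to a nested object (nil marks a leaf).
type objType struct {
	props map[string]*objType
}

// genWafType builds a statement type that nests itself depth times,
// imitating the "recursion up to depth N" encoding.
func genWafType(depth int) *objType {
	t := &objType{props: map[string]*objType{"name": nil}}
	if depth > 0 {
		t.props["statement"] = genWafType(depth - 1)
	}
	return t
}

// equals reports whether two types are structurally identical.
func equals(a, b *objType) bool {
	if a == nil || b == nil {
		return a == b
	}
	if len(a.props) != len(b.props) {
		return false
	}
	for name, sub := range a.props {
		other, ok := b.props[name]
		if !ok || !equals(sub, other) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(equals(genWafType(4), genWafType(4))) // true
	fmt.Println(equals(genWafType(4), genWafType(5))) // false
}
```

Under these assumptions the two encodings differ only at the innermost level, which is enough for a strict structural comparison to reject sharing one token between them.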

I also have a few nits at the implementation level that make it harder for me to follow the change, which I'd love to talk through, as perhaps they point to a gap in my understanding of how this codebase works. Let me add some comments.

t0yv0 · 2023-12-04T17:32:55Z

pkg/tfgen/generate_schema.go

-	token := fmt.Sprintf("%s/%s:%s", mod.String(), name, name)
-
-	g.renamesBuilder.registerNamedObjectType(typInfo.typePaths, tokens.Type(token))
+	token := tokens.Type(fmt.Sprintf("%s/%s:%s", mod, name, name))


IMPL nit: naively this function is the key function that I'd expected to change in this PR. It freely decides where to put a TF type (in conjunction with the helper modulePlacementFor*). The placement is based on module and name. I'd expect we are not changing the type name. In that case we are down to:

modulePlacementForType

The way this worked before the change is by looking at prefixes of paths.TypePath. So when allocating for TypePath = "res_r1.prop_1.element.foo_bar" it would recur upward through "res_r1.prop_1.element", "res_r1.prop_1", "res_r1" until it realized it's in the context of allocating a type for resource "res_r1" and pick the appropriate module based on that.

Presumably this traversal could stop early at "res_r1.prop_1" if prop_1 was given an explicit token Type by the user, and decide to allocate the module according to the user's Type token? I guess what I am saying is why does this necessitate a new clause in TypePath when this information could be looked up from map[string]*SchemaInfo and existing TypePath as written?
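A rough sketch of the upward traversal described here, including the early stop at a property that carries an explicit token. The dotted-string path encoding and the roots map are hypothetical simplifications, not the actual paths.TypePath API.

```go
package main

import (
	"fmt"
	"strings"
)

// moduleFor walks a dotted path upward until it reaches a known root,
// then returns the module recorded for that root. The roots map and the
// dotted-string path encoding are hypothetical simplifications.
func moduleFor(roots map[string]string, path string) (string, bool) {
	for {
		if mod, ok := roots[path]; ok {
			return mod, true
		}
		i := strings.LastIndex(path, ".")
		if i < 0 {
			return "", false
		}
		path = path[:i]
	}
}

func main() {
	// Only the resource is a root: the walk ends at "res_r1".
	roots := map[string]string{"res_r1": "index"}
	fmt.Println(moduleFor(roots, "res_r1.prop_1.element.foo_bar")) // index true

	// An explicit override on prop_1 stops the walk early.
	roots["res_r1.prop_1"] = "custom"
	fmt.Println(moduleFor(roots, "res_r1.prop_1.element.foo_bar")) // custom true
}
```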

iwahbe · 2023-12-07

> naively this function is the key function that I'd expected to change in this PR. It freely decides where to put a TF type (in conjunction with the helper modulePlacementFor*). The placement is based on module and name. I'd expect we are not changing the type name.

We are changing both the module and the name. Consider the following scenario:

resource res_r1:
  prop:
    sub:
      str: string

With SchemaInfo{Type: "res:custom:Obj"} applied to prop.

Direct name change:
The schema object generated for prop needs to have its name changed from R1Prop to Obj and its module changed from index to custom.

Nested name change:
The schema object generated for sub needs to have its name changed from R1PropSub to ObjSub and its module changed from index to custom.

> The way this worked before the change is by looking at prefixes of paths.TypePath. So when allocating for TypePath = "res_r1.prop_1.element.foo_bar" it would recur upward through "res_r1.prop_1.element", "res_r1.prop_1", "res_r1" until it realized it's in the context of allocating a type for resource "res_r1" and pick the appropriate module based on that.

The way I conceptualize this, when we allocate a type path, we see a path from a root anchor to a property. Root anchors determine the stem of the type name and the module. Previously, a root object was a resource, datasource or config property. My change adds a new root object: RawTypePath, which also determines the name stem and module for types nested below.
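As an illustration of the root-anchor idea, the sketch below models a path as a root (which fixes the module and the name stem) plus a chain of property names. The typePath struct, resourceRoot and rawRoot are hypothetical and only mimic the role that paths.TypePath and paths.RawTypePath play in the PR; the "pkg:module:Name" token layout is taken from the res:custom:Obj example above.

```go
package main

import (
	"fmt"
	"strings"
)

// typePath is a hypothetical simplification of a type path: a root anchor
// plus the property names leading from that root to the nested object.
type typePath struct {
	module string   // module decided by the root anchor
	stem   string   // type-name stem decided by the root anchor
	props  []string // property names below the root
}

// resourceRoot anchors a path at a resource.
func resourceRoot(module, name string) typePath {
	return typePath{module: module, stem: name}
}

// rawRoot anchors a path at a user-supplied token "pkg:module:Name".
func rawRoot(token string) (typePath, error) {
	parts := strings.Split(token, ":")
	if len(parts) != 3 {
		return typePath{}, fmt.Errorf("malformed token %q", token)
	}
	return typePath{module: parts[1], stem: parts[2]}, nil
}

// property extends the path by one nested property (name must be non-empty).
func (p typePath) property(name string) typePath {
	props := append(append([]string{}, p.props...), name)
	return typePath{module: p.module, stem: p.stem, props: props}
}

// typeName appends each capitalized property name to the stem.
func (p typePath) typeName() string {
	name := p.stem
	for _, prop := range p.props {
		name += strings.ToUpper(prop[:1]) + prop[1:]
	}
	return name
}

func main() {
	structural := resourceRoot("index", "R1").property("prop").property("sub")
	fmt.Println(structural.module, structural.typeName()) // index R1PropSub

	root, err := rawRoot("res:custom:Obj")
	if err != nil {
		panic(err)
	}
	renamed := root.property("sub")
	fmt.Println(renamed.module, renamed.typeName()) // custom ObjSub
}
```

Swapping the root is the only change between the two calls; everything nested below picks up the new module and stem without further lookups.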

> Presumably this traversal could stop early at "res_r1.prop_1" if prop_1 was given an explicit token Type by the user, and decide to allocate the module according to the user's Type token?

In effect, this is what happens. We just encode the stop within the path itself, instead of a look aside map.

> I guess what I am saying is why does this necessitate a new clause in TypePath when this information could be looked up from map[string]*SchemaInfo and existing TypePath as written?

We need more information than is contained in map[string]*SchemaInfo, we need the full path of the object.

t0yv0 · 2023-12-08

Ah, this is a great clarification that names like ObjSub get affected as well, thank you! I missed that detail.

I'm still not seeing what you'd be missing from map[string]*SchemaInfo. Slightly hand-wavy purely functional definition:

func computeName(infos map[string]SchemaInfo, path) Name {
	schemaInfo := at(infos, path)
	if schemaInfo.Type != "" {
		return schemaInfo.Type.Name()
	}
	if isEmptyPath(path) {
		return ""
	}
	return computeName(infos, parentPath(path)) + capitalize(path.LastFragment)
}

at(infos, "res1.prop") == &SchemaInfo{Type: "res:custom:Obj"}

computeName(infos, "res1.prop.sub")
	= computeName(infos, "res1.prop") + "Sub"
	= "Obj" + "Sub"
	= "ObjSub"

Compute module placement likewise.

You don't need to cache at(infos, "res1.prop") inside a new clause of TypePath seemingly because it's still there alright in the SchemaInfo tree, addressable by TypePath.
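For reference, a runnable take on the hand-wavy definition above might look like the following. It assumes paths are dotted strings and uses a hypothetical schemaInfo struct that carries only the Type override; the real SchemaInfo tree is nested rather than a flat map, and real resource names are derived differently from the raw "res1" fragment used here.

```go
package main

import (
	"fmt"
	"strings"
)

// schemaInfo is a hypothetical stand-in holding only the Type override.
type schemaInfo struct {
	Type string // token such as "res:custom:Obj", or "" when unset
}

// computeName derives a type name from a dotted path, restarting the name
// at the nearest ancestor that carries a Type override.
func computeName(infos map[string]schemaInfo, path string) string {
	if path == "" {
		return ""
	}
	if info, ok := infos[path]; ok && info.Type != "" {
		// The name is the last colon-separated segment of the token.
		return info.Type[strings.LastIndex(info.Type, ":")+1:]
	}
	parent, last := "", path
	if i := strings.LastIndex(path, "."); i >= 0 {
		parent, last = path[:i], path[i+1:]
	}
	return computeName(infos, parent) + capitalize(last)
}

// capitalize upper-cases the first byte of an ASCII identifier.
func capitalize(s string) string {
	if s == "" {
		return s
	}
	return strings.ToUpper(s[:1]) + s[1:]
}

func main() {
	infos := map[string]schemaInfo{"res1.prop": {Type: "res:custom:Obj"}}
	fmt.Println(computeName(infos, "res1.prop.sub"))  // ObjSub
	fmt.Println(computeName(infos, "res1.other.sub")) // Res1OtherSub
}
```

Because the function only reads from infos, its result does not depend on the order in which the schema tree is visited.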

It's inefficient, yes (but that can be fixed up with recursion patterns), but what I find super appealing about factoring things out into pure functions if we can is that it makes it extremely obvious that it's independent of the traversal order through the schema tree and the order in which effects are executed against maps etc.

As things stand with the code I'm getting a slight suspicion that I'm still missing something and there is indeed some importance in the order of traversal and so forth.

This is a little bit in code golf territory though, so I don't want to block on it, apologies. I don't see any obvious problems where the current code is incorrect.

t0yv0 · 2023-12-04T17:34:43Z

pkg/tfgen/generate_schema.go

@@ -247,13 +284,33 @@ func (g *schemaGenerator) genPackageSpec(pack *pkg) (pschema.PackageSpec, error)
 	spec.Attribution = fmt.Sprintf(attributionFormatString, g.info.Name, g.info.GetGitHubOrg(), g.info.GetGitHubHost())

 	var config []*variable
+	declaredTypes := map[string]*schemaNestedType{}


I like this deconfliction check! QQ: what if there are conflicts via keys in spec.Types other than the ones generated for schemaNestedTypes, or is this a non-sequitur (they're one and the same)? Thanks!

iwahbe · 2023-12-07

At this point in the generation process, they are one and the same. Other types might be added later via ProviderInfo.ExtraTypes.
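The shape of such a deconfliction check can be sketched as follows. This is an assumption-laden illustration: nestedType and declare are hypothetical, and reflect.DeepEqual stands in for whatever structural equality the generator actually uses.

```go
package main

import (
	"fmt"
	"reflect"
)

// nestedType is a hypothetical stand-in for a generated object type,
// reduced to its property names and their type strings.
type nestedType struct {
	props map[string]string
}

// declare records typ under token. A second declaration of the same token
// is accepted only when it is structurally equal to the first.
func declare(declared map[string]*nestedType, token string, typ *nestedType) error {
	if prev, ok := declared[token]; ok {
		if !reflect.DeepEqual(prev.props, typ.props) {
			return fmt.Errorf("conflicting definitions for type %q", token)
		}
		return nil
	}
	declared[token] = typ
	return nil
}

func main() {
	declared := map[string]*nestedType{}
	a := &nestedType{props: map[string]string{"str": "string"}}
	b := &nestedType{props: map[string]string{"str": "string"}}
	c := &nestedType{props: map[string]string{"str": "integer"}}

	fmt.Println(declare(declared, "res:custom:Obj", a)) // <nil>
	fmt.Println(declare(declared, "res:custom:Obj", b)) // <nil>
	fmt.Println(declare(declared, "res:custom:Obj", c)) // conflicting definitions for type "res:custom:Obj"
}
```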

t0yv0 · 2023-12-04T17:38:26Z

pkg/tfgen/generate_schema.go

 	}
 }

+// Insert a type path redirect, setting src to point to dst.
+func (nt *schemaNestedTypes) correctPath(src, dst paths.TypePath) {
+	nt.pathCorrections[src.UniqueKey()] = dst


IMPL nit: I'm losing track a little of the lifetime of this table and the need to track it. Is this to power through the failing renames functionality? Does it not overlap with info already stored in paths.NewRawPath? E.g. could we get by without it, or else without paths.NewRawPath, or indeed without either, which would make me the most happy?

iwahbe · 2023-12-07

The rename builder logic expects that it can derive the name of an object type from its structural path. This is true up to these corrections, so we need to store a mapping from the old (fully structural) name to the new (user supplied & arbitrary) path.

When the rename builder walks a resource: res_r1.prop1.nested1, it needs to build the associated property path. It builds [resource: res_r1][property: prop1][property: nested1]. If prop1 was renamed, we need to recover the correction during the rename traversal so we store {[resource: res_r1][property: prop1]: [raw: userRename]} in nt.pathCorrections.

This is then used in pulumi-terraform-bridge/pkg/tfgen/renames.go (line 241 in a0f1639):

func (r renamesBuilder) correctPath(path paths.TypePath) paths.TypePath {
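The correction table can be pictured with the following sketch. It uses plain string keys in the bracketed notation from the explanation above; the corrections type and its walk method are hypothetical and only approximate how a rename traversal might consult such a table.

```go
package main

import "fmt"

// corrections maps the unique key of a structural path to the key of the
// path that replaces it. String keys are a hypothetical simplification.
type corrections map[string]string

// correctPath returns the redirect target for key if one was recorded,
// and key itself otherwise.
func (c corrections) correctPath(key string) string {
	if dst, ok := c[key]; ok {
		return dst
	}
	return key
}

// walk builds the key for a resource and its property chain, applying any
// correction after each step so nested properties hang off the new root.
func (c corrections) walk(resource string, props ...string) string {
	key := c.correctPath("[resource: " + resource + "]")
	for _, p := range props {
		key = c.correctPath(key + "[property: " + p + "]")
	}
	return key
}

func main() {
	c := corrections{"[resource: res_r1][property: prop1]": "[raw: userRename]"}
	fmt.Println(c.walk("res_r1", "prop1", "nested1"))
	// [raw: userRename][property: nested1]
}
```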

t0yv0 · 2023-12-08

See comment above. The name seems to be a function of TypePath and the top-level map[string]SchemaInfo, but this may be a good form of caching. No matter, I'm leaning toward removing renames from the codebase altogether, and if we make that happen we can probably drop this bit as well.

iwahbe · 2023-12-07T06:59:33Z

> it looks like we allow the user to trigger deliberate collisions when allocating Pulumi type tokens to nested object paths. These collisions can be used to improve sharing: instead of allocating 2 Pulumi tokens (and TypeScript classes and so forth), the user can now have one.

Exactly!

> The chief question I have is this: does this work for pseudo-recursion scenario such as wafv2 types?

No, and it isn't designed to. This PR gets us closer to handling the recursive case, but doesn't apply yet.

> Naively I'd think that genWafType(4) != genWafType(5) deliberately and this scenario will hit the case where the bridge thinks the user is making a mistake assigning an identical token to these supplementary types.

That is correct.

This feature will be immediately useful for providers such as pulumi-akamai, where many structs share the same complex field (GetPropertyRulesBuilderRulesV20230920Args).

t0yv0 · 2023-12-08T02:40:18Z

> No, and it isn't designed to. This PR gets us closer to handling the recursive case, but doesn't apply yet.

Ah, I'm excited about that. We would need a refined equals() and some kind of marker to make it kick in. I guess that's work for later.

iwahbe added 2 commits November 30, 2023 17:02

Remove type prefixing when derived from type overrides

3daafe7

Error when declaring conflicting types

48dc0fd

iwahbe force-pushed the iwahbe/introduce-named-types branch from 2544a8a to 48dc0fd December 1, 2023 01:02

iwahbe self-assigned this Dec 1, 2023

iwahbe added 4 commits November 30, 2023 17:45

WIP Add test

caca3d2

Progress towards correct type tokens

ccf288d

Correct type tokens

36c657f

Fix type merger

a0f1639

It looks like type merger was broken at the equality (.name) comparison, but since all nested objects had a blank name at the time of comparison, this was ok. It was necessary to fix this to allow equal types to compare equal.

iwahbe force-pushed the iwahbe/introduce-named-types branch from c0e7d2a to a0f1639 December 1, 2023 19:58

iwahbe requested review from t0yv0, VenelinMartinov and guineveresaenger December 1, 2023 20:05

iwahbe mentioned this pull request Dec 1, 2023

Ensure shorter tokens for object types #1118

Open

t0yv0 reviewed Dec 4, 2023

iwahbe requested a review from t0yv0 December 7, 2023 07:47

t0yv0 approved these changes Dec 8, 2023

iwahbe merged commit 0bd135b into master Dec 12, 2023
9 checks passed

iwahbe deleted the iwahbe/introduce-named-types branch December 12, 2023 09:16

This was referenced Jan 18, 2024

Talos provider fails to build on latest bridge with type conflict #1626

Closed

Revert "Allow the Type override to effect downstream types" #1628

Merged

t0yv0 added a commit that referenced this pull request Jan 18, 2024

Revert "Allow the Type override to effect downstream types" (#1628)

06f511c

Reverts #1550 Discovered that ExtraTypes support is broken by this change per #1626

iwahbe mentioned this pull request Jan 23, 2024

Regress 1626 #1627

Merged

iwahbe restored the iwahbe/introduce-named-types branch January 25, 2024 23:41

iwahbe mentioned this pull request Jan 25, 2024

Introduce named types #1649

Closed
