Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow the Type override to effect downstream types #1550

Merged
merged 6 commits into from
Dec 12, 2023

Conversation

iwahbe
Copy link
Member

@iwahbe iwahbe commented Dec 1, 2023

This PR changes how tfbridge.SchemaInfo.Type is interpreted when applied to object properties.

Previously, type changed the schema.TypeSpec.{Type,Ref} fields on the generated resource. It did not effect any nested types.

With this PR, Type is interpreted as a command to rename the relevant object and its entire nested structure. This allows multiple fields or objects to share a type definition when specified, and will allow us to reduce the size of our SDKs.


This is technically a breaking change. Bridge users who have specified Type: someNewType on object types will see different behavior. In practice, setting Type on an object would generate an invalid schema without a careful fix-up. I suspect that no-one does this.

I have tested this change on pulumi-ns1, pulumi-aws, pulumi-gcp and pulumi-azure. None of these providers are effected by this change.

Implementers Notes

paths.TypePath is used to determine the module of derived types. Since we want subtypes of a moved type to themselves move to the module of their parent, we need to track that in the paths used. To allow inserting a type at an arbitrary module, it was necessary to create a new paths.TypePath; paths.RawTypePath projects a tokens.Type into paths.TypePath so it can be used for module discovery.

Copy link

codecov bot commented Dec 1, 2023

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (6385710) 58.19% compared to head (a0f1639) 58.27%.

Files Patch % Lines
pkg/tfgen/generate_schema.go 86.88% 6 Missing and 2 partials ⚠️
pkg/tfgen/renames.go 84.61% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1550      +/-   ##
==========================================
+ Coverage   58.19%   58.27%   +0.07%     
==========================================
  Files         288      288              
  Lines       40040    40096      +56     
==========================================
+ Hits        23301    23365      +64     
+ Misses      15403    15380      -23     
- Partials     1336     1351      +15     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@iwahbe iwahbe force-pushed the iwahbe/introduce-named-types branch from 2544a8a to 48dc0fd Compare December 1, 2023 01:02
@iwahbe iwahbe self-assigned this Dec 1, 2023
It looks like type merger was broken at the equality (.name) comparison, but since all
nested objects had a blank name at the time of comparison, this was ok. It was necessary
to fix this to allow equal types to compare equal.
@t0yv0
Copy link
Member

t0yv0 commented Dec 4, 2023

I like the spirit of this change very much. To paraphrase what I am seeing, it looks like we allow
the user to trigger deliberate collisions when allocating Pulumi type tokens to nested object paths.
These collisions be used to improved sharing, instead of allocating 2 pulumi tokens (and TypeScript
classes and so forth), the user can now have one.

The chief question I have is this: does this work for pseudo-recursion scenario such as wafv2 types? TLDR is that TF model does not support observable sharing recursion so they encode recursive types with "up to depth N" with say N=5, this causes type explosion and bloat at many levels. This PR relies on pre-written .equals() function to decide structural equality of "deliberately-colliding types". For the Wafv2 scenario will it work? Naively I'd think that genWafType(4) != genWafType(5) deliberately and this scenario will hit the case where the bridge things the user is making a mistake assigning an identical token to these supplementary types.

I also have a few nits on the implementation level that make it harder for me to follow the change, which I'd love to talk through, as perhaps they point to a gap in my understanding how this codebase works. Let me add some comments.

token := fmt.Sprintf("%s/%s:%s", mod.String(), name, name)

g.renamesBuilder.registerNamedObjectType(typInfo.typePaths, tokens.Type(token))
token := tokens.Type(fmt.Sprintf("%s/%s:%s", mod, name, name))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMPL nit: naively this function is the key function that I'd expected to change in this PR. It freely decides where to put a TF type (in conjunction with the helper modulePlacementFor*). The placement is based on module and name. I'd expect we are not changing the type name. In that case we are down to:

modulePlacementForType

The way this worked before the change is looking at prefixes of paths.TypePath. So when allocating for TypePath = "res_r1.prop_1.element.foo_bar" it would recur upward "res_r1.prop_1.element", "res1_r1.prop_1", "res_r1" until it realized it's in the context of allocating a type for resource "res_r1" and pick the appropriate module based on that.

Presumably this traversal could stop early at "res_r1.prop_1" if prop_1 was given an explicit token Type by the user, and decide to allocate the module according to the user's Type token? I guess what I am saying is why does this necessitate a new clause in TypePath when this information could be looked up from map[string]*SchemaInfo and existing TypePath as written?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naively this function is the key function that I'd expected to change in this PR. It freely decides where to put a TF type (in conjunction with the helper modulePlacementFor*). The placement is based on module and name. I'd expect we are not changing the type name.

We are changing both the module and the name. Consider the following scenario:

resource res_r1:
  prop:
    sub:
     str: string

With SchemaInfo{Type: "res:custom:Obj"} applied to prop.

Direct name change:
The schema object generated for prop needs to have its name changed from R1Prop to Obj and its module changed from index to custom.

Nested name change:
The schema object generated for sub needs to have its name changed from R1PropSub to ObjSub and its module changed from index to custom.

The way this worked before the change is looking at prefixes of paths.TypePath. So when allocating for TypePath = "res_r1.prop_1.element.foo_bar" it would recur upward "res_r1.prop_1.element", "res1_r1.prop_1", "res_r1" until it realized it's in the context of allocating a type for resource "res_r1" and pick the appropriate module based on that.

The way I conceptualize this, when we allocate a type path, we see a path from a root anchor to a property. Root anchors determine the stem of the type name and the module. Previously, a root object was a resource, datasource or config property. My change adds a new root object: RawTypePath, which also determines the name stem and module for types nested below.

Presumably this traversal could stop early at "res_r1.prop_1" if prop_1 was given an explicit token Type by the user, and decide to allocate the module according to the user's Type token?

In effect, this is what happens. We just encode the stop within the path itself, instead of a look aside map.

I guess what I am saying is why does this necessitate a new clause in TypePath when this information could be looked up from map[string]*SchemaInfo and existing TypePath as written?

We need more information than is contained in map[string]*SchemaInfo, we need the full path of the object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah this is a great clarification that names like ObjSub get affected as well, thank you! I missed that detail.

I'm still not seeing what you'd be missing from map[string]*SchemaInfo. Slightly hand-wavy purely functional definition:

func computeName(infos map[string]SchemaInfo, path) Name {
  schemaInfo := at(infos, path)
  if schemaInfo.Type != "" {
     return schemaInfo.Type.Name()
  }
  if isEmptyPath(path) {
     return ""
  }
  return computeName(infos, parentPath(path)) + capitalize(path.LastFragment)
}

at(infos, "res1.prop") == &SchemaInfo{Type: "res:custom:Obj"}

computeName(infos, "res1.prop.sub") =
computeName(infos, "res1.prop") + "Sub" =
"Obj" + "Sub" =
"ObjSub"

Compute module placement likewise.

You don't need to cache at(infos, "res1.prop") inside a new clause of TypePath seemingly because it's still there alright in the SchemaInfo tree, addressable by TypePath.

It's inefficient, yes (but that can be fixed up with recursion patterns), but what I find super appealing about factoring things out into pure functions if we can is that it makes it extremely obvious that it's independent of the traversal order through the schema tree and the order in which effects are executed against maps etc.

As things stand with the code I'm getting a slight suspicion that I'm still missing something and there is indeed some importance in the order of traversal and so forth.

This is a litlte bit in the code golf territory though so I don't want to block on it, apologies. I don't see any obvious problems where current code is incorrect.

@@ -247,13 +284,33 @@ func (g *schemaGenerator) genPackageSpec(pack *pkg) (pschema.PackageSpec, error)
spec.Attribution = fmt.Sprintf(attributionFormatString, g.info.Name, g.info.GetGitHubOrg(), g.info.GetGitHubHost())

var config []*variable
declaredTypes := map[string]*schemaNestedType{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this deconfliction check! QQ. what if there's conflicts via some "other" keys in spec.Types than the ones generated for schemaNestedTypes or this is a non-sequitur (they're one and the same)? Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point in the generation process, they are one and the same. Other types might be added later via ProviderInfo.ExtraTypes.

}
}

// Insert a type path redirect, setting src to point to dst.
func (nt *schemaNestedTypes) correctPath(src, dst paths.TypePath) {
nt.pathCorrections[src.UniqueKey()] = dst
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMPL nit: I'm losing track a little bit of the lifetime of this table and the need to track it. Is this to power through failing renames functionality? Does it not overlap with info already stored in paths.NewRawPath , e.g. could we get by with not using it, or else not using paths.NewRawPath, or indeed not using either that'd make me the most happy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rename builder logic expects that it can derive the name of an object type from its structural path. This is true up to these corrections, so we need to store a mapping from the old (fully structural) name to the new (user supplied & arbitrary) path.

When the rename builder walks a resource: res_r1.prop1.nested1, it needs to build the associated property path. It builds [resource: res_r1][property: prop1][property: nested1]. If prop1 was renamed, we need to recover the correction during the rename traversal so we store {[resource: res_r1][property: prop1]: [raw: userRename]} in nt.pathCorrections.

This is then used in

func (r renamesBuilder) correctPath(path paths.TypePath) paths.TypePath {
.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above. The name sems to be a function from TypePath and top-level map[string]SchemaInfo but this may be a good form of caching. No matter, I'm leaning toward removing renames from the codebase altogether and if we make that happen we can probably drop this bit as well.

@iwahbe
Copy link
Member Author

iwahbe commented Dec 7, 2023

it looks like we allow the user to trigger deliberate collisions when allocating Pulumi type tokens to nested object paths. These collisions be used to improved sharing, instead of allocating 2 pulumi tokens (and TypeScript classes and so forth), the user can now have one.

Exactly!

The chief question I have is this: does this work for pseudo-recursion scenario such as wafv2 types?

No, and it isn't designed to. This PR gets us closer to handling the recursive case, but doesn't apply yet.

Naively I'd think that genWafType(4) != genWafType(5) deliberately and this scenario will hit the case where the bridge things the user is making a mistake assigning an identical token to these supplementary types.

That is correct.


This feature will be immediately useful for providers such as pulumi-akamai, where many structs share the same complex field (GetPropertyRulesBuilderRulesV20230920Args).

@iwahbe iwahbe requested a review from t0yv0 December 7, 2023 07:47
@t0yv0
Copy link
Member

t0yv0 commented Dec 8, 2023

No, and it isn't designed to. This PR gets us closer to handling the recursive case, but doesn't apply yet.

Ah, I'm excited about that. We need there is a refined equals() and some kind of marker to make it kick in. Guess work for later.

@iwahbe iwahbe merged commit 0bd135b into master Dec 12, 2023
9 checks passed
@iwahbe iwahbe deleted the iwahbe/introduce-named-types branch December 12, 2023 09:16
t0yv0 added a commit that referenced this pull request Jan 18, 2024
Reverts #1550

Discovered that ExtraTypes support is broken by this change per
#1626
@iwahbe iwahbe mentioned this pull request Jan 23, 2024
@iwahbe iwahbe restored the iwahbe/introduce-named-types branch January 25, 2024 23:41
@iwahbe iwahbe mentioned this pull request Jan 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants