Allow the Type override to affect downstream types #1550

Conversation
Codecov Report

@@            Coverage Diff             @@
##           master    #1550      +/-   ##
==========================================
+ Coverage   58.19%   58.27%   +0.07%
==========================================
  Files         288      288
  Lines       40040    40096      +56
==========================================
+ Hits        23301    23365      +64
+ Misses      15403    15380      -23
- Partials     1336     1351      +15

View full report in Codecov by Sentry.
Force-pushed from 2544a8a to 48dc0fd.
It looks like the type merger was broken at the equality (.name) comparison, but since all nested objects had a blank name at the time of comparison, this was OK. It was necessary to fix this to allow equal types to compare equal.
Force-pushed from c0e7d2a to a0f1639.
I like the spirit of this change very much. To paraphrase what I am seeing, it looks like we allow

The chief question I have is this: does this work for pseudo-recursion scenarios such as the wafv2 types? TL;DR: the TF model does not support observable-sharing recursion, so they encode recursive types "up to depth N" with, say, N=5; this causes type explosion and bloat at many levels. This PR relies on pre-written

I also have a few nits at the implementation level that make it harder for me to follow the change, which I'd love to talk through, as perhaps they point to a gap in my understanding of how this codebase works. Let me add some comments.
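For readers less familiar with the wafv2 pattern referenced above: a hedged illustration of the "up to depth N" encoding, with hypothetical field names, where a recursive type is unrolled into N nearly identical nested types rather than one self-referential type:

```go
package main

// Instead of a single recursive Statement type, the upstream schema unrolls the
// recursion to a fixed depth, yielding StatementLevel1..StatementLevelN that
// differ only in how much further they can nest. Each level becomes a distinct
// generated type, which is where the bloat comes from.
type StatementLevel1 struct{ Match string }         // recursion bottoms out here
type StatementLevel2 struct{ Not *StatementLevel1 }
type StatementLevel3 struct{ Not *StatementLevel2 }

func main() {
	_ = StatementLevel3{Not: &StatementLevel2{Not: &StatementLevel1{Match: "example"}}}
}
```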
token := fmt.Sprintf("%s/%s:%s", mod.String(), name, name)
g.renamesBuilder.registerNamedObjectType(typInfo.typePaths, tokens.Type(token))
token := tokens.Type(fmt.Sprintf("%s/%s:%s", mod, name, name))
IMPL nit: naively, this function is the key function that I'd expected to change in this PR. It freely decides where to put a TF type (in conjunction with the helper modulePlacementFor*). The placement is based on module and name. I'd expect we are not changing the type name. In that case we are down to modulePlacementForType.

The way this worked before the change is by looking at prefixes of paths.TypePath. So when allocating for TypePath = "res_r1.prop_1.element.foo_bar" it would recur upward through "res_r1.prop_1.element", "res_r1.prop_1", "res_r1" until it realized it's in the context of allocating a type for resource "res_r1", and pick the appropriate module based on that.

Presumably this traversal could stop early at "res_r1.prop_1" if prop_1 was given an explicit token Type by the user, and decide to allocate the module according to the user's Type token? I guess what I am saying is: why does this necessitate a new clause in TypePath when this information could be looked up from map[string]*SchemaInfo and the existing TypePath as written?
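To make the prefix traversal described above concrete, here is a hedged, self-contained toy version of it (hypothetical helper names and string paths; not the actual modulePlacementFor* code):

```go
package main

import (
	"fmt"
	"strings"
)

// modulePlacementToy walks the prefixes of a dotted type path upward until it
// finds a known resource root, then places the type in that resource's module.
func modulePlacementToy(path string, resourceModules map[string]string) string {
	segments := strings.Split(path, ".")
	for i := len(segments); i > 0; i-- {
		prefix := strings.Join(segments[:i], ".")
		if mod, ok := resourceModules[prefix]; ok {
			return mod
		}
	}
	return "index" // fallback module in this toy
}

func main() {
	resourceModules := map[string]string{"res_r1": "r1module"}
	fmt.Println(modulePlacementToy("res_r1.prop_1.element.foo_bar", resourceModules)) // r1module
}
```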
> naively this function is the key function that I'd expected to change in this PR. It freely decides where to put a TF type (in conjunction with the helper modulePlacementFor*). The placement is based on module and name. I'd expect we are not changing the type name.
We are changing both the module and the name. Consider the following scenario:

resource res_r1:
  prop:
    sub:
      str: string

with SchemaInfo{Type: "res:custom:Obj"} applied to prop.

Direct name change: the schema object generated for prop needs to have its name changed from R1Prop to Obj and its module changed from index to custom.

Nested name change: the schema object generated for sub needs to have its name changed from R1PropSub to ObjSub and its module changed from index to custom.
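For illustration, a sketch of how such an override might be written in a bridged provider's mapping, assuming current tfbridge field names (the resource, token, and property names here are hypothetical):

```go
package main

import (
	"github.com/pulumi/pulumi-terraform-bridge/v3/pkg/tfbridge"
	"github.com/pulumi/pulumi/sdk/v3/go/common/tokens"
)

// exampleProvider applies Type: "res:custom:Obj" to prop; with this PR, the
// object generated for prop becomes Obj in module custom, and the nested
// object generated for sub becomes ObjSub in module custom as well.
func exampleProvider() tfbridge.ProviderInfo {
	return tfbridge.ProviderInfo{
		Resources: map[string]*tfbridge.ResourceInfo{
			"res_r1": {
				Tok: tokens.Type("res:index/r1:R1"),
				Fields: map[string]*tfbridge.SchemaInfo{
					"prop": {Type: tokens.Type("res:custom:Obj")},
				},
			},
		},
	}
}
```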
> The way this worked before the change is looking at prefixes of paths.TypePath. So when allocating for TypePath = "res_r1.prop_1.element.foo_bar" it would recur upward "res_r1.prop_1.element", "res_r1.prop_1", "res_r1" until it realized it's in the context of allocating a type for resource "res_r1" and pick the appropriate module based on that.

The way I conceptualize this, when we allocate a type path, we see a path from a root anchor to a property. Root anchors determine the stem of the type name and the module. Previously, a root object was a resource, datasource or config property. My change adds a new root object, RawTypePath, which also determines the name stem and module for types nested below.
> Presumably this traversal could stop early at "res_r1.prop_1" if prop_1 was given an explicit token Type by the user, and decide to allocate the module according to the user's Type token?

In effect, this is what happens. We just encode the stop within the path itself, instead of in a look-aside map.
> I guess what I am saying is why does this necessitate a new clause in TypePath when this information could be looked up from map[string]*SchemaInfo and existing TypePath as written?

We need more information than is contained in map[string]*SchemaInfo; we need the full path of the object.
Ah, this is a great clarification that names like ObjSub get affected as well, thank you! I missed that detail.
I'm still not seeing what you'd be missing from map[string]*SchemaInfo. Slightly hand-wavy, purely functional definition:
func computeName(infos map[string]SchemaInfo, path) Name {
    schemaInfo := at(infos, path)
    if schemaInfo.Type != "" {
        return schemaInfo.Type.Name()
    }
    if isEmptyPath(path) {
        return ""
    }
    return computeName(infos, parentPath(path)) + capitalize(path.LastFragment)
}
at(infos, "res1.prop") == &SchemaInfo{Type: "res:custom:Obj"}

computeName(infos, "res1.prop.sub")
  = computeName(infos, "res1.prop") + "Sub"
  = "Obj" + "Sub"
  = "ObjSub"
Compute module placement likewise.

You don't need to cache at(infos, "res1.prop") inside a new clause of TypePath, seemingly, because it's still there in the SchemaInfo tree, addressable by the existing TypePath.

It's inefficient, yes (though that can be fixed with recursion patterns), but what I find super appealing about factoring things out into pure functions, where we can, is that it makes it extremely obvious that the result is independent of the traversal order through the schema tree and of the order in which effects are executed against maps and so on.

As things stand with the code, I'm getting a slight suspicion that I'm still missing something and that there is indeed some importance to the order of traversal and so forth. This is a little bit in code-golf territory, though, so I don't want to block on it, apologies. I don't see any obvious problems where the current code is incorrect.
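For concreteness, a self-contained toy version of the pure-function idea sketched above, using string paths and hypothetical helper names rather than the bridge's actual API:

```go
package main

import (
	"fmt"
	"strings"
)

// schemaInfoToy stands in for the bridge's SchemaInfo; only the Type override matters here.
type schemaInfoToy struct{ Type string } // e.g. "res:custom:Obj"

// nameFromToken extracts the trailing name from a token like "res:custom:Obj".
func nameFromToken(tok string) string {
	parts := strings.Split(tok, ":")
	return parts[len(parts)-1]
}

func capitalize(s string) string {
	if s == "" {
		return ""
	}
	return strings.ToUpper(s[:1]) + s[1:]
}

// computeName derives a nested type's name purely from the overrides map and the
// structural path, recursing toward the root until an explicit Type override is found.
func computeName(infos map[string]schemaInfoToy, path []string) string {
	key := strings.Join(path, ".")
	if info, ok := infos[key]; ok && info.Type != "" {
		return nameFromToken(info.Type)
	}
	if len(path) == 0 {
		return ""
	}
	parent, last := path[:len(path)-1], path[len(path)-1]
	return computeName(infos, parent) + capitalize(last)
}

func main() {
	infos := map[string]schemaInfoToy{"res1.prop": {Type: "res:custom:Obj"}}
	fmt.Println(computeName(infos, []string{"res1", "prop", "sub"})) // ObjSub
}
```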
@@ -247,13 +284,33 @@ func (g *schemaGenerator) genPackageSpec(pack *pkg) (pschema.PackageSpec, error)
	spec.Attribution = fmt.Sprintf(attributionFormatString, g.info.Name, g.info.GetGitHubOrg(), g.info.GetGitHubHost())

	var config []*variable
	declaredTypes := map[string]*schemaNestedType{}
I like this deconfliction check! QQ: what if there are conflicts via some "other" keys in spec.Types than the ones generated for schemaNestedTypes, or is this a non-sequitur (they're one and the same)? Thanks!
At this point in the generation process, they are one and the same. Other types might be added later via ProviderInfo.ExtraTypes.
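As an illustration only, the kind of deconfliction check being discussed might look roughly like the following toy (hypothetical names; the generator's actual code is shaped differently):

```go
package main

import "fmt"

type nestedType struct{ name string } // stand-in for the generator's schemaNestedType

// declareType registers a nested type under its schema token, reporting a
// conflict if a different nested type already claimed the same token.
func declareType(declared map[string]*nestedType, token string, typ *nestedType) error {
	if existing, ok := declared[token]; ok && existing != typ {
		return fmt.Errorf("token %q already declared for a different nested type", token)
	}
	declared[token] = typ
	return nil
}

func main() {
	declared := map[string]*nestedType{}
	a, b := &nestedType{"Obj"}, &nestedType{"Other"}
	fmt.Println(declareType(declared, "res:custom:Obj", a)) // <nil>
	fmt.Println(declareType(declared, "res:custom:Obj", b)) // conflict error
}
```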
	}
}

// Insert a type path redirect, setting src to point to dst.
func (nt *schemaNestedTypes) correctPath(src, dst paths.TypePath) {
	nt.pathCorrections[src.UniqueKey()] = dst
}
IMPL nit: I'm losing track a little bit of the lifetime of this table and the need to track it. Is this to power through failing renames functionality? Does it not overlap with info already stored in paths.NewRawPath? E.g. could we get by with not using it, or else not using paths.NewRawPath, or indeed not using either? That would make me the happiest.
The rename builder logic expects that it can derive the name of an object type from its structural path. This is true up to these corrections, so we need to store a mapping from the old (fully structural) name to the new (user-supplied and arbitrary) path.

When the rename builder walks a resource such as res_r1.prop1.nested1, it needs to build the associated property path. It builds [resource: res_r1][property: prop1][property: nested1]. If prop1 was renamed, we need to recover the correction during the rename traversal, so we store {[resource: res_r1][property: prop1]: [raw: userRename]} in nt.pathCorrections.

This is then used in pulumi-terraform-bridge/pkg/tfgen/renames.go (line 241 in a0f1639):

func (r renamesBuilder) correctPath(path paths.TypePath) paths.TypePath {
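A hedged, self-contained toy of what such a correction lookup can look like (toy string paths; the bridge's real correctPath operates on paths.TypePath and may correct prefixes rather than whole paths):

```go
package main

import "fmt"

// typePath is a toy stand-in keyed by its unique string form.
type typePath string

func (p typePath) uniqueKey() string { return string(p) }

// correctPath returns the redirected path when a correction was registered for
// it, and the original structural path otherwise.
func correctPath(corrections map[string]typePath, p typePath) typePath {
	if dst, ok := corrections[p.uniqueKey()]; ok {
		return dst
	}
	return p
}

func main() {
	corrections := map[string]typePath{
		"res_r1.prop1": "raw:res:custom:Obj", // recorded when the user overrode prop1's type
	}
	fmt.Println(correctPath(corrections, "res_r1.prop1")) // raw:res:custom:Obj
	fmt.Println(correctPath(corrections, "res_r1.prop2")) // res_r1.prop2 (unchanged)
}
```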
See comment above. The name seems to be a function of the TypePath and the top-level map[string]SchemaInfo, but this may be a good form of caching. No matter, I'm leaning toward removing renames from the codebase altogether, and if we make that happen we can probably drop this bit as well.
Exactly!
No, and it isn't designed to. This PR gets us closer to handling the recursive case, but doesn't apply yet.
That is correct. This feature will be immediately useful for providers such as pulumi-akamai, where many structs share the same complex field.
Ah, I'm excited about that. We'd need a refined equals() and some kind of marker to make it kick in. I guess that's work for later.
This PR changes how tfbridge.SchemaInfo.Type is interpreted when applied to object properties.

Previously, Type changed the schema.TypeSpec.{Type,Ref} fields on the generated resource. It did not affect any nested types.

With this PR, Type is interpreted as a command to rename the relevant object and its entire nested structure. This allows multiple fields or objects to share a type definition when specified, and will allow us to reduce the size of our SDKs.

This is technically a breaking change. Bridge users who have specified Type: someNewType on object types will see different behavior. In practice, setting Type on an object would generate an invalid schema without a careful fix-up, so I suspect that no one does this.

I have tested this change on pulumi-ns1, pulumi-aws, pulumi-gcp and pulumi-azure. None of these providers are affected by this change.

Implementers Notes

paths.TypePath is used to determine the module of derived types. Since we want subtypes of a moved type to themselves move to the module of their parent, we need to track that in the paths used. To allow inserting a type at an arbitrary module, it was necessary to create a new paths.TypePath; paths.RawTypePath projects a tokens.Type into a paths.TypePath so it can be used for module discovery.
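A hedged toy model of this root-anchor idea, using stand-in types rather than the actual paths package API:

```go
package main

import (
	"fmt"
	"strings"
)

// A toy type path: a root anchor plus nested property segments. The root fixes
// the module (and name stem) for every type generated below it.
type rootKind int

const (
	resourceRoot rootKind = iota // rooted at a resource, e.g. res_r1
	rawRoot                      // rooted at a user-supplied token, e.g. "res:custom:Obj"
)

type toyTypePath struct {
	kind     rootKind
	anchor   string   // resource name or raw token
	segments []string // nested property names
}

// module reports where the type addressed by this path should be placed.
func (p toyTypePath) module() string {
	if p.kind == rawRoot {
		// "res:custom:Obj" -> "custom": the raw token decides the module.
		return strings.Split(p.anchor, ":")[1]
	}
	return "index" // default placement for resource-rooted types in this toy
}

func main() {
	sub := toyTypePath{kind: rawRoot, anchor: "res:custom:Obj", segments: []string{"sub"}}
	fmt.Println(sub.module()) // custom
}
```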