Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure shorter tokens for object types #1118

Open
t0yv0 opened this issue May 12, 2023 · 5 comments
Open

Ensure shorter tokens for object types #1118

t0yv0 opened this issue May 12, 2023 · 5 comments
Assignees
Labels
kind/enhancement Improvements or new features size/M Estimated effort to complete (up to 5 days).

Comments

@t0yv0
Copy link
Member

t0yv0 commented May 12, 2023

Hello!

  • Vote on this issue by adding a 👍 reaction
  • If you want to implement this feature, comment to let us know (we'll work with you on design, scheduling, etc.)

Issue details

When mapping deeply nested TF properties to Pulumi Package Schema .types section, tfgen needs to pick "tokens" to identify every record/object type it encounters, and sometimes it picks very long tokens. Unfortunately these tokens are not simply abstract identifiers, but are used by every language codegen (C#, Py, TS, Java, Go) to pick names for classes representing these structures. This can lead to very long class names, because the current naming heuristic makes the name O(N) in the depth of property.

Example of an unusable name:

TemplateDefinitionSheetVisualGeospatialMapVisualChartConfigurationFieldWellsGeospatialMapAggregatedFieldWellsValueNumericalMeasureFieldFormatConfigurationNumericFormatConfigurationPercentageDisplayFormatConfigurationSeparatorConfigurationThousandsSeparator

In languages that require file-per-class this additionally creates problems with very long filenames.

The feature request is to make a change to generate shorter tokens for object types. This is scoped to only types of resource/datasource/provider properties, and will not change anything for types representing resources, datasources or provider config itself. The change will make codegen class names shorter, easier to work with, more pleasant, and avoid hard limits on filenames in languages such as C#.

The last bit of the token must be shorter than 256 as that is where codegen breaks. Arguably we can also have a smaller cutoff like 120 for aesthetic reasons. We can make this configurable defaulting to some reasonable value.

Note that this change will be technically breaking, because it will translate to auxiliary types being moved, and some programs depending on the old location of these types will be broken with the update. To mitigate the break, we need to ensure that an escape hatch is available for the bridged provider maintainer to hand-write a token of a given auxiliary type. It looks like .Type or .NestedType fields on SchemaInfo may help with this purpose but we need to confirm as it's not obvious. The expectation is that the maintainers will chose to mostly accept shorter names as a "welcome breaking change", but make exceptions for auxiliary types that happen to be widely used, based on user feedback.

Affected area/feature

@t0yv0 t0yv0 added needs-triage Needs attention from the triage team kind/enhancement Improvements or new features labels May 12, 2023
@t0yv0 t0yv0 changed the title Ensure shorter tokens for aux types Ensure shorter tokens for object types May 12, 2023
@t0yv0
Copy link
Member Author

t0yv0 commented May 12, 2023

Noting some comments per @danielrbradley

Scoping per resource may be beneficial for gradual rollout instead of switching the default wholesale, something that abstracts ObjectTypeNamingStrategy parameter, keep current strategy as the default but allow to opt-in to shorter names per resource/datasource. For ecosystem team we can also have a way to option to a different global strategy if need that, say on larger providers like OCI.

@t0yv0
Copy link
Member Author

t0yv0 commented May 12, 2023

Note it looks like .Type/.NestedType cannot be relied right now to influence the picked tokens, so that is definitely something that we need to make sure is available here.

@t0yv0 t0yv0 mentioned this issue May 12, 2023
8 tasks
@guineveresaenger guineveresaenger added kind/bug Some behavior is incorrect or out of spec and removed needs-triage Needs attention from the triage team labels May 12, 2023
@t0yv0 t0yv0 added size/S Estimated effort to complete (1-2 days). size/M Estimated effort to complete (up to 5 days). and removed size/S Estimated effort to complete (1-2 days). labels Aug 4, 2023
@t0yv0 t0yv0 removed the kind/bug Some behavior is incorrect or out of spec label Sep 5, 2023
@t0yv0
Copy link
Member Author

t0yv0 commented Sep 6, 2023

@mjeffryes this is the exciting ticket I was thinking of.

@mjeffryes mjeffryes self-assigned this Sep 6, 2023
iwahbe added a commit that referenced this issue Nov 30, 2023
`tfbridge.SchemaInfo` has an `Omit bool` field, which will omit a property from the
schema. Unfortunately, omitted fields don't omit their associated types from the schema,
leading to unnecessary bloat. This leads to code such as this (from aws):

```go
// We removed the `definition` property from quicksights.Template, see
// #1118
// But the types are still present in the schema, which pollutes the Go SDK
// specifically. This function removes those types from the schema.
func removeUnusedQuicksightTypes(pulumiPackageSpec *schema.PackageSpec) {
        var elidedTypes []string
        for tok := range pulumiPackageSpec.Types {
                if strings.Contains(tok, ":quicksight/AnalysisDefinition") ||
                        strings.Contains(tok, ":quicksight/DashboardDefinition") ||
                        strings.Contains(tok, ":quicksight/TemplateDefinition") {
                        elidedTypes = append(elidedTypes, tok)
                }
        }
        for _, tok := range elidedTypes {
                delete(pulumiPackageSpec.Types, tok)
        }
}
```

`Omit: true` is only used in 2 providers: `pulumi-aws` and
`pulumi-alicloud` ([source](https://github.com/search?q=org%3Apulumi%20Omit%3A%20true&type=code)).

Datadog has recenlty added some hefty (+900k lines) types, and I am considering using
`Omit: true` to trim unwanted types. This is a necessary pre-cursor for that usage.
iwahbe added a commit that referenced this issue Nov 30, 2023
`tfbridge.SchemaInfo` has an `Omit bool` field, which will omit a property from the
schema. Unfortunately, omitted fields don't omit their associated types from the schema,
leading to unnecessary bloat. This leads to code such as this (from aws):

```go
// We removed the `definition` property from quicksights.Template, see
// #1118
// But the types are still present in the schema, which pollutes the Go SDK
// specifically. This function removes those types from the schema.
func removeUnusedQuicksightTypes(pulumiPackageSpec *schema.PackageSpec) {
        var elidedTypes []string
        for tok := range pulumiPackageSpec.Types {
                if strings.Contains(tok, ":quicksight/AnalysisDefinition") ||
                        strings.Contains(tok, ":quicksight/DashboardDefinition") ||
                        strings.Contains(tok, ":quicksight/TemplateDefinition") {
                        elidedTypes = append(elidedTypes, tok)
                }
        }
        for _, tok := range elidedTypes {
                delete(pulumiPackageSpec.Types, tok)
        }
}
```

`Omit: true` is only used in 2 providers: `pulumi-aws` and
`pulumi-alicloud` ([source](https://github.com/search?q=org%3Apulumi%20Omit%3A%20true&type=code)).

Datadog has recenlty added some hefty (+900k lines) types, and I am considering using
`Omit: true` to trim unwanted types. This is a necessary pre-cursor for that usage.
iwahbe added a commit that referenced this issue Nov 30, 2023
`tfbridge.SchemaInfo` has an `Omit bool` field, which will omit a
property from the schema. Unfortunately, omitted fields don't omit their
associated types from the schema, leading to unnecessary bloat. This
leads to code such as this (from aws):

```go
// We removed the `definition` property from quicksights.Template, see
// #1118
// But the types are still present in the schema, which pollutes the Go SDK
// specifically. This function removes those types from the schema.
func removeUnusedQuicksightTypes(pulumiPackageSpec *schema.PackageSpec) {
        var elidedTypes []string
        for tok := range pulumiPackageSpec.Types {
                if strings.Contains(tok, ":quicksight/AnalysisDefinition") ||
                        strings.Contains(tok, ":quicksight/DashboardDefinition") ||
                        strings.Contains(tok, ":quicksight/TemplateDefinition") {
                        elidedTypes = append(elidedTypes, tok)
                }
        }
        for _, tok := range elidedTypes {
                delete(pulumiPackageSpec.Types, tok)
        }
}
```

`Omit: true` is only used in 2 providers: `pulumi-aws` and
`pulumi-alicloud`
([source](https://github.com/search?q=org%3Apulumi%20Omit%3A%20true&type=code)).

Datadog has recenlty added some hefty (+900k lines) types, and I am
considering using `Omit: true` to trim unwanted types. This is a
necessary pre-cursor for that usage. It also has the advantage of making
more intuitive sense to future users of the bridge.
@iwahbe
Copy link
Member

iwahbe commented Dec 1, 2023

Merging #1550 will allow users to override type tokens by setting .Type to the old token. If a type changes and they want to change it back, a .Type based override will restore the old name.

@mjeffryes mjeffryes added this to the 0.108 milestone Jul 24, 2024
corymhall added a commit that referenced this issue Aug 8, 2024
- Most of the `schemaNestedTypes` code was just moved from
  `generate_schema.go`
- The new code is the code related to `nestedTypeGraph`

I picked an arbitrary max character limit of 120, but those still look
pretty long. I checked in `pulumi-aws` and there are still a lot of
types that are up to and over 120.

I currently just pick the shorter name if the normal name would be +120,
but I feel like it would be better to introduce a new schema property to
enable this at the resource level.

TODO:
 - I need to add some more tests to ensure this is deterministic, but I
   am not yet sure it is possible to make it completely deterministic
   without tracking state (maybe by reading in the existing schema.json)

re #1118
@mjeffryes mjeffryes removed this from the 0.108 milestone Aug 19, 2024
@t0yv0
Copy link
Member Author

t0yv0 commented Sep 13, 2024

We have some progress related to this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement Improvements or new features size/M Estimated effort to complete (up to 5 days).
Projects
None yet
Development

No branches or pull requests

5 participants