Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do we really need strong types for all attribute types? #695

Closed
eladb opened this issue Sep 12, 2018 · 6 comments
Closed

Do we really need strong types for all attribute types? #695

eladb opened this issue Sep 12, 2018 · 6 comments
Assignees

Comments

@eladb
Copy link
Contributor

eladb commented Sep 12, 2018

At the moment, we have a strong type for every runtime attribute for each type of AWS resource (sqs.QueueName, sqs.QueueArn, sns.TopicArn).

As part of #168, we did a little analysis of the primitive types represented by these attributes (based on the spec snapshot from July 8, 2018):

  • 171 attributes
  • 158 primitive types (157 'String', 1 'Json')
  • 13 list types, all with 'String' element types

Why did we introduce strong types?

The original motivation for generating attribute types was to allow strong-typing when referencing resources in the L2 AWS Construct Library (the L1 CloudFormation Resource properties currently just uses Tokens (or Strings) as properties, so the strong-typing doesn't really help there).

The other motivation was to (poorly) hide the fact that the values in these attributes were not concrete values but rather tokens that evaluated to CloudFormation intrinsic functions upon synthesis (and only resolved during deployment). Attribute types are all derived from cdk.Token.

What changed?

However, since we originally introduce these types, two things have changed:

  1. We devised the XxxRef pattern, which means that the AWS Construct Library surface area normally uses XxxRef (i.e. BucketRef) as oppose to a specific attribute type (i.e. BucketArn). The pattern allows either passing in a resource defined within the current stack or an "imported" resource. This is a superior approach since it actually models the relationship between the resources and not between specific resource attributes.

  2. We introduced the ability to naturally treat tokens as normal strings, which means that we can now encapsulate the fact that these attributes are evaluated to CloudFormation intrinsics inside a normal string.

What's the problem?

Although there is nothing "wrong" in defining a special type for all resource attributes, I don't think this really brings a lot of value to our framework, and actually introduces unneeded friction in the ergonomics of the AWS Construct Library.

Here are some examples:

  1. It is not possible to pass in a primitive value in cases where an attribute type is required. One would need to wrap it with the attribute type: new QueueName("MyQueueName").
  2. In cases where a string is accepted and users want to pass in an attribute type, they need to .toString it: const queueName: string = queue.queueName.toString().
  3. Since TypeScript uses structural-typing, the type-checker will treat all attribute types as the same type (because they adhere to exactly the same interface). This means that the compiler will not identify cases where we assign different types: foo: QueueArn = new QueueUrl("Foo"). This breaks in strongly-typed languages like Java and .NET and at the moment very hard to discover in the jsii compilation. We could introduce dummy properties to each type to break structural typing, but this will be hard to enforce for non-generated attribute types (and we have some of those).

Proposed solution

Stop generating a type for each resource attribute and use the primitive type (i.e. string or string[]) for all L1 resource attribute properties (and anywhere in the AWS Construct Library). The value returned will be a stringified token that will resolve to a CFN intrinsic.

Since some attributes are string[], we need a way to represent tokens as string arrays ("string-arrayification of tokens"). Shouldn't be too difficult to extend #168 to allow tokens to be encoded into string arrays.

This will allow us to simplify treat all resource attributes as primitive types everywhere across the AWS Construct Library and will remove the friction described above.

What will we lose if we drop strong-types?

TODO: We should be able to reflect on the API and analyze where are we actually using strong types as inputs (not as outputs).

As mentioned above, the L1 input layer (resource properties) did not benefit from strong-types (since the CFN spec doesn't have the required information at the moment), so we are not losing anything at that layer.

The L2 input layer (L2 construct props) are using XxxRef types anyway, so there is no negative impact at that layer.

We will lose the Arn type which provides some strong-typing around the IAM library (addResource accepts an Arn).

Resource/construct attributes will now return string or string[].

Summary

Over-modeling is a common issue in object-oriented design, and a classic tradeoff. Strong-typing resource attributes is similar to defining a type for each attribute of an Employee class, e.g:

class Employ {
  firstName: FirstName;
  lastName: LastName;
  phoneNumber: PhoneNumber;
}

Where FirstName, LastName and PhoneNumber are wrappers of string. As much as this is "correct" in the modeling sense (and in some cases might be even useful to encapsulate logic such as validation rules), in most cases, this is considered an anti-pattern as it increases the complexity of the type-system without adding sufficient value.

@rix0rrr
Copy link
Contributor

rix0rrr commented Sep 13, 2018

Apart from the snark in the final paragraph which I feel is unnecessary, I think this change is overall probably a win for usability. I have a couple of concerns, which are mostly already mirrored in the original ticket body:

  • Do we have a good enough view of what the internal CloudFormation type system looks like, for things that can be referenced by intrinsics? For example, we're now asserting that "basically everything is a string or a list of strings". We also confidently asserted before that "everything is a string", and that was wrong (because turns out there were also lists). So how sure are we this time that we have the full view of the types? For example, the DBCluster has an attribute that looks like a number. Is it a real number? Is a number in string format also fine? Can there be arrays of these? Are there more such types that are not strings? Who knows.

  • I will be sad to see Arn go. As already mentioned, especially around the IAM/policy libraries the Arn type gives some grip to users trying to puzzle their infra code together. Although the places where we were expecting ARNs used to be typed as any, so I guess even string is a step up from that :). We can always (breakingly) rename the method to addResourceArn() to be more clear.

  • I will give it to you on a platter that people who are confused about the phases of execution are going to take the bucket.bucketName: string attribute and try to read it, and then be confused about it not containing the actual value they're looking for. And deep in the bowels of CDK code with everything typed as a string, it's going to be hard to know which ones are real strings and which ones are merely stringified intrinsics. Not everybody can keep a full mental model of the entire CDK surface in their head like you, and definitely not occasional users. Most people will not work on CDK code full time: they'll work on their application code, and occasionally have need to dip into infrastructure code. This is exactly where a type system shines and should be able to give guardrails. Having objects instead of strings removes this expectation. Now granted, we probably don't need a type per attribute, a single class to represent "some string value that you can't see the contents of" would already do.

@eladb
Copy link
Contributor Author

eladb commented Sep 13, 2018

The last paragraph was not intended to be provocative, but rather try to project our discussion onto similar situations in object-oriented design from other domains. To me, this analogy helped realize that we did took it "one step too far" with our modeling.

Regarding the data accuracy from CloudFormation. The information about return types was derived from the CloudFormation spec at July 8. The CFN spec explicitly defines the primitive type of attributes, so if a return type is used that is neither a string nor a string[], we can always just return cdk.CloudFormationToken.

The Arn type: I think we will still support Arn.format and Arn.parse but those will just work with strings. Names of arguments and properties are part of the API surface, so I think addResource(arn: string) encodes the information that the first parameter is an ARN (similarly to queue.queueArn).

You are absolutely right - people will interact with stringified tokens - they will try to print/debug/parse those strings and will find something that they initially didn't expect. However, we intentionally formatted stringified tokens to aesthetically look like parameter expansions (which they are!). A string like "${replace-me}" tells a programmer "this thing is going to be replaced with a string value", which is exactly what we want to signal. It doesn't matter when, right? Some of those expansions are going to happen during synthesis and some during deployment. At the point in time this value is printed, it's still not resolved. I feel this approach beautifully balances out usability and type-safety quite well.

I agree with you that most people will not spend most of their time in CDK code. I believe the majority of our users (remains to be seen of course) will never even notice that these strings have some magic in them. They will just naturally use them as strings (or string arrays) and it will Just Work™️. These users (which I claim are the majority) will not need to know anything about tokens, which means their cognitive load is reduced because they have to worry about less types and conversions, etc. For the savvy, our affordances to parameter expansion/substitution (which we might be able to even improve further) should provide the right signals as to what's going on under the hood. And obviously we will need to provide some documentation and helper functions such as isToken(s) for the deep divers and people who want to be smart about handling those strings.

All in all, I think simplifying our type-system (dramatically) is very much inline with out tenet to optimize for the common use case.

@rix0rrr
Copy link
Contributor

rix0rrr commented Sep 14, 2018

If we ever have need of representing Tokens in numbers, let's not forget that NaNs have unused bits in which we can store additional data.

@rix0rrr
Copy link
Contributor

rix0rrr commented Sep 17, 2018

So, I think what we're settling on is representing magic values in the native type system of the language. That is:

  • Every value that represents a string is typed as a string, but some strings are special and represent "lazy" strings. We will have to teach people to not introspect strings.
  • (Discutable, see below) Every value that represents a number is typed as a number, but some numbers are special and represent "lazy" numbers. We can carve out some parts of the giant state space of a double for this, in any of a number of ways.
  • Every value that represents a list of strings is typed as a string[]. Some values in this domain are special and represent lazy lists (I'm thinking something like ["#{TOKEN[blabla]}"] in a way that distinguishes the magic list from a normal list with a single magic string in it).

Numbers

I know the CFN spec doesn't declare any attributes as numbers: they're all strings. However, some attributes morally represent numbers anyway. Notably, the ports on an RDS instance. In security group setup, I do want to pass in the ports that come out of an RDS instance into the port ranges expected by a security group. If we were to type dbCluster.port: string, then the argument of a securitygroup port setting should also be a string, otherwise how are we going to pass in the DBCluster port? But that reduces usability for literally everyone else who doesn't use RDS, because now they're going to have to convert their integer port numbers to strings.

Lists

The last one seems risky to me, because it will be VERY tempting for people to do something like:

if (azs.length < 2) {
    throw new Error('Must supply at least 2 AZs');
}

Where they should be writing this instead:

if (!containsToken(azs) && azs.length < 2) {
  throw new Error('...');
}

Which is not great. Go and explain that.

Another open question, are we going to allow array concatenation on these magic arrays? It will be hard to prevent it, and the natural mirror of that in string land is string concatenation, which we emphatically DO support. Therefore, for symmetry we probably should.

Conditionals

We also don't really have a way to represent lazy booleans. We don't really support (or have thought about) {Fn::If}s at all, and I think the argument we're currently making is: "you should never have to do that because CDK makes you do everything eagerly in code". Okay fine, I suppose we can make that argument.

@eladb
Copy link
Contributor Author

eladb commented Sep 17, 2018

#712 initially only removes attributes that are strings. Let's see how this looks and feels and whether we actually need to add support for other types.

@rix0rrr
Copy link
Contributor

rix0rrr commented Sep 17, 2018

We know we need support for list of strings anyway, so let's not ignore that.

@eladb eladb self-assigned this Sep 17, 2018
@eladb eladb added the design label Sep 17, 2018
@eladb eladb closed this as completed in 6508f78 Sep 19, 2018
eladb pushed a commit that referenced this issue Sep 20, 2018
__NOTICE__: This release includes a framework-wide [__breaking
change__](#712) which changes the type
of all the string resource attributes across the framework. Instead of using
strong-types that extend `cdk.Token` (such as `QueueArn`, `TopicName`, etc), we
now represent all these attributes as normal `string`s, and codify the tokens
into the string (using the feature introduced in [#168](#168)).

Furthermore, the `cdk.Arn` type has been removed. In order to format/parse ARNs,
use the static methods on `cdk.ArnUtils`.

See motivation and discussion in [#695](#695).

* **cfn2ts:** use stringified tokens for resource attributes instead of strong types ([#712](#712)) ([6508f78](6508f78)), closes [#518](#518) [#695](#695) [#744](#744)
* **aws-dynamodb:** Attribute type for keys, changes the signature of the `addPartitionKey` and `addSortKey` methods to be consistent across the board. ([#720](#720)) ([e6cc189](e6cc189))
* **aws-codebuild:** fix typo "priviledged" -> "privileged

* **assets:** cab't use multiple assets in the same stack ([#725](#725)) ([bba2e5b](bba2e5b)), closes [#706](#706)
* **aws-codebuild:** typo in BuildEnvironment "priviledged" -> "privileged     ([#734](#734)) ([72fec36](72fec36))
* **aws-ecr:** fix addToResourcePolicy ([#737](#737)) ([eadbda5](eadbda5))
* **aws-events:** ruleName can now be specified ([#726](#726)) ([a7bc5ee](a7bc5ee)), closes [#708](#708)
* **aws-lambda:** jsii use no long requires 'sourceAccount' ([#728](#728)) ([9e7d311](9e7d311)), closes [#714](#714)
* **aws-s3:** remove `policy` argument ([#730](#730)) ([a79190c](a79190c)), closes [#672](#672)
* **cdk:** "cdk init" java template is broken ([#732](#732)) ([281c083](281c083)), closes [#711](#711) [aws/jsii#233](aws/jsii#233)

* **aws-apigateway:** new API Gateway Construct Library ([#665](#665)) ([b0f3857](b0f3857))
* **aws-cdk:** detect presence of EC2 credentials ([#724](#724)) ([8e8c295](8e8c295)), closes [#702](#702) [#130](#130)
* **aws-codepipeline:** make the Stage insertion API in CodePipeline more flexible ([#460](#460)) ([d182818](d182818))
* **aws-codepipeline:** new "Pipeline#addStage" convenience method ([#647](#647)) ([25c9fa0](25c9fa0))
* **aws-rds:** add support for parameter groups ([#729](#729)) ([2541508](2541508)), closes [#719](#719)
* **docs:** add documentation for CDK toolkit plugings ([#733](#733)) ([965b918](965b918))
* **dependencies:** upgrade to [jsii 0.7.6](https://github.com/awslabs/jsii/releases/tag/v0.7.6)
eladb pushed a commit that referenced this issue Sep 20, 2018
* v0.9.2

__NOTICE__: This release includes a framework-wide [__breaking
change__](#712) which changes the type
of all the string resource attributes across the framework. Instead of using
strong-types that extend `cdk.Token` (such as `QueueArn`, `TopicName`, etc), we
now represent all these attributes as normal `string`s, and codify the tokens
into the string (using the feature introduced in [#168](#168)).

Furthermore, the `cdk.Arn` type has been removed. In order to format/parse ARNs,
use the static methods on `cdk.ArnUtils`.

See motivation and discussion in [#695](#695).

* **cfn2ts:** use stringified tokens for resource attributes instead of strong types ([#712](#712)) ([6508f78](6508f78)), closes [#518](#518) [#695](#695) [#744](#744)
* **aws-dynamodb:** Attribute type for keys, changes the signature of the `addPartitionKey` and `addSortKey` methods to be consistent across the board. ([#720](#720)) ([e6cc189](e6cc189))
* **aws-codebuild:** fix typo "priviledged" -> "privileged

* **assets:** cab't use multiple assets in the same stack ([#725](#725)) ([bba2e5b](bba2e5b)), closes [#706](#706)
* **aws-codebuild:** typo in BuildEnvironment "priviledged" -> "privileged     ([#734](#734)) ([72fec36](72fec36))
* **aws-ecr:** fix addToResourcePolicy ([#737](#737)) ([eadbda5](eadbda5))
* **aws-events:** ruleName can now be specified ([#726](#726)) ([a7bc5ee](a7bc5ee)), closes [#708](#708)
* **aws-lambda:** jsii use no long requires 'sourceAccount' ([#728](#728)) ([9e7d311](9e7d311)), closes [#714](#714)
* **aws-s3:** remove `policy` argument ([#730](#730)) ([a79190c](a79190c)), closes [#672](#672)
* **cdk:** "cdk init" java template is broken ([#732](#732)) ([281c083](281c083)), closes [#711](#711) [aws/jsii#233](aws/jsii#233)

* **aws-apigateway:** new API Gateway Construct Library ([#665](#665)) ([b0f3857](b0f3857))
* **aws-cdk:** detect presence of EC2 credentials ([#724](#724)) ([8e8c295](8e8c295)), closes [#702](#702) [#130](#130)
* **aws-codepipeline:** make the Stage insertion API in CodePipeline more flexible ([#460](#460)) ([d182818](d182818))
* **aws-codepipeline:** new "Pipeline#addStage" convenience method ([#647](#647)) ([25c9fa0](25c9fa0))
* **aws-rds:** add support for parameter groups ([#729](#729)) ([2541508](2541508)), closes [#719](#719)
* **docs:** add documentation for CDK toolkit plugings ([#733](#733)) ([965b918](965b918))
* **dependencies:** upgrade to [jsii 0.7.6](https://github.com/awslabs/jsii/releases/tag/v0.7.6)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants