Count external API calls #1241

Merged

mergenci
Collaborator

@mergenci mergenci commented Mar 27, 2024

Description of your changes

This PR introduces three AWS API call counters:

  1. An AWS SDK v2 middleware to count authentication calls,
  2. An AWS SDK v2 middleware to count resource API calls,
  3. An AWS SDK v1 session handler to count resource API calls.

API calls that couldn't be completed because of a connection error are not counted. API calls that complete are counted, whether they return an API error or no error. There is no comprehensive AWS documentation on request rate limits, but here are two resources on the topic, for reference:

  1. Request throttling for the Amazon EC2 API
  2. Managing and monitoring API throttling in your workloads
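
The counting rule described above (count calls that reached the API, skip calls that failed on the connection) can be sketched with the standard library alone. This is a minimal illustration, not the code in this PR: `apiError`, `errNoConnection`, and `callCounter` are hypothetical stand-ins for the SDK's API error types, a transport failure, and the Prometheus counter vector.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// apiError is a hypothetical stand-in for an error returned by the API itself,
// meaning the request reached the service and was answered.
type apiError struct{ Code string }

func (e *apiError) Error() string { return "api error: " + e.Code }

// errNoConnection is a hypothetical stand-in for a connection-level failure.
var errNoConnection = errors.New("connection error")

// callCounter counts calls per (service, operation) pair.
// It stands in for a labeled Prometheus counter.
type callCounter struct {
	mu     sync.Mutex
	counts map[[2]string]int
}

func newCallCounter() *callCounter {
	return &callCounter{counts: map[[2]string]int{}}
}

// shouldCount reports whether a call reached the API:
// either it returned no error, or it returned an API error.
func shouldCount(err error) bool {
	if err == nil {
		return true
	}
	var ae *apiError
	return errors.As(err, &ae)
}

// observe records one call if it qualifies under shouldCount.
func (c *callCounter) observe(service, operation string, err error) {
	if !shouldCount(err) {
		return
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[[2]string{service, operation}]++
}

// get returns the current count for a (service, operation) pair.
func (c *callCounter) get(service, operation string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[[2]string{service, operation}]
}

func main() {
	c := newCallCounter()
	c.observe("EC2", "CreateVpc", nil)                                // counted: success
	c.observe("EC2", "DeleteVpc", &apiError{Code: "ExampleAPIError"}) // counted: API error
	c.observe("EC2", "DescribeVpcs", errNoConnection)                 // not counted
	fmt.Println(c.get("EC2", "CreateVpc"), c.get("EC2", "DeleteVpc"), c.get("EC2", "DescribeVpcs"))
}
```

In the actual change the same decision is made inside the SDK v2 middleware and the SDK v1 session handler, where the service ID and operation name are available as labels.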

This PR also removes unsafe pointer operations, as described in upbound/terraform-provider-aws#196.

Alternatives considered

During the design phase, we investigated whether implementing an http.RoundTripper would be a good solution. Ideally, we would have a common implementation for AWS SDK v1 and v2, since both SDKs use an http.Client under the hood. A RoundTripper implementation proved infeasible for the following reasons:

  1. Plugging in a RoundTripper to the client returned by AWSClient.HTTPClient() worked for AWS SDK v1 calls, but not for AWS SDK v2 calls.
  2. Unlike AWS SDK v2, AWS SDK v1 doesn't store the service ID (EC2, IAM, etc.) and operation name (DescribeVpcs, etc.) in the request context. Therefore, we wouldn't be able to label v1 calls by service ID and operation name.
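
For the record, a counting http.RoundTripper of the kind considered here might look like the sketch below. It is an illustration under assumptions, not code from this PR: `countingTransport`, `fakeTransport`, `countByHost`, and the host names are hypothetical, and the fake transport replaces real network calls. The sketch labels calls by request host only and makes no attempt to recover a service ID or operation name.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"sync"
)

// countingTransport wraps another RoundTripper and counts every request
// that received an HTTP response, keyed by request host.
type countingTransport struct {
	next   http.RoundTripper
	mu     sync.Mutex
	byHost map[string]int
}

func (t *countingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	resp, err := t.next.RoundTrip(req)
	if err != nil {
		return resp, err // connection-level failure: not counted
	}
	t.mu.Lock()
	t.byHost[req.URL.Host]++
	t.mu.Unlock()
	return resp, nil
}

// fakeTransport is a hypothetical stand-in for the network. It fails for one
// host and answers every other host with an HTTP 400 error response.
type fakeTransport struct{}

func (fakeTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if req.URL.Host == "unreachable.invalid" {
		return nil, errors.New("dial: connection refused")
	}
	return &http.Response{
		StatusCode: http.StatusBadRequest,
		Header:     make(http.Header),
		Body:       http.NoBody,
		Request:    req,
	}, nil
}

// countByHost issues a GET for each URL through the counting transport
// and returns the resulting per-host counts.
func countByHost(urls ...string) map[string]int {
	ct := &countingTransport{next: fakeTransport{}, byHost: map[string]int{}}
	client := &http.Client{Transport: ct}
	for _, u := range urls {
		resp, err := client.Get(u)
		if err != nil {
			continue
		}
		resp.Body.Close()
	}
	return ct.byHost
}

func main() {
	fmt.Println(countByHost(
		"http://ec2.example/",
		"http://ec2.example/",
		"http://unreachable.invalid/",
	))
}
```

Error responses are still counted because they come back as an *http.Response with a nil error, which matches the counting rule used by the middleware-based counters.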

Checklist

I have:

  • Read and followed Crossplane's contribution process.
  • Run make reviewable to ensure this PR is ready for review.
  • Added backport release-x.y labels to auto-backport this PR if necessary.

I couldn't run make reviewable, because my local terraform setup is broken.

How has this code been tested

I've tested the code manually using the resource configuration below, which contains resources that use AWS SDK v1 and v2, as of this writing. Because Upjet comes with a Prometheus client, Upjet-based providers serve their metrics at :8080/metrics by default. Here's a sample excerpt after applying the resource configuration:

# HELP upjet_resource_external_api_calls The number of external API calls.
# TYPE upjet_resource_external_api_calls counter
upjet_resource_external_api_calls{service="EC2",service_operation="AuthorizeSecurityGroupIngress"} 1
upjet_resource_external_api_calls{service="EC2",service_operation="CreateSecurityGroup"} 1
upjet_resource_external_api_calls{service="EC2",service_operation="CreateTags"} 1
upjet_resource_external_api_calls{service="EC2",service_operation="CreateVpc"} 1
upjet_resource_external_api_calls{service="EC2",service_operation="DescribeNetworkAcls"} 3
upjet_resource_external_api_calls{service="EC2",service_operation="DescribeRouteTables"} 3
upjet_resource_external_api_calls{service="EC2",service_operation="DescribeSecurityGroupRules"} 5
upjet_resource_external_api_calls{service="EC2",service_operation="DescribeSecurityGroups"} 11
upjet_resource_external_api_calls{service="EC2",service_operation="DescribeVpcAttribute"} 9
upjet_resource_external_api_calls{service="EC2",service_operation="DescribeVpcs"} 4
upjet_resource_external_api_calls{service="EC2",service_operation="RevokeSecurityGroupEgress"} 2
upjet_resource_external_api_calls{service="STS",service_operation="GetCallerIdentity"} 1

I manually cross-checked the reported counts against the calls listed in CloudTrail Event History. Note that CloudTrail Event History may take up to a few minutes to show the latest calls.
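
When cross-checking, per-service totals can be easier to compare than individual lines. The following sketch sums the excerpt above by service. `totalsByService` and `sample` are hypothetical names, and the parser assumes exactly the label order and integer sample values shown in the excerpt; it is not a general Prometheus text-format parser.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// sample is the metrics excerpt from the PR description.
const sample = `# HELP upjet_resource_external_api_calls The number of external API calls.
# TYPE upjet_resource_external_api_calls counter
upjet_resource_external_api_calls{service="EC2",service_operation="AuthorizeSecurityGroupIngress"} 1
upjet_resource_external_api_calls{service="EC2",service_operation="CreateSecurityGroup"} 1
upjet_resource_external_api_calls{service="EC2",service_operation="CreateTags"} 1
upjet_resource_external_api_calls{service="EC2",service_operation="CreateVpc"} 1
upjet_resource_external_api_calls{service="EC2",service_operation="DescribeNetworkAcls"} 3
upjet_resource_external_api_calls{service="EC2",service_operation="DescribeRouteTables"} 3
upjet_resource_external_api_calls{service="EC2",service_operation="DescribeSecurityGroupRules"} 5
upjet_resource_external_api_calls{service="EC2",service_operation="DescribeSecurityGroups"} 11
upjet_resource_external_api_calls{service="EC2",service_operation="DescribeVpcAttribute"} 9
upjet_resource_external_api_calls{service="EC2",service_operation="DescribeVpcs"} 4
upjet_resource_external_api_calls{service="EC2",service_operation="RevokeSecurityGroupEgress"} 2
upjet_resource_external_api_calls{service="STS",service_operation="GetCallerIdentity"} 1
`

// sampleLine matches one counter sample in the exact shape shown above.
var sampleLine = regexp.MustCompile(
	`^upjet_resource_external_api_calls\{service="([^"]*)",service_operation="([^"]*)"\} (\d+)$`)

// totalsByService sums the counter values per service label.
// Comment lines and lines in any other shape are skipped.
func totalsByService(metrics string) map[string]int {
	totals := map[string]int{}
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		m := sampleLine.FindStringSubmatch(strings.TrimSpace(sc.Text()))
		if m == nil {
			continue
		}
		n, err := strconv.Atoi(m[3])
		if err != nil {
			continue
		}
		totals[m[1]] += n
	}
	return totals
}

func main() {
	for service, total := range totalsByService(sample) {
		fmt.Printf("%s: %d\n", service, total)
	}
}
```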

To test connection errors, I put breakpoints in the code, shut down my Internet connection upon hitting the breakpoint, and then resumed execution. To test API errors, I tried to delete a VPC that has a Security Group configured.

Resource Configuration

apiVersion: ec2.aws.upbound.io/v1beta1
kind: VPC
metadata:
  annotations:
    meta.upbound.io/example-id: ec2/v1beta1/securitygroupingressrule
  name: test-pr-1241-vpc
  labels:
    testing.upbound.io/example-name: test-pr-1241-vpc
spec:
  forProvider:
    region: us-west-1
    cidrBlock: 172.16.0.0/16
    tags:
      Name: TestPr1241VPC

---
apiVersion: ec2.aws.upbound.io/v1beta1
kind: SecurityGroup
metadata:
  annotations:
    meta.upbound.io/example-id: ec2/v1beta1/securitygroupingressrule
  name: test-pr-1241-securitygroup
  labels:
    testing.upbound.io/example-name: test-pr-1241-securitygroup
spec:
  forProvider:
    region: us-west-1
    vpcIdSelector:
      matchLabels:
        testing.upbound.io/example-name: test-pr-1241-vpc

---
apiVersion: ec2.aws.upbound.io/v1beta1
kind: SecurityGroupIngressRule
metadata:
  name: test-pr-1241-securitygroupingressrule
spec:
  forProvider:
    cidrIpv4: 10.0.0.0/8
    fromPort: 8080
    ipProtocol: tcp
    region: us-west-1
    securityGroupIdRef:
      name: test-pr-1241-securitygroup
    toPort: 8081

@mergenci mergenci force-pushed the external-api-call-counter branch 3 times, most recently from a1f9f69 to ddc9008 Compare March 27, 2024 20:12
Collaborator

@ulucinar ulucinar left a comment

Thank you very much @mergenci for working on this important observability topic. I left some comments for you to consider. Also, would you like to briefly discuss the wrapped http.RoundTripper approach in the PR description so that we record the challenges associated with that alternative?

Signed-off-by: Cem Mergenci <cmergenci@gmail.com>
@mergenci mergenci force-pushed the external-api-call-counter branch from ddc9008 to 28c4ff2 Compare March 28, 2024 13:19
@mergenci mergenci marked this pull request as ready for review March 28, 2024 13:20
Comment on lines +11 to +12
github.com/aws/aws-sdk-go v1.49.2
github.com/aws/aws-sdk-go-v2 v1.24.1
Collaborator

These bumps are coming from the native provider's upbound fork dependency versions.

@@ -6,12 +6,14 @@ package clients

import (
"context"
"reflect"
"unsafe"
Collaborator

Cool :)

Collaborator Author

😎🎉

Collaborator

@ulucinar ulucinar left a comment

Thank you @mergenci, lgtm.

@ulucinar
Collaborator

/test-examples="examples/iam/v1beta1/role.yaml"

@mergenci mergenci merged commit aaad019 into crossplane-contrib:main Mar 28, 2024
12 checks passed
@mergenci mergenci deleted the external-api-call-counter branch March 28, 2024 14:04
@mbbush mbbush mentioned this pull request Apr 22, 2024
54 tasks