Horizontally scale by Workspace shard label #281

Open
benedictelsom opened this issue Jul 29, 2024 · 1 comment
Labels: enhancement (New feature or request), needs:triage


benedictelsom commented Jul 29, 2024

What problem are you facing?

Hi Upbound, thanks for your awesome Control Plane and Providers!

My team has found itself needing to implement certain resources using provider-terraform to 'fill the gaps' caused by crossplane/upjet#346.

Our cluster has multiple tenant teams creating Claims against our Compositions, each of which may include one or more Workspaces.

We would like to give our tenants consistent performance expectations for reconciliation time, but this can be undermined if one tenant applies several Claims at once, blocking other tenants' reconciliation (for example, when --max-reconcile-rate=1).

I understand we can increase --max-reconcile-rate, as discussed in many other issues; however, if one tenant applies many Workspaces at once, this will still delay other tenants' Workspaces.

How could the Official Terraform Provider help solve your problem?

provider-terraform could support horizontal scaling by sharding Workspaces on a label, similar to how Flux can horizontally scale.

The implementation looks relatively contained: adding a label selector to the client cache options. This could be a simpler approach than other suggestions such as crossplane/crossplane-runtime#739 and #189.
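
For concreteness, here is a minimal sketch of what that could look like when constructing the provider's controller manager, assuming controller-runtime v0.15+; the `--shard-label-selector` flag and the `newShardedManager` helper are hypothetical, not existing provider-terraform code:

```go
package main

import (
	"k8s.io/apimachinery/pkg/labels"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
)

// newShardedManager builds a manager whose informer cache only sees
// objects matching the shard selector, so each provider replica
// reconciles only the Workspaces in its own shard.
func newShardedManager(shardSelector string) (ctrl.Manager, error) {
	sel, err := labels.Parse(shardSelector) // e.g. "sharding.upbound.io/key=shard0"
	if err != nil {
		return nil, err
	}
	return ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Cache: cache.Options{
			// Every watch started through this cache is filtered
			// by the label selector, server-side.
			DefaultLabelSelector: sel,
		},
	})
}

func main() {
	mgr, err := newShardedManager("sharding.upbound.io/key=shard0")
	if err != nil {
		panic(err)
	}
	_ = mgr // real code would register controllers and call mgr.Start
}
```

One detail to note: a replica configured this way never sees unlabeled Workspaces, so a 'shared' instance would need a selector like `!sharding.upbound.io/key` to pick those up, which is how Flux's default instance avoids overlapping with its shards.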

This approach would fit our use case well: we could give each tenant its own shard to 'guarantee' a certain level of performance, while our less critical or less friendly tenants could still use a 'shared' provider.

Our provider configuration could then look like this:

```yaml
# shard 0
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: terraform-config-shard-0
spec:
  deploymentTemplate:
    spec:
      template:
        spec:
          containers:
          - name: package-runtime
            args:
              - "--shard-label-selector=sharding.upbound.io/key=shard0"
---
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-terraform-0
spec:
  package: xpkg.upbound.io/upbound/provider-terraform:v0.17.0
  runtimeConfigRef:
    name: terraform-config-shard-0
  commonLabels:
    app.kubernetes.io/name: provider-terraform
  skipDependencyResolution: true
---
# shard 1
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: terraform-config-shard-1
spec:
  deploymentTemplate:
    spec:
      template:
        spec:
          containers:
          - name: package-runtime
            args:
              - "--shard-label-selector=sharding.upbound.io/key=shard1"
---
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-terraform-1
spec:
  package: xpkg.upbound.io/upbound/provider-terraform:v0.17.0
  runtimeConfigRef:
    name: terraform-config-shard-1
  commonLabels:
    app.kubernetes.io/name: provider-terraform
  skipDependencyResolution: true
```

Our Compositions would then create Workspaces using the shard key for the given tenant, producing something like this:

```yaml
# from tenant0 Claim, reconciled by shard 0
apiVersion: tf.upbound.io/v1beta1
kind: Workspace
metadata:
  name: xbucket-shard0
  labels:
    sharding.upbound.io/key: shard0
spec:
  forProvider:
    module: |
      resource "aws_s3_bucket" "example" {
        bucket = "my-bucket0"
      }
    source: Inline
  providerConfigRef:
    name: terraform-config
---
# from tenant1 Claim, reconciled by shard 1
apiVersion: tf.upbound.io/v1beta1
kind: Workspace
metadata:
  name: xbucket-shard1
  labels:
    sharding.upbound.io/key: shard1
spec:
  forProvider:
    module: |
      resource "aws_s3_bucket" "example" {
        bucket = "my-bucket1"
      }
    source: Inline
  providerConfigRef:
    name: terraform-config
```

I am not sure whether this issue is best raised here in provider-terraform or in crossplane-runtime. From the current implementation it looks like this would be implemented per provider when setting up the manager, although I'm not sure whether that code is generated.

benedictelsom added the enhancement and needs:triage labels on Jul 29, 2024
benedictelsom changed the title from "A new auth option" to "Horizontally scale by Workspace shard label" on Jul 29, 2024
bobh66 (Collaborator) commented Aug 2, 2024

I think this is definitely something that would need to be handled by crossplane-runtime. Similar discussions are in #189 and in crossplane/crossplane-runtime#739

The Flux implementation is interesting but I don't think we would want to expose the resource scheduling to the user. I think a better solution might be to have a webhook that is aware of how many provider instances are running and assigns labels to the incoming resources as they are created to distribute them across the available controllers. There is additional work that would be required to allow multiple instances of a controller to run simultaneously, since the existing locking mechanism only allows for one.
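
As a rough illustration of that webhook idea (it is not an existing component), a mutating admission defaulter could hash each incoming Workspace name onto one of N shards; the shard count, label key, and import path below are all assumptions:

```go
package main

import (
	"context"
	"fmt"
	"hash/fnv"

	"k8s.io/apimachinery/pkg/runtime"

	// Assumed import path for the Workspace type.
	"github.com/upbound/provider-terraform/apis/v1beta1"
)

// shardDefaulter assigns a shard label to incoming Workspaces; it matches
// controller-runtime's admission.CustomDefaulter interface, so it could be
// served as a mutating webhook.
type shardDefaulter struct {
	shards uint32 // number of running provider replicas; discovering this is the open problem
}

func (d *shardDefaulter) Default(_ context.Context, obj runtime.Object) error {
	ws, ok := obj.(*v1beta1.Workspace)
	if !ok {
		return fmt.Errorf("expected a Workspace but got a %T", obj)
	}
	// Hash the name so the same Workspace always lands on the same shard.
	h := fnv.New32a()
	h.Write([]byte(ws.GetName()))
	if ws.Labels == nil {
		ws.Labels = map[string]string{}
	}
	ws.Labels["sharding.upbound.io/key"] = fmt.Sprintf("shard%d", h.Sum32() % d.shards)
	return nil
}

func main() {
	// Standalone demonstration; a real deployment would register this
	// with a webhook server instead.
	d := &shardDefaulter{shards: 2}
	ws := &v1beta1.Workspace{}
	ws.SetName("xbucket-tenant0")
	if err := d.Default(context.Background(), ws); err != nil {
		panic(err)
	}
	fmt.Println(ws.Labels)
}
```

Hashing on the name keeps assignments stable across re-creation, but as noted above, rebalancing when the replica count changes and the single-instance locking mechanism are the harder parts.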

I would raise this issue in crossplane-runtime or add to crossplane/crossplane-runtime#739 and continue the discussion there.
