
Run as k8s jobs rather than as a single running pod. #189

Open
JacobWeyer opened this issue Aug 30, 2023 · 8 comments
Labels
enhancement New feature or request needs:triage

Comments

@JacobWeyer

What problem are you facing?

I'd like to see this operator function in a way where it would spin up workspace runs in parallel for every request (up to a max parallel limit that can be set by the user).

The intention is to use this for developer environments, load testing, integration testing and more in a very dynamic manner. The way the current operator seems to work is sequential by nature and ends up being slower as a result.

How could Official Terraform Provider help solve your problem?

By being able to run our terraform we can take advantage of crossplane's flexibility combined with helm and spin up a significant number of micro services and environments very quickly if they can run in parallel rather than being forced to wait for sequential execution. Sequential execution is a real bummer when we have something like RDS or DMS that can take up to 15 minutes to start up properly.

@JacobWeyer JacobWeyer added enhancement New feature or request needs:triage labels Aug 30, 2023
@JacobWeyer JacobWeyer changed the title Run as jobs rather than as a single entity Run as k8s jobs rather than as a single running pod. Aug 30, 2023
@bobh66
Collaborator

bobh66 commented Aug 30, 2023

@JacobWeyer the default for the provider is to run one reconciliation at a time, but this is configurable via the --max-reconcile-rate argument in a ControllerConfig, and you can set it as high as you want. See https://github.com/upbound/provider-terraform/blob/main/examples/install.yaml for an example ControllerConfig.

However, since the underlying code runs the terraform CLI, the pod will attempt to use as many CPUs as it has concurrent threads configured, so to get "true" parallel execution you would need to make sure the pod has as many CPUs available as the reconcile rate you set.
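For reference, such a ControllerConfig might look roughly like the following sketch, modeled on the pattern in examples/install.yaml; the metadata name and the specific values for the reconcile rate and CPU request are illustrative, not taken from the linked example:

```yaml
apiVersion: pkg.crossplane.io/v1alpha1
kind: ControllerConfig
metadata:
  name: terraform-controller-config   # illustrative name
spec:
  args:
    - --max-reconcile-rate=10         # allow up to 10 concurrent reconciliations
  resources:
    requests:
      cpu: "10"                       # match CPUs to the reconcile rate, per the note above
```

The Provider resource would then point at this config via spec.controllerConfigRef.name.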

@JacobWeyer
Author

Will that require us to keep a massive reservation at all times rather than allowing this to be somewhat dynamic and to autoscale?

@bobh66
Collaborator

bobh66 commented Aug 30, 2023

I'm not sure what you mean by autoscale. The pod will try to use whatever CPUs it needs, if they are available, but there is no way to add more pods to the deployment, since Kubernetes controllers can only run a single active instance at a time. So your worker node would need to have the CPUs available for the pod to use; when they aren't in use, they remain available to other pods on the worker. You might be able to use something like Karpenter to autoscale your node group when a worker runs out of CPUs, and then scale in when the load is reduced.

@JacobWeyer
Author

I guess I'm confused why this was designed to run as a single instance instead of having the operator trigger each run as its own job in a similar manner to how something like Github Actions works.

@bobh66
Collaborator

bobh66 commented Aug 30, 2023

Crossplane providers are designed to be reconciling kubernetes controllers which are responsible for maintaining the state specified in the spec field of the resource manifest. That is a different paradigm than a job dispatcher.

If each CLI command was dispatched as individual jobs they could take advantage of idle CPU resources on other workers but each would still require 1 CPU to run to completion, and it would add complexity to track the remote job completion so that subsequent reconciliations don't run while there is already a process running.

@balu-ce

balu-ce commented Sep 5, 2023

@bobh66 / @JacobWeyer could we do this with a master/worker setup, where the master holds the configuration and the workers are scalable replicas? Would that work?

@JacobWeyer
Author

Yeah, that makes sense @bobh66. I'm still curious whether a more distributed batching approach would be beneficial, especially at scale, rather than just running more jobs in parallel on a single operator.

@negz
Member

negz commented Feb 12, 2024

I'm a little wary of this idea. Mostly in that I'm wary of provider-terraform diverging from how all the other Crossplane providers work.
