
Run as k8s jobs rather than as a single running pod. #189

Open
JacobWeyer opened this issue Aug 30, 2023 · 8 comments
Labels
enhancement New feature or request needs:triage

Comments

@JacobWeyer

What problem are you facing?

I'd like to see this operator function in a way where it would spin up workspace runs in parallel for every request (up to a max parallel limit that can be set by the user).

The intention is to use this for developer environments, load testing, integration testing and more in a very dynamic manner. The way the current operator seems to work is sequential by nature and ends up being slower as a result.

How could Official Terraform Provider help solve your problem?

By being able to run our terraform we can take advantage of crossplane's flexibility combined with helm and spin up a significant number of micro services and environments very quickly if they can run in parallel rather than being forced to wait for sequential execution. Sequential execution is a real bummer when we have something like RDS or DMS that can take up to 15 minutes to start up properly.

@JacobWeyer JacobWeyer added enhancement New feature or request needs:triage labels Aug 30, 2023
@JacobWeyer JacobWeyer changed the title Run as jobs rather than as a single entity Run as k8s jobs rather than as a single running pod. Aug 30, 2023
@bobh66
Collaborator

bobh66 commented Aug 30, 2023

@JacobWeyer the default for the provider is to run one reconciliation at a time, but this is configurable via the --max-reconcile-rate argument in a ControllerConfig, and you can set it as high as you want. See https://github.com/upbound/provider-terraform/blob/main/examples/install.yaml for an example ControllerConfig.

However, since the underlying code runs the terraform CLI, the pod will attempt to use as many CPUs as it has concurrent threads configured, so to get "true" parallel execution you would need to make sure the pod has as many CPUs available as the reconcile rate you set.
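For reference, such a ControllerConfig might look roughly like the following sketch, modeled on the pattern in examples/install.yaml; the metadata name and the specific values for the reconcile rate and CPU request are illustrative, not taken from the linked example:

```yaml
apiVersion: pkg.crossplane.io/v1alpha1
kind: ControllerConfig
metadata:
  name: terraform-controller-config   # illustrative name
spec:
  args:
    - --max-reconcile-rate=10         # allow up to 10 concurrent reconciliations
  resources:
    requests:
      cpu: "10"                       # match CPUs to the reconcile rate, per the note above
```

The Provider resource would then point at this config via spec.controllerConfigRef.name.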

@JacobWeyer
Author

Will that require us to keep a massive reservation at all times rather than allowing this to be somewhat dynamic and to autoscale?

@bobh66
Collaborator

bobh66 commented Aug 30, 2023

I'm not sure what you mean by autoscale. The pod will try to use whatever CPUs it needs, if they are available, but there is no way to add more pods to the deployment, since Kubernetes controllers can only run a single active instance at a time. So your worker node would need to have the CPUs available for the pod to use; when they aren't in use, they remain available to other pods on the worker. You might be able to use something like Karpenter to autoscale your node group when a worker runs out of CPUs, and then scale in when the load is reduced.

@JacobWeyer
Author

I guess I'm confused why this was designed to run as a single instance instead of having the operator trigger each run as its own job in a similar manner to how something like Github Actions works.

@bobh66
Collaborator

bobh66 commented Aug 30, 2023

Crossplane providers are designed to be reconciling kubernetes controllers which are responsible for maintaining the state specified in the spec field of the resource manifest. That is a different paradigm than a job dispatcher.

If each CLI command was dispatched as individual jobs they could take advantage of idle CPU resources on other workers but each would still require 1 CPU to run to completion, and it would add complexity to track the remote job completion so that subsequent reconciliations don't run while there is already a process running.

@balu-ce

balu-ce commented Sep 5, 2023

@bobh66 / @JacobWeyer could we do this with a master/worker setup, where the master holds the configuration and the workers are scalable replicas? Would that work?

@JacobWeyer
Author

Yeah, that makes sense @bobh66. I'm still curious whether a more distributed batching approach would be beneficial, especially at scale, rather than just running more jobs in parallel on a single operator.

@negz
Member

negz commented Feb 12, 2024

I'm a little wary of this idea. Mostly in that I'm wary of provider-terraform diverging from how all the other Crossplane providers work.
