
Configurable autoscaling #203

Merged: 17 commits from autoscaling-config into main on Sep 17, 2024

Conversation

@nstogner (Contributor) commented Sep 11, 2024

  • Add configurable scale-down delay, autoscaling interval and window, and per-Pod request target (fixes Model expose scale down delay #202)
  • Add a full scale-up-to-scale-down integration test
  • Update Helm chart values
  • Update docs (add autoscaling docs, remove how-to info from concepts, and break the docs into separate sections)
  • Remove resource-profile-override fields from the Model spec
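The per-Pod request target described above implies the usual target-based replica calculation: divide the total active requests by the per-Pod target, round up, and clamp to the replica bounds. A minimal Go sketch of that calculation (hypothetical `desiredReplicas` helper; not KubeAI's actual implementation):

```go
package main

import "math"

// desiredReplicas sketches the target-based calculation: total active
// requests divided by the per-Pod target, rounded up, then clamped to
// [minReplicas, maxReplicas]. Hypothetical helper, not KubeAI's actual code.
func desiredReplicas(activeRequests, targetPerPod, minReplicas, maxReplicas int32) int32 {
	n := int32(math.Ceil(float64(activeRequests) / float64(targetPerPod)))
	if n < minReplicas {
		n = minReplicas
	}
	if n > maxReplicas {
		n = maxReplicas
	}
	return n
}
```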

@nstogner nstogner changed the title WIP: Add basic autoscaling config options Configurable autoscaling with integration tests Sep 12, 2024
@nstogner nstogner changed the title Configurable autoscaling with integration tests Configurable autoscaling Sep 12, 2024
Review threads on charts/kubeai/values.yaml and api/v1/model_types.go (outdated; resolved)
@samos123 (Contributor) left a comment:

Looks good except for some minor nits and a question on behavior.

Review threads on api/v1/model_types.go and internal/config/system.go (outdated; resolved)
@samos123 (Contributor) commented Sep 14, 2024

Alternative:

apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: faster-whisper-medium-en-cpu
spec:
  features: [SpeechToText]
  owner: Systran
  url: hf://Systran/faster-whisper-medium.en
  engine: FasterWhisper
  minReplicas: 0 # defaults to 0 if not set like before
  maxReplicas: 3 # defaults to 3 if not set like before
  concurrentRequests: 100 # defaults to 100 if not set
  scaleDownDelay: 60s # defaults to 60s if not set
  resourceProfile: cpu:1

Benefits: simple and backwards compatible.

Keeping things backwards compatible also means our existing tutorials and docs don't all have to be updated.

// TargetRequests is the target number of active requests per Pod.
// +kubebuilder:validation:Minimum=1
// +kubebuilder:default=100
TargetRequests int32 `json:"targetRequests"`
Review comment:

I think I prefer calling it concurrentRequests.

@nstogner nstogner merged commit 91f1d15 into main Sep 17, 2024
5 checks passed
@nstogner nstogner deleted the autoscaling-config branch September 17, 2024 01:26
Successfully merging this pull request may close these issues.

Model expose scale down delay
2 participants