-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache optimized routing ("PrefixHash" load balancing - i.e. CHWBL) #333
Conversation
api/v1/model_types.go
Outdated
@@ -144,6 +148,33 @@ type Adapter struct { | |||
URL string `json:"url"` | |||
} | |||
|
|||
type LoadBalancing struct { | |||
Strategy LoadBalancingStrategy `json:"strategy"` | |||
CHWBL CHWBL `json:"chwbl"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, prefer to use the full name instead of abbreviations that most people don't understand. It's fine if it's long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it to PrefixHash
since this is a user-facing API and thats probably easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not read all the code but very nice work!
var buf bytes.Buffer | ||
mw := multipart.NewWriter(&buf) | ||
// Keep the same boundary as the initial request (probably not necessary) | ||
mw.SetBoundary(boundary) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: handle error. SetBoundary
could fail or cause unexpected behavior
|
||
r.LoadBalancing = model.Spec.LoadBalancing | ||
|
||
if r.LoadBalancing.Strategy == v1.PrefixHashStrategy && r.bodyPayload != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bodyPayload
is always nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, will push a fix with tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
type Request struct { | ||
Body []byte | ||
bodyPayload map[string]interface{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This field is never set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats a big problem, good catch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
) | ||
|
||
type Request struct { | ||
Body []byte |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to avoid buffering but clearly out of scope for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We always store the body to facilitate retries at the moment. It would probably be nice to be able to disable this.
// endpoint is found with acceptable load. | ||
defaultEndpoint = &ep | ||
} | ||
if chwblLoadOK(ep.inFlight.Load(), g.totalInFlight.Load(), len(g.endpoints), loadFactor) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this can be done before setting the default endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default endpoint is the endpoint that is able to serve the request (has the adapter) but might not meet the load requirement after all other endpoints have been checked.
Adding comment.
} | ||
} | ||
|
||
i++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
personal preference: i = (i + 1) % len(g.chwblSortedHashes) // wrap around
right := len(g.chwblSortedHashes) - 1 | ||
for left <= right { | ||
middle := (left + right) / 2 | ||
if g.chwblSortedHashes[middle] == val { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: read g.chwblSortedHashes[middle]
just once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good recommendation, done
return true | ||
} | ||
|
||
avgLoad := float64(totalLoad+1) / float64(n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment why +1 here and below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Replication int `json:"replication,omitempty"` | ||
// PrefixCharLength is the number of characters to count when building the prefix to hash. | ||
// +kubebuilder:validation:Optional | ||
// +kubebuilder:default=100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this ignoring the system prompt when using chat completion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep
Implementation of proposal: #314
.spec.loadBalancing
field to ModelPrefixHash
(i.e. "Consistent Hashing with Bounded Loads" - CHWBL) load balancing strategyendpoints
package toloadbalancer
modelscaler
package tomodelclient
modelproxy
andmessenger
and intoapiutils
as a shared library34%
improvement in time per generated token using PrefixHash over LeastLoad in specific circumstancesTODO:
PrefixHash
the default strategy in the future if benchmarks look good