Loadbalance based on running requests #16
Comments
The load balancing requirement for sidekick was to always be purely random; that was the original intention, with no heuristics-based requirement.
It should be up to the target's health-path to decide whether more load can be put on the target or whether it is too busy.
@harshavardhana That assumes that all requests are equal and that all servers behave the same. At least the first assumption is always false, and the second can be.
@klauspost Your approach assumes that there's only a single Sidekick instance, which is also not always true. Look at the Splunk use case, for example.
@aweisser I may be overlooking something, but how do multiple instances affect this? If all sidekicks are trying to keep the number of running requests equal across all servers, that would be good load balancing in my book, and not just "load distribution".
A Sidekick instance can only work with heuristics it can measure. Without getting heuristics from the S3 server, and without sharing heuristics with other Sidekick instances, a local Sidekick process can only count its own requests; it does not know what other requesting clients are doing. As you said, not all requests are the same, and only the S3 server instance knows about the real load. IMHO it's all about the smartness of the health check. Maybe the /minio/health/ready endpoint can be even smarter than counting its goroutines before responding with an HTTP 503 "I'm too busy, go away!" It could take the server's system load (RAM, CPU usage) or the saturation of its NICs into account. This way a "naive" (let's better call it "simple and bulletproof") round robin over "ready" S3 targets should do the job quite well.
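For illustration, a readiness handler that refuses traffic above a goroutine threshold could look roughly like the following Go sketch; the threshold, port, and handler wiring are assumptions for the example, not MinIO's actual implementation:

```go
package main

import (
	"net/http"
	"runtime"
)

// readyHandler is a hypothetical readiness probe: it answers 503 when the
// process has more than maxGoroutines goroutines, signalling the load
// balancer to send traffic elsewhere, and 200 otherwise.
func readyHandler(maxGoroutines int) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if runtime.NumGoroutine() > maxGoroutines {
			http.Error(w, "too busy", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}

func main() {
	http.Handle("/minio/health/ready", readyHandler(500))
	http.ListenAndServe(":9000", nil)
}
```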
@aweisser So you are saying that because it doesn't know anything more than up/down, we should stick to an algorithm that keeps piling requests onto an overloaded or underperforming server? That doesn't make sense to me. The number of requests is a perfectly valid balancing function. Instead of relying on collected metrics that may or may not indicate load (you mention some, but they are no real indication of load), keeping track of active requests is completely passive, doesn't have to rely on any metrics, and also takes sidekick->minio network issues into account.
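As a sketch of what such passive tracking could look like on the proxy side (hypothetical names, not sidekick's actual code), each backend only needs an atomic in-flight counter that is bumped around every proxied request:

```go
package balance

import (
	"net/http"
	"sync/atomic"
)

// countingBackend wraps a backend handler and tracks how many requests are
// currently in flight; no metrics from the backend itself are required.
type countingBackend struct {
	inflight int64
	handler  http.Handler
}

func (b *countingBackend) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	atomic.AddInt64(&b.inflight, 1)
	defer atomic.AddInt64(&b.inflight, -1)
	b.handler.ServeHTTP(w, r)
}

// Running reports the current number of in-flight requests to this backend.
func (b *countingBackend) Running() int64 {
	return atomic.LoadInt64(&b.inflight)
}
```

Because the counter is only decremented when the response completes, slow backends and sidekick->minio network problems automatically show up as a higher in-flight count.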
I'm sure you can find examples that speak for one approach or the other, because there is no single source of truth in a distributed system, and the number of requests is not the only metric that reflects "load". I also just noticed that the /minio/health/ready probe currently doesn't count goroutines anyway. I was confused by the following gist https://gist.github.com/nitisht/0c11d8c670f565b58d930b526ba0f2ed, which states that the readiness probe returns HTTP 503 if more than 500 goroutines are open. Maybe you at MinIO already had reasons not to do it this way, or to change the readiness probe to be equivalent to the liveness probe ("always return HTTP 200 as long as the service is running"). My opinion is still that a server-side "readiness" check is relevant for qualitative load balancing (in contrast to dumb load distribution). Surely a smarter-than-round-robin approach on the client side is also nice to have. IMHO the question should be: is it worth breaking KISS?
Fixed in #98 and released.
Current load balancing is purely round-robin.
However, different requests create different loads, which means that servers processing complex requests may be slower while other servers may be mostly idle.
As an alternative, simply choose between the servers with the fewest running requests.
warp uses an alternative host selection scheme along these lines (a rough sketch of the idea follows below).
This both gives a good distribution and will take individual server load into consideration.
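A minimal sketch of such a least-running-requests selector, assuming each host exposes an in-flight counter like the one sketched above (names are hypothetical, not warp's or sidekick's actual code):

```go
package balance

import (
	"math/rand"
	"sync/atomic"
)

// host is a hypothetical backend with a counter of in-flight requests.
type host struct {
	addr    string
	running int64
}

// pickHost returns a host with the fewest running requests, breaking ties
// randomly so that equally loaded hosts still get an even spread of traffic.
func pickHost(hosts []*host) *host {
	if len(hosts) == 0 {
		return nil
	}
	var (
		least int64 = -1
		best  []*host
	)
	for _, h := range hosts {
		n := atomic.LoadInt64(&h.running)
		switch {
		case least < 0 || n < least:
			least, best = n, []*host{h}
		case n == least:
			best = append(best, h)
		}
	}
	return best[rand.Intn(len(best))]
}
```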