improve load balancing at higher deployment query volume #933

Theodus · 2024-09-05T13:27:18Z

For high-volume deployments, the gateway has a tendency to pick multiple indexers (often 3) to maximize performance. The primary issue with this behavior is that we run a higher risk of unnecessarily overloading indexers. Above some threshold, indexer-selection should instead load-balance requests between the indexers that would otherwise all be included in the selected set.

The first challenge is detecting high volume on a deployment. Here's a rough design:

The gateway should track query volume per subgraph deployment. This would likely be a parking_lot::RwLock<HashMap<DeploymentId, AtomicUsize>>. Inserting into the map should be relatively infrequent, and updating an existing entry only requires a read lock. The atomic counter is incremented by the number of indexers selected, and decremented by one as each indexer request completes. A deployment is in a "high volume" state when this counter is above some threshold n, meaning there are more than approximately n outstanding indexer requests in flight concurrently.
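A minimal sketch of the counter described above, under a few assumptions: it uses std::sync::RwLock in place of parking_lot::RwLock so that it compiles without external crates, `DeploymentId` is a hypothetical `String` alias rather than the gateway's real type, and the names `VolumeTracker`, `begin`, and `complete_one` are made up for illustration.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

// Hypothetical stand-in for the gateway's deployment ID type.
type DeploymentId = String;

/// Tracks outstanding indexer requests per deployment.
pub struct VolumeTracker {
    outstanding: RwLock<HashMap<DeploymentId, AtomicUsize>>,
    threshold: usize,
}

impl VolumeTracker {
    pub fn new(threshold: usize) -> Self {
        Self { outstanding: RwLock::new(HashMap::new()), threshold }
    }

    /// Record that `selected` indexer requests are about to be sent.
    pub fn begin(&self, deployment: &str, selected: usize) {
        {
            // Common path: the entry exists, so a read lock is enough.
            let map = self.outstanding.read().unwrap();
            if let Some(counter) = map.get(deployment) {
                counter.fetch_add(selected, Ordering::Relaxed);
                return;
            }
        } // read lock is released here, before taking the write lock
        // Rare path: first query for this deployment, insert under a write lock.
        let mut map = self.outstanding.write().unwrap();
        map.entry(deployment.to_string())
            .or_insert_with(|| AtomicUsize::new(0))
            .fetch_add(selected, Ordering::Relaxed);
    }

    /// Record that one indexer request has completed.
    pub fn complete_one(&self, deployment: &str) {
        let map = self.outstanding.read().unwrap();
        if let Some(counter) = map.get(deployment) {
            // Saturating decrement, so a stray call cannot underflow the counter.
            let _ = counter.fetch_update(Ordering::Relaxed, Ordering::Relaxed, |n| {
                Some(n.saturating_sub(1))
            });
        }
    }

    /// Current number of outstanding indexer requests for a deployment.
    pub fn outstanding(&self, deployment: &str) -> usize {
        let map = self.outstanding.read().unwrap();
        map.get(deployment)
            .map(|c| c.load(Ordering::Relaxed))
            .unwrap_or(0)
    }

    /// A deployment is "high volume" while its counter is above the threshold.
    pub fn is_high_volume(&self, deployment: &str) -> bool {
        self.outstanding(deployment) > self.threshold
    }
}
```

The read-then-write pattern in `begin` keeps the write lock off the hot path. Two threads racing to insert the same deployment are still handled correctly, because `entry` re-checks for the key under the write lock.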

There are multiple potential approaches for what to do when we detect high volume on a deployment. Here's a list in increasing order of difficulty, which might also be a reasonable order of iterations to work through until we hit "good enough for now":

1. When the deployment is "high volume", call indexer_selection::select with a limit of 1 instead of 3.
2. Add a parameter to indexer_selection::select that acts as a cost to including additional indexers in the selected set. This value should increase at higher volume.
3. Use a proper load-balancing algorithm between the selected indexers, see this for inspiration: https://samwho.dev/load-balancing/.
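To make the three options more concrete, here is a rough sketch of what each decision could look like as a pure function. None of this reflects the actual indexer_selection::select signature. The function names, the linear cost curve, and the choice of "least outstanding requests" as the load-balancing algorithm are all assumptions for illustration.

```rust
/// Option 1: shrink the selected set to a single indexer under high volume.
/// The limits 1 and 3 come from the issue text; the function itself is hypothetical.
pub fn selection_limit(outstanding: usize, threshold: usize) -> usize {
    if outstanding > threshold { 1 } else { 3 }
}

/// Option 2: a hypothetical cost for each additional indexer in the selected set.
/// It is 0.0 at or below the threshold and grows linearly with the excess,
/// capped at 1.0. The linear shape is an assumption, not a tuned curve.
pub fn inclusion_cost(outstanding: usize, threshold: usize) -> f64 {
    if threshold == 0 {
        return 1.0;
    }
    let excess = outstanding.saturating_sub(threshold) as f64;
    (excess / threshold as f64).min(1.0)
}

/// Option 3: one simple load-balancing rule, "least outstanding requests".
/// Given the in-flight request count for each candidate indexer, return the
/// index of the least-loaded one (the first such index on ties).
pub fn least_loaded(in_flight: &[usize]) -> Option<usize> {
    in_flight
        .iter()
        .enumerate()
        .min_by_key(|(_, n)| **n)
        .map(|(i, _)| i)
}
```

Option 3 requires tracking in-flight requests per indexer as well as per deployment, which is part of why it sits last in the list.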
