Optimising querying for validator delegations #15162

freak12techno · 2023-02-25T17:12:20Z

When querying for validator's delegations, if I am correct, it takes all the delegations from the database, and then only returns those where validator address matches the one queried for (x/staking/keeper/delegation.go). This might make it awfully slow if, for example, there's 10k delegations overall and validator I'm querying for has only 5 delegations.

It concerns me because I have my tools that are periodically querying my delegators count (by asking for 1st delegation and the count of all delegators), and such queries are basically killing the fullnode I'm querying it from.

I can likely make a PR on optimising it, but I'm not sure if we store delegations by validator somewhere (likely not, if not we need to index it separately to work. If yes, we just need to take the validator delegations and use it instead of asking for all delegations).

Do you think it's the right approach (the one I described) or do we have other reasons to do it the way it's done now?

freak12techno · 2023-02-25T17:35:08Z

Another point to be optimised: querying validators delegations with pagination and counting them. If I'm correct, for example if a validator has 1000 delegations and I am querying for first 5 while asking to count all of them, it'll take all of them into the database then return only the first 5, but will use all the entries to calculate the total delegations count. Maybe it's worth it to store delegators count somewhere as a separate key in store and use it for calculation?

freak12techno · 2023-02-26T15:09:32Z

Okay I've researched a bit more, so I was partially right.
Here's how validator delegations are queried:

cosmos-sdk/x/staking/keeper/grpc_query.go

Line 87 in 747c29e

    
           func (k Querier) ValidatorDelegations(c context.Context, req *types.QueryValidatorDelegationsRequest) (*types.QueryValidatorDelegationsResponse, error) {

It'll get the store and use GenericFilteredPaginate on that, which will query for all the records until it finds N records which are matching the predicate (N = limit).
In my case I have 1 delegation on my validator (validator itself), so it'll have to iterate through all the records to check there are no more records to return, as 1 is not enough for a full batch.
Also if there's a validator where there are more delegations than the limit in pagination, it'll have to iterate through all entries to count them.
It may be fixed by 1) storing all delegations not in the same store key, but by validator, so when you need to get validator delegations, you just take the right key and receive only those with needed validator, and 2) maybe storing delegations count per validator somewhere as a separate key, so no need to iterate over all validator entries to get the count, which would make requests with count_total=true faster.

SmartStake · 2023-02-27T04:57:21Z

thanks for raising this item. even otherwise, the delegations API has been gradually slowing down over the last two years. it has gone from near instant response times to high response times to the current stage where for Cosmos and Terra, it just dies and is impractical). The workarounds have somewhat helped but it is still an everyday operational work item for smart stake dashboard needs (even for syncing all delegations just once a day). Hoping to get some short term improvements as outlined on this ticket and more permanent improvements in longer term.

atheeshp · 2023-02-27T13:03:48Z

maybe storing delegations count per validator somewhere as a separate key, so no need to iterate over all validator entries to get the count, which would make requests with count_total=true faster.

I don't think this make the query faster, assume there is only one delegation from a delegator_address and he is querying the state with page_limit 10 and count_total = false (still this query can iterate over all the delegations in state in the worst case, if it is at the end of the state).

The only solution that I can see here is add an extra index which can iterate over validator address (which is already discussed here).

for ex the key can look like:

0x81 | ValidatorAddrLen (1 byte) | ValidatorAddr | DelegatorAddrLen (1 byte) | DelegatorAddr -> (some non nil value)

with the above key we can iterate over validator_address and decode the complete key to get the delegator address, now we have both delegator_address and validator_address, we can simply form the key to get delegation entry from delegation key prefix using these keys.

and also can we close this issue and discuss on the older one, since there is already an issue present for this #10516.

freak12techno · 2023-02-27T13:11:28Z

@atheeshp

I don't think this make the query faster, assume there is only one delegation from a delegator_address and he is querying the state with page_limit 10 and count_total = false (still this query can iterate over all the delegations in state by querying my delegators list with limit=1 and count_total=true and just using the total count from the response).

It may improve the speed of queries with count_total=true though, and I am pretty sure this would be useful, as it can be a use case to query for delegators amount and not the delegators themselves (for example I am using such queries for my validator's website).

and also can we close this issue and discuss on the older one, since there is already an issue present for this #10516.

Fine by me, but is it possible to put this older issue somewhere higher in the team backlog or resume discussion? The older issue seems quite stale, and the issue is quite present, moreover, with the current indexing it may be used as an attack vector on public nodes, as if you do a lot of such requests (asking for all delegations of a validator with a default limit, while this validator has 1 delegation, which will cause the fullnode to iterate over all delegations) to a public node, it'll apparently run out of resources and will stop being in touch with the rest of the network, keeping staying on one block (I had this behaviour on multiple nodes when doing a lot of such requests constantly).

atheeshp · 2023-02-27T13:22:38Z

You may be right, but if we rely on the FilteredPagination the query won't be faster, since it is unmarshalling the delegations and checking delegation info. Assume the delegation entry is at the end of the state.

freak12techno · 2023-02-27T13:24:38Z

@atheeshp

Assume the delegation entry is at the end of the state.

agree for the case where it's at the end, but still it'll be faster if it's in the beginning or in the middle, as then we won't need to iterate over all entries for getting total count

atheeshp · 2023-02-27T13:32:45Z

Hmm, but I feel that's may not be best way to do it. Let's wait to grab more eyes on this. Thanks for the thoughts @freak12techno

cc: @tac0turtle @AmauryM @julienrbrt

alexanderbez · 2023-02-27T14:33:46Z

Storing the count doesn't make sense IMO. However, I would agree that having a reverse index of validator address to delegator could make sense. The general idea that always comes up in these types of convos is that the gRPC query layer isn't meant to service all needs. It meant to serve basic representations of data from the state machine. It's the role of indexers and other 3rd party tools to do the heavy lifting. If your tool or service relies on this particular bit of info, you should ideally be using an service like Numia or something.

We're trying to remove complexity from the gRPC layer not add more to it 👍

freak12techno · 2023-02-27T21:12:50Z

I'd love to work on that if possible (as it's something that I would love to be fixed asap, as it breaks my monitoring tools), but I'd need some guidance on that, as I never contributed to cosmos-sdk before so I may have some knowledge gaps. Can someone describe the steps to be done to get this working, if it's not difficult?

tac0turtle · 2023-02-28T09:53:15Z

agree with bez, i think we should strive to provide ways to do this off chain instead of on chain. Something like versionDB from crypto.com which will be adopted in the sdk would help with your issue and wouldnt need to use the underlying node.

This is a good query to test with storage v2 once we have a working poc. I would guess that if youre not asking for proofs, the performance increase would magnitudes.

alexanderbez · 2023-02-28T17:54:49Z

I'd love to work on that if possible (as it's something that I would love to be fixed asap, as it breaks my monitoring tools

What's being suggested is that you rely on an indexing service for your monitoring tool :)

freak12techno · 2023-03-01T08:02:35Z

But having the index on validators for delegations isn't only affecting the api endpoints performance, but also the chain itself sometimes, doesn't it? I mean the fullnode needs to ask for validators delegations at least when slashing a validator. So this should be a good thing to add not only because the api is slow here.
Additionally, this can be used to make public fullnodes consume a lot of resources and to get them out of touch with the rest of the chain.

(If the delegators count part is better to be done with external services that's fine by me, but at least indexing delegations by validator would benefit a lot.)

alexanderbez · 2023-03-01T16:24:15Z

But having the index on validators for delegations isn't only affecting the api endpoints performance, but also the chain itself sometimes, doesn't it?

Nope! It's solely used for this one query. The key is used in the state machine yes, but no where in the state machine do we use this key.

freak12techno · 2023-03-05T11:02:00Z

Okay it's still not clear for me if this would be fixed and if yes, when can it be fixed. My concern if that even if I'll switch my monitoring tools to use something like indexers, there are a lot of tools (namely, restake is the biggest one I guess) that do such queries and it can cause the node such request is done against to get out of sync with the rest of the chain (I had it on my own node until I switched domains on my public node so there's no requests from cosmos.directory: once in a while my node would consider itself to be caught up, while staying on a single block for more than an hour, then it somehow fixes itself, making a node kinda unusable during such periods, as you can't broadcast anything through that node and some different queries to that node fail. I highly doubt it's the hardware, as it doesn't happen with my validator node, which has lower specs, and the issue disappeared once there were no requests from cosmos.directory to my node).
Additionally it's inconvenient imo that there's an endpoint that allows you to query data from it that can make a fullnode basically not working for some period while it seems to be something that is not difficult to fix.

@alexanderbez can you please clarify my concerns?

alexanderbez · 2023-03-05T15:19:44Z

Okay it's still not clear for me if this would be fixed and if yes, when can it be fixed.

Since the state machine doesn't need it, it's solely a client UX concern. That being the case, in order to support it would require the state machine to track it and in addition, would require a migration. To me, it's simply not worth it IMO.

there are a lot of tools (namely, restake is the biggest one I guess) that do such queries and it can cause the node such request is done against to get out of sync with the rest of the chain ...

Those tools too should be using indexers.

freak12techno · 2023-04-24T15:02:53Z

@tac0turtle I see this issue got closed due to merging PR, nice to hear that!
Can you clarify which release it'll be included to and when can we expect it on Cosmos Hub?

tac0turtle · 2023-04-24T16:02:03Z

yes!! @atheeshp killed it on finding a solution!! He is owed all the thanks!!

It will be in the eden release, its unclear when this release will land on the hub. We are aiming for a end of june release in the sdk

github-actions bot added the needs-triage Issue that needs to be triaged label Feb 25, 2023

freak12techno mentioned this issue Feb 26, 2023

/cosmos/staking/validators/{validator}/delegators unstable when querying #10516

Closed

4 tasks

tac0turtle added C:x/staking T: Performance Performance improvements and removed needs-triage Issue that needs to be triaged labels Mar 6, 2023

atheeshp mentioned this issue Apr 7, 2023

refactor: add delegations by validator index #15731

Merged

19 tasks

freak12techno mentioned this issue Apr 8, 2023

Cosmos Hub Restake always fails eco-stake/restake#726

Closed

tac0turtle closed this as completed in #15731 Apr 24, 2023

freak12techno mentioned this issue Sep 14, 2023

Backporting creating an index on validators' delegations to Cosmos Hub cosmos/gaia#2726

Closed

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimising querying for validator delegations #15162

Optimising querying for validator delegations #15162

freak12techno commented Feb 25, 2023

freak12techno commented Feb 25, 2023

freak12techno commented Feb 26, 2023

SmartStake commented Feb 27, 2023

atheeshp commented Feb 27, 2023 •

edited

Loading

freak12techno commented Feb 27, 2023 •

edited

Loading

atheeshp commented Feb 27, 2023

freak12techno commented Feb 27, 2023 •

edited

Loading

atheeshp commented Feb 27, 2023

alexanderbez commented Feb 27, 2023 •

edited

Loading

freak12techno commented Feb 27, 2023

tac0turtle commented Feb 28, 2023

alexanderbez commented Feb 28, 2023

freak12techno commented Mar 1, 2023

alexanderbez commented Mar 1, 2023 •

edited

Loading

freak12techno commented Mar 5, 2023

alexanderbez commented Mar 5, 2023

freak12techno commented Apr 24, 2023

tac0turtle commented Apr 24, 2023

Optimising querying for validator delegations #15162

Optimising querying for validator delegations #15162

Comments

freak12techno commented Feb 25, 2023

freak12techno commented Feb 25, 2023

freak12techno commented Feb 26, 2023

SmartStake commented Feb 27, 2023

atheeshp commented Feb 27, 2023 • edited Loading

freak12techno commented Feb 27, 2023 • edited Loading

atheeshp commented Feb 27, 2023

freak12techno commented Feb 27, 2023 • edited Loading

atheeshp commented Feb 27, 2023

alexanderbez commented Feb 27, 2023 • edited Loading

freak12techno commented Feb 27, 2023

tac0turtle commented Feb 28, 2023

alexanderbez commented Feb 28, 2023

freak12techno commented Mar 1, 2023

alexanderbez commented Mar 1, 2023 • edited Loading

freak12techno commented Mar 5, 2023

alexanderbez commented Mar 5, 2023

freak12techno commented Apr 24, 2023

tac0turtle commented Apr 24, 2023

atheeshp commented Feb 27, 2023 •

edited

Loading

freak12techno commented Feb 27, 2023 •

edited

Loading

freak12techno commented Feb 27, 2023 •

edited

Loading

alexanderbez commented Feb 27, 2023 •

edited

Loading

alexanderbez commented Mar 1, 2023 •

edited

Loading