Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Queriers for staking to increase Gaia-lite performance #2139

Closed
wants to merge 37 commits into from

Conversation

fedekunze
Copy link
Collaborator

@fedekunze fedekunze commented Aug 24, 2018

Ref: #2078
Fixes #2009


  • Targeted PR against correct branch (see CONTRIBUTING.md)

  • Linked to github-issue with discussion and accepted design OR link to spec that describes this work.

  • Wrote tests

  • Updated relevant documentation (docs/)

  • Added entries in PENDING.md with issue #

  • reviewed Files changed in the github PR explorer


For Admin Use:

  • Added appropriate labels to PR (ex. wip, ready-for-review, docs)
  • Reviewers Assigned
  • Squashed all commits, uses message "Merge pull request #XYZ: [title]" (coding standards)

@codecov
Copy link

codecov bot commented Aug 24, 2018

Codecov Report

Merging #2139 into develop will decrease coverage by 0.8%.
The diff coverage is 17.84%.

@@            Coverage Diff             @@
##           develop   #2139      +/-   ##
==========================================
- Coverage    63.91%   63.1%   -0.81%     
==========================================
  Files          134     135       +1     
  Lines         8194    8348     +154     
==========================================
+ Hits          5237    5268      +31     
- Misses        2604    2727     +123     
  Partials       353     353

@fedekunze fedekunze self-assigned this Aug 24, 2018
x/stake/queryable.go Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Aug 27, 2018

Codecov Report

❗ No coverage uploaded for pull request base (develop@d214952). Click here to learn what that means.
The diff coverage is 44.69%.

@@            Coverage Diff            @@
##             develop   #2139   +/-   ##
=========================================
  Coverage           ?   64.1%           
=========================================
  Files              ?     142           
  Lines              ?    8808           
  Branches           ?       0           
=========================================
  Hits               ?    5646           
  Misses             ?    2758           
  Partials           ?     404

@faboweb faboweb mentioned this pull request Aug 28, 2018
4 tasks
@fedekunze fedekunze changed the title WIP: Queriers for staking to increase Gaia-lite performance R4R: Queriers for staking to increase Gaia-lite performance Aug 29, 2018
@fedekunze fedekunze requested a review from faboweb August 29, 2018 14:23
Copy link
Contributor

@alexanderbez alexanderbez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments @fedekunze -- I think it'll be good after this.

x/stake/keeper/delegation.go Outdated Show resolved Hide resolved
x/stake/keeper/validator.go Outdated Show resolved Hide resolved
@ValarDragon
Copy link
Contributor

ValarDragon commented Sep 1, 2018

In the future, lets please not combine refactors with actual functionality changes. I'm having a very hard time distilling the actual functionality change. (Aside from the delegation stuff, It seems its some bech32 stuff, but there aren't comments anywhere motivating this, nor discussion in the issue / statement at the top of the PR? It probably makes sense, just not indicating it increases time to grok) Reviewing by diff breaks down as we grow PR size. To quote Anton, quality of PR review goes down as PR size grows.

I.E. The bech32validator stuff is in the same PR, as something thats change the capitalization as some log messages, which is in the same PR as stuff that changes the name of a ton function parameters names. These all sound reasonable, but they don't need to be evaluated together with the delegation speedups.

I saw I was requested to review this. Is it important that I do so? If so I think this should be split up into multiple PR's. Optimizing for quality of PR review is crucial, and something I think we need to start enforcing. I totally understand that slowness of merging things / annoyances of branching make including more refactors into the same PR something that naturally happens. I think as a team we just need to get better at pointing this out and avoiding it. (I'm guilty of it as well!)

I'm not sure this is something thats important for me to review though, as I don't do anything on this part of the codebase, and rige, sunny and bez are reviewing this.

@fedekunze
Copy link
Collaborator Author

@ValarDragon
First of all, this IS a refactor. It's about changing the actual implementation of the staking Lite endpoints by introducing Queriers.

Second, I'm fine with splitting PRs into several issues and in general I'm completely in favour of it, that's why I opened #2182 and #2202. Nevertheless, when the PR has been sitting for more than a week waiting to be merged and with more than 3 reviews already, it's definitely a waste of time for the author of the PR. If you want to propose splitting the PR into several ones please do so in the first days that the PR has been labeled as R4R.

I've already discussed with @faboweb that this slowness in merging PRs because of things like this is blocking our work in Voyager. This in particular fixes #2009 which is currently slowing GET requests on the validators page in Voyager. I'm literally working during weekends because we need to move forward with this PR. @alexanderbez and @cwgoes one of the only ones who're actually continuously reviewing PRs and speeding things up. Thank you so much @alexanderbez for reviewing even when sometimes you haven't been requested for review !

Also, you don't need to review the PR if you don't want. I requested a review from you because I think you are bright and smart and because your input can definitely be valuable here, specially in what's related to this comment #2139 (comment).

Cheers

@ValarDragon
Copy link
Contributor

ValarDragon commented Sep 2, 2018

Theres a lot of "clearly correct" parts of this PR, that if were in a separate PR would get merged extremely quickly. I'll link some examples.

https://github.com/cosmos/cosmos-sdk/pull/2139/files#diff-b67911656ef5d18c4ae36cb6741b7965R163
https://github.com/cosmos/cosmos-sdk/pull/2139/files#diff-a76d5d06816361792481ba6a8fdfc428R835
https://github.com/cosmos/cosmos-sdk/pull/2139/files#diff-a76d5d06816361792481ba6a8fdfc428R846
https://github.com/cosmos/cosmos-sdk/pull/2139/files#diff-a76d5d06816361792481ba6a8fdfc428R899
https://github.com/cosmos/cosmos-sdk/pull/2139/files#diff-b905817409b4c0b71e9adffb102682b4R530
https://github.com/cosmos/cosmos-sdk/pull/2139/files#diff-c51941a9f6c50cb3936120ad31fa8b98R117
https://github.com/cosmos/cosmos-sdk/pull/2139/files#diff-c51941a9f6c50cb3936120ad31fa8b98R146
https://github.com/cosmos/cosmos-sdk/pull/2139/files#diff-f8ff7bcd7f2bad2557f739fcdd296803R78
https://github.com/cosmos/cosmos-sdk/pull/2139/files#diff-f8ff7bcd7f2bad2557f739fcdd296803R88
https://github.com/cosmos/cosmos-sdk/pull/2139/files#diff-f8ff7bcd7f2bad2557f739fcdd296803R124
https://github.com/cosmos/cosmos-sdk/pull/2139/files#diff-f8ff7bcd7f2bad2557f739fcdd296803R157
https://github.com/cosmos/cosmos-sdk/pull/2139/files#diff-f8ff7bcd7f2bad2557f739fcdd296803R185
https://github.com/cosmos/cosmos-sdk/pull/2139/files#diff-f8ff7bcd7f2bad2557f739fcdd296803R201
https://github.com/cosmos/cosmos-sdk/pull/2139/files#diff-4234dc9305cb932836bf66430fdced7bR25
https://github.com/cosmos/cosmos-sdk/pull/2139/files#diff-70e9b3c8bdbdb2c8b4fab2d50c9a348aR49
https://github.com/cosmos/cosmos-sdk/pull/2139/files#diff-efea1ed721aa6b4db9b367adb505202cR23
https://github.com/cosmos/cosmos-sdk/pull/2139/files#diff-efea1ed721aa6b4db9b367adb505202cR31
https://github.com/cosmos/cosmos-sdk/pull/2139/files#diff-b905817409b4c0b71e9adffb102682b4R32
https://github.com/cosmos/cosmos-sdk/pull/2139/files#diff-b905817409b4c0b71e9adffb102682b4L119
https://github.com/cosmos/cosmos-sdk/pull/2139/files#diff-dfc120ab4c4b55ca9500442fe7d7287fL126
https://github.com/cosmos/cosmos-sdk/pull/2139/files#diff-dfc120ab4c4b55ca9500442fe7d7287fL420

All of these are clearly correct, and are just refactoring / minor changes that are independent from the Querier feature. If a PR was made of just those, it would likely get merged within 48 hours. But more importantly, it would reduce the diff here by like 200 - 400 lines, which makes it substantially easier to review. (And therefore get this merged faster) Normally including of the minor things isn't a big deal. It only starts being a problem when the PR size grows to be very large. You can even base your second PR on the refactoring PR.

Something that would have made this immensely easier to review would have just been doing a find/replace for delAddr -> delegatorAddr and valAddr -> validatorAddr in a separate PR, as its that change in particular that changed the diff size so much.

I totally understand you've put tons of work went into this. I also see why requests to split it up are frustrating. There isn't any need to do it this time, as people have already reviewed it, but note that when that comment was made, the PR was only R4R for 3 days. However further splitting up PR's similar to this in the future would make them get reviewed / merged much more quickly. (hence alleviating I've already discussed with @faboweb that this slowness in merging PRs because of things like this is blocking our work in Voyager.) Thanks for already starting to scope the PR's by splitting into more issues.

As a separate point of discussion, could we unblock voyager by making a "voyager branch" of the SDK. Then have voyager work of the voyager branch, and you could merge any PR you needed into that branch immediately. (This can be done via command line, no github PR required) When relevant aspects get merged to develop, delete the voyager branch and remake it off of develop again?

(e.g. proposed flow is: Create voyager branch, create feature branch off develop. PR feature branch into develop, just merge it into voyager branch immediately. Once feature is in develop, reset voyager branch to develop.)

}
iterator.Close()
return validators
return validators[:i] // trim
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only adds i items to the validators array, is there a reason to trim it?

x/stake/client/rest/query.go Outdated Show resolved Hide resolved
Copy link
Contributor

@rigelrozanski rigelrozanski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alllllllright! - I went through this PR in reasonable detail for design (I didn't go through the new tests in detail however - but they look clean which is good)

There are a number of structural changes which I'd requested in my review. My biggest quarrel is that there is a lot of code bloat introduced into the keeper which I am windly against - a lot of bloat can be moved into functional attributes of the the objects (and kept in x/stake/types/ maybe we will want to keep some things in the stake keeper but we should add that to a query_utils.go.

Additionally I request that the all changes to x/gov/ be moved to a new PR just to keep this one clean.

Lastly, there is a bit of refactor stuff which I request is removed from this PR (I've made specific comments throughout)

if !iterator.Valid() {
break
}
for ; iterator.Valid() && (!retrieve || (retrieve && i < int(maxRetrieve[0]))); iterator.Next() {
Copy link
Contributor

@rigelrozanski rigelrozanski Sep 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm quite sure we don't want to combine GetValidators and GetAllValidators - in reality GetAllValidators should rarely be getting called and is kind of an edge case for exporting the state of the entire blockchain, everything else should retrieve predetermined maximum number of validators

// Get the set of all validators, retrieve a maxRetrieve number of records
func (k Keeper) GetValidators(ctx sdk.Context, maxRetrieve int16) (validators []types.Validator) {
// Get the set of all validators. If maxRetrieve is supplied, the respective amount will be returned.
func (k Keeper) GetBechValidators(ctx sdk.Context, maxRetrieve ...int16) (validators []types.BechValidator) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per my previous comment we should rarely be using GetAllValidators so it should be its own edge case. But also, let's remove use of this function altogether, it bloats the keeper - what we should really be doing is just converting the validators to bech once we've retrieved them from GetValidators.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With that implementation we'd have to iterate the validator array twice, one to retrieve the validators and another to convert each of them into BechValidator ...

x/stake/keeper/validator.go Outdated Show resolved Hide resolved
x/stake/keeper/validator.go Outdated Show resolved Hide resolved
x/stake/keeper/keeper.go Outdated Show resolved Hide resolved
* A new bech32 prefix has been introduced for Tendermint signing keys and
addresses, `cosmosconspub` and `cosmoscons` respectively.

* A new bech32 prefix has been introduced for Tendermint signing keys and addresses, `cosmosconspub` and `cosmoscons` respectively.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary/ undesirable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/cosmos/cosmos-sdk/pull/2103/files if we do introduce new bech32's, it would be cosmosval and cosmosvalpub, not cosmoscons* just fyi.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe there's something wrong with the sdk impl right now... i don't think the validator operator address should be anything different than a normal address.

Copy link
Collaborator Author

@fedekunze fedekunze Sep 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaekwon @rigelrozanski @ValarDragon @alexanderbez Currently BechValidator is not using bech32 prefix for sdk.ValAddress, just hex. Also its PubKey value uses cosmosconspub prefix. Is that ok ? Because I thought it was supposed to be using cosmosvalpub instead

PENDING.md Show resolved Hide resolved
err2 := keeper.cdc.UnmarshalJSON(req.Data, proposalID)
if err2 != nil {
return []byte{}, sdk.ErrUnknownRequest(fmt.Sprintf("incorrectly formatted request data - %s", err2.Error()))
errRes := keeper.cdc.UnmarshalJSON(req.Data, proposalID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please undo all these refactor changes (for now) to use errRes and res it's bloating this PR - maybe make a refactor PR after this? - Although, I also don't really agree with this modification either - but we can discuss in the next PR

x/stake/client/rest/query.go Outdated Show resolved Hide resolved
x/stake/client/rest/utils.go Show resolved Hide resolved
@rigelrozanski rigelrozanski self-assigned this Sep 2, 2018
@rigelrozanski
Copy link
Contributor

p.s. I just assigned myself to this issue, I'll gladly merge once comments addressed

@rigelrozanski
Copy link
Contributor

Also note @fedekunze @faboweb - Voyager should not be blocked on things like merging a PR - The voyager team should work off this branch / their own updated branch which contains any new features they want to test out. Rushing merging PR's like this is a big nono - there are it a LOT going on here and requires thorough review. Previously there were staking rest bugs which were introduced with an LCD refactor PR (#1880) which I'm pretty sure was "rushed" to merge - in this previous PR I had not actually finished reviewing it yet but it got merged. Subsequently I had to fix dem bugs, which will happen from time to time and IS OKAY and expected, but you know, I would have rather caught them in review.

@fedekunze
Copy link
Collaborator Author

@rigelrozanski what do you think of my comment ?

validator.BechValidator() is returning the ValAddress in hex for the operator and I'm pretty sure the prefix for the pubkey is also wrong. It's using cosmosconspub were it should be cosmosvalpub.

@fedekunze
Copy link
Collaborator Author

@rigelrozanski @ValarDragon

The voyager team should work off this branch / their own updated branch which contains any new features they want to test out

I don't think that's currently an option. We are behind schedule with staking in Voyager and we don't have capacity to maintain another branch. Besides, if we want those changes to be on develop we'd still have to edit the code to address the comments from PR review, making the branch redundant.

@fedekunze
Copy link
Collaborator Author

@rigelrozanski @ValarDragon @alexanderbez FYI I'll update the PR here and split it once I've addressed Rigel's comments. Thanks for your reviews

@jaekwon
Copy link
Contributor

jaekwon commented Sep 3, 2018

This PR I think is poking at some structural issues we have with the SDK/LCD system.

Before, the LCD handlers were querying the store (a cryptographically verifiable operation using merkle proofs...) so it would have been possible for the LCD to verify everything along the way... but this refactor is trying to push all the logic into keepers.

I'm not sure why this PR makes things faster -- as the comments say, there are a lot of concerns that are mixed in here and it makes for a big PR that is hard to reason about. Why does this PR increase performance?

Anyways, I'm starting to think that the LCD should maybe be replaced with a full node w/ queriers, so that Voyager etc can query any full node directly. There's no point to the LCD if it's just going to proxy the result of a full node's Querier. If we need this for performance reasons, then we might as well bypass the LCD for now until we figure out how to efficiently call the Querier from an LCD (which I think we can do by moving logic into the Keeper as this PR does, and then calling the keeper with a cliCtx which has Stores that are RemoteStores which handles Merkle proofs).

So I'm in favor of moving logic into keepers, and I think we want to have Voyager just query the full node to avoid unnecessary proxying logic that defeats the point of an LCD in the first place.

Also, @fedekunze why is this PR an improvement in speed? Is it because we're avoiding multiple REST query calls inside of a loop or something? Could Voyager just query the full node instead of proxying through LCD?

We have an SDK standup at 10am PDT M/W/F. Can you make that, or someone familiar with Voyager's requirements of the LCD?

@cwgoes cwgoes mentioned this pull request Sep 3, 2018
5 tasks
@fedekunze fedekunze changed the title R4R: Queriers for staking to increase Gaia-lite performance WIP: Queriers for staking to increase Gaia-lite performance Sep 4, 2018
@fedekunze
Copy link
Collaborator Author

Splitting and moving to #2249 and #2259

@fedekunze fedekunze closed this Sep 7, 2018
@fedekunze fedekunze deleted the fedekunze/2009-queriers-staking branch September 7, 2018 10:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants