Expose support for custom indexing #3469

Closed
4 tasks
aaronc opened this issue Feb 1, 2019 · 14 comments

Comments

@aaronc
Member

aaronc commented Feb 1, 2019

Summary

Regen Network would like functionality exposed in the Cosmos SDK to do custom indexing of blockchain transactions and possibly state.

Problem Definition

Regen Ledger is being built as a global ledger for ecological applications. Our data set will form a global ledger of ecological claims. All of these claims will be geo-tagged. To begin with, we would like to have all blockchain data about ecological state indexed in a geospatial data store like PostGIS so that we can show a global map of data on the blockchain. Our blockchain will also serve as the backbone for a new decentralized ecological state verification infrastructure for verifying claims that a landowner may make, such as having sequestered carbon in soils or preserved forest land. Part of the infrastructure for doing this verification will involve "verification oracles" having access to an index of all claims made on the blockchain up to a certain point in time for a particular piece of land. For instance, field scientists may be taking measurements with a soil sensor that records the results on the blockchain. A verification algorithm may then request to search the blockchain data for all soil sensor reports for that piece of land recorded in the past year.

Proposal

Our need is primarily to expose functionality within Cosmos/Tendermint that lets us implement custom indexing.

Two possible approaches discussed with @jaekwon were using Tendermint's existing TxIndexer or possibly the web socket interface. Our assessment is that creating a custom implementation of TxIndexer would be preferable because of the possible instability of the web socket connection.

It seems that if we could get access to the underlying Tendermint Node (when running Tendermint in process), we could access the EventBus and from there create a new IndexerService with our custom TxIndexer. If that will work, I don't see any reason to make any other modifications to how indexing works at the Tendermint level. The main change would be creating a hook at the Cosmos level that exposes the underlying Node. It looks like the place to do this would be in StartCmd in server/start.go where startInProcess returns the Node but it gets discarded: https://github.com/cosmos/cosmos-sdk/blob/develop/server/start.go#L41. Let me know if this solution makes sense and how best to expose it and then I can put together a PR.

Beyond this, it occurs to me that being able to index directly off changes to the multi-store might be useful. My inclination is to start with just indexing transactions and then revisit indexing off of state later if needed. The way I can see this working is pretty similar to how TraceKVStore currently works, except that we'd only be observing writes and there would be no need to base64 encode keys and values. This approach also looks pretty straightforward, but we'll see if it's needed.
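
A minimal sketch of the write-observing wrapper idea, assuming a heavily simplified store interface. `KVStore`, `WriteListener`, `NewListenKVStore`, and `memStore` are hypothetical names used for illustration only; they are not the SDK's actual types, and the real store interface has more methods (iterators, for example).

```go
package main

import "fmt"

// KVStore is a hypothetical, minimal stand-in for the SDK's store interface.
type KVStore interface {
	Get(key []byte) []byte
	Set(key, value []byte)
	Delete(key []byte)
}

// WriteListener receives every write. For deletes, value is nil and deleted is true.
type WriteListener func(key, value []byte, deleted bool)

// memStore is a toy in-memory store that only exists to make the sketch runnable.
type memStore map[string][]byte

func (m memStore) Get(key []byte) []byte { return m[string(key)] }
func (m memStore) Set(key, value []byte) { m[string(key)] = value }
func (m memStore) Delete(key []byte)     { delete(m, string(key)) }

// listenKVStore wraps a parent store and reports writes (not reads) to the listener.
// Keys and values are passed through as raw bytes, with no base64 encoding.
type listenKVStore struct {
	parent KVStore
	listen WriteListener
}

// NewListenKVStore returns a store that behaves like parent but notifies l on writes.
func NewListenKVStore(parent KVStore, l WriteListener) KVStore {
	return &listenKVStore{parent: parent, listen: l}
}

// Get is a plain pass-through; reads are not observed.
func (s *listenKVStore) Get(key []byte) []byte { return s.parent.Get(key) }

// Set writes to the parent first, then notifies the listener.
func (s *listenKVStore) Set(key, value []byte) {
	s.parent.Set(key, value)
	s.listen(key, value, false)
}

// Delete removes the key from the parent, then notifies the listener.
func (s *listenKVStore) Delete(key []byte) {
	s.parent.Delete(key)
	s.listen(key, nil, true)
}

func main() {
	var log []string
	store := NewListenKVStore(memStore{}, func(k, v []byte, deleted bool) {
		if deleted {
			log = append(log, "delete "+string(k))
			return
		}
		log = append(log, "set "+string(k)+"="+string(v))
	})
	store.Set([]byte("plot/1"), []byte("soil-carbon"))
	_ = store.Get([]byte("plot/1")) // not recorded: only writes are observed
	store.Delete([]byte("plot/1"))
	fmt.Println(log)
}
```

An indexer would plug in as the listener callback, so modules keep writing to the store as usual and the index sees each change as it happens.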


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@rigelrozanski
Contributor

very excited for geospatial x blockchain integration (side note, I'd like to integrate whitebox GAT with cosmos one day)

> My inclination is to start with just indexing transactions and then revisit indexing off of state later if needed

I would do the reverse, given that you can already build custom indices and associated query functionality on the store... we do it in staking to query for validators by pubkey, by power, etc. This approach could easily be expanded to query based on geolocation, or any other arbitrary property :)

Glad to walk you through the code for how to do the above. You'll want to implement this in a custom module.
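
A rough sketch of the store-indexing pattern, assuming a toy ordered key-value store in place of the SDK's `KVStore` and prefix iterator. The key layout, `SetClaim`, `ClaimsByParcel`, and the parcel/claim names are all hypothetical; the staking module's real keys are laid out differently.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// store is a toy key-value store standing in for an SDK KVStore.
type store map[string]string

// prefixKeys returns all keys starting with prefix in sorted order,
// mimicking the keys a prefix iterator would visit.
func (s store) prefixKeys(prefix string) []string {
	var keys []string
	for k := range s {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

const (
	claimPrefix    = "claim/"  // primary records: claim/<id> -> payload
	byParcelPrefix = "parcel/" // index entries: parcel/<parcel>/<id> -> <id>
)

// SetClaim writes the primary record and its secondary index entry together,
// so the index can never lag behind the record.
func SetClaim(s store, id, parcel, payload string) {
	s[claimPrefix+id] = payload
	s[byParcelPrefix+parcel+"/"+id] = id
}

// ClaimsByParcel walks the index prefix for one parcel and resolves
// each index entry back to its primary record.
func ClaimsByParcel(s store, parcel string) []string {
	var out []string
	for _, k := range s.prefixKeys(byParcelPrefix + parcel + "/") {
		out = append(out, s[claimPrefix+s[k]])
	}
	return out
}

func main() {
	s := store{}
	SetClaim(s, "001", "parcel-a", "soil carbon 2.1%")
	SetClaim(s, "002", "parcel-b", "forest cover 80%")
	SetClaim(s, "003", "parcel-a", "soil carbon 2.4%")
	fmt.Println(ClaimsByParcel(s, "parcel-a"))
}
```

The same idea extends to any property that can be encoded into an ordered key prefix. A true spatial index (range queries over two dimensions) would need a more involved key encoding than this sketch shows.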

With regard to custom tx indexing, I'd imagine it's possible with hooks; however, this would be new functionality (more work than just building within the existing framework).

@aaronc
Member Author

aaronc commented Feb 12, 2019

Thanks @rigelrozanski, a walkthrough would be great if you have the time.

I'm also realizing that it might be better to do transaction indexing at the BaseApp level instead of the Tendermint level. If there were an indexer hook inside of BaseApp.DeliverTx then it would be pretty easy to index the StdTx's.

@rigelrozanski
Contributor

I'd like to get a better feel for your intentions. What is the purpose (examples?) of using custom tx indexing over custom index queries on the store (as I proposed)? I'm having trouble thinking of a use case.

@aaronc
Member Author

aaronc commented Feb 12, 2019

I described this in a bit of detail in the Problem Definition above. Basically we would like to have all geospatial data indexed in a database with dedicated geospatial support such as PostGIS. One use case would be generating map layers on the fly or doing analytics.

By custom index queries, is what's happening in GetBondedValidatorsByPower with the prefix iterator an example of this?

For our use cases, I'm pretty sure we want to be able to store things in an external database with robust secondary index support. Emulating something like a spatial index I think would be pretty complex with just the built-in KV store.

@rigelrozanski
Contributor

> Basically we would like to have all geospatial data indexed in a database with dedicated geospatial support such as PostGIS. One use case would be generating map layers on the fly or doing analytics.

There is no reason that querying by coordinates, or by any arbitrary field, cannot be done with store indexing, so my perspective still holds: store indexing should be sufficient for your needs. If you wanted an external database that is subscribed to new data entering the blockchain, the approach would be to subscribe to the existing tagging system (already built ;) ) and then query the store for the record whenever it meets the condition for including that data in your external database.

> I'm pretty sure we want to be able to store things in an external database with robust secondary index support.

Yeah I think that's the obvious short-term solution for complex GIS analysis ... the blockchain itself serves as the consensus layer on the data, whereas any number of viewers can be spun up for further interpretation of the data using centralized databases which are just feeding information from the blockchain.
-> this would probably be a lot easier to work with given the current state of GIS technology integration.

However, I don't see why, with further development, GIS tooling could not be used directly on the blockchain state in time.

> By custom index queries, is what's happening in GetBondedValidatorsByPower with the prefix iterator an example of this?

That is probably the most complicated example, but yes hahaha, that is an index. Check out some of the specs for the other, simpler indexes to get a better example:

specs: https://github.com/cosmos/cosmos-sdk/blob/develop/docs/spec/staking/state.md#validator

Here is a simpler example: validators are also indexed by consensus address (even though the core record is indexed by operator address). This index is set for the first time here:
https://github.com/cosmos/cosmos-sdk/blob/develop/x/staking/handler.go#L132

From there it can be updated as required if the consensus address changes (although x/staking does not currently support updating the consensus address).

@jackzampolin
Member

> the blockchain itself serves as the consensus layer on the data, whereas any number of viewers can be spun up for further interpretation of the data using centralized databases which are just feeding information from the blockchain.

It would be really nice to have a driver model here where it is easy to specify which tags to index and to implement outputs for common open source DBs (Redis, Postgres, MongoDB, etc.).
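
One way such a driver model could look, purely as a sketch: `Event`, `Driver`, `Dispatch`, and `tagDriver` are hypothetical names, not an existing SDK or Tendermint API, and the in-memory driver stands in for a real database client.

```go
package main

import "fmt"

// Event is a hypothetical tagged event produced while processing a transaction.
type Event struct {
	Tags map[string]string
	Data []byte
}

// Driver is a hypothetical output plugin, e.g. one per target database.
type Driver interface {
	// Wants reports whether the driver is interested in an event with these tags.
	Wants(tags map[string]string) bool
	// Write delivers the event to the external store.
	Write(ev Event) error
}

// tagDriver accepts events carrying one required tag key/value pair and keeps
// them in memory. A real driver would write to its database instead.
type tagDriver struct {
	key, value string
	received   []Event
}

func (d *tagDriver) Wants(tags map[string]string) bool { return tags[d.key] == d.value }

func (d *tagDriver) Write(ev Event) error {
	d.received = append(d.received, ev)
	return nil
}

// Dispatch forwards an event to every driver that wants it and stops at the first error.
func Dispatch(drivers []Driver, ev Event) error {
	for _, d := range drivers {
		if !d.Wants(ev.Tags) {
			continue
		}
		if err := d.Write(ev); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	geo := &tagDriver{key: "module", value: "geo"}
	drivers := []Driver{geo}
	_ = Dispatch(drivers, Event{Tags: map[string]string{"module": "geo"}, Data: []byte("claim-1")})
	_ = Dispatch(drivers, Event{Tags: map[string]string{"module": "bank"}, Data: []byte("send")})
	fmt.Println(len(geo.received)) // only the geo-tagged event was written
}
```

Keeping the tag filter on the driver itself means each output can declare its own subset of tags without the dispatcher knowing anything about the target database.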

@aaronc
Member Author

aaronc commented Apr 4, 2019

Thanks for all your comments @rigelrozanski. I agree that in the future it will definitely be possible to support a broader range of GIS tooling on-chain. For now, I think it makes sense to use the blockchain as the "consensus layer" of the data as you say and off-load the geospatial querying to more specialized data stores.

I've figured out a way that works for our purposes to have nodes optionally index data to PostGIS - which is more or less the industry standard for geospatial indexing - without any modifications needed to baseapp or ABCI. I'm doing this by intercepting the baseapp ABCI methods and optionally forwarding their data to the indexer. It may be a bit unconventional, but in Regen Ledger, keepers can also get a handle to the indexer and do some indexing while they're modifying the state store. I was initially going to use the tag approach as suggested, but in assessing our architecture I realize this will result in a lot of duplication of the logic that is already in the keepers and that some state changes won't easily be reflected by tags. So effectively in our model, the index becomes part of the app state lifecycle on nodes where it is enabled (except that it is never queried directly by nodes like the store which is the consensus view shared by all nodes). Anyway, thus far this approach seems to work.

The code for this lives here in case anybody's interested: https://github.com/regen-network/regen-ledger/tree/master/index. I've abstracted a generic Indexer type which basically just intercepts ABCI methods and the PostgreSQL indexer implementation is pretty generic and could theoretically be used by other chains.
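
For readers who want the general shape of the interception idea without reading the repository, here is a minimal sketch. It is not the Regen Ledger code: `Application` is a drastically reduced, hypothetical stand-in for the ABCI application interface, and `Indexer`, `indexingApp`, `countingApp`, and `memIndexer` are illustrative names only.

```go
package main

import "fmt"

// Application is a hypothetical, heavily reduced stand-in for the ABCI application interface.
type Application interface {
	DeliverTx(tx []byte) (code uint32)
	Commit() (appHash []byte)
}

// Indexer is a hypothetical sink that receives the intercepted calls.
type Indexer interface {
	OnDeliverTx(tx []byte, code uint32)
	OnCommit(appHash []byte)
}

// indexingApp forwards each call to the wrapped app, then to the indexer if one is set.
type indexingApp struct {
	app     Application
	indexer Indexer // nil disables indexing on this node
}

func (a indexingApp) DeliverTx(tx []byte) uint32 {
	code := a.app.DeliverTx(tx)
	if a.indexer != nil {
		a.indexer.OnDeliverTx(tx, code)
	}
	return code
}

func (a indexingApp) Commit() []byte {
	hash := a.app.Commit()
	if a.indexer != nil {
		a.indexer.OnCommit(hash)
	}
	return hash
}

// countingApp is a toy application that accepts every transaction.
type countingApp struct{ n int }

func (c *countingApp) DeliverTx(tx []byte) uint32 { c.n++; return 0 }
func (c *countingApp) Commit() []byte             { return []byte{byte(c.n)} }

// memIndexer records successful transactions in memory; a real indexer
// would write to an external database such as PostGIS.
type memIndexer struct {
	txs     []string
	commits int
}

func (m *memIndexer) OnDeliverTx(tx []byte, code uint32) {
	if code == 0 { // index only transactions that succeeded
		m.txs = append(m.txs, string(tx))
	}
}

func (m *memIndexer) OnCommit(appHash []byte) { m.commits++ }

func main() {
	idx := &memIndexer{}
	var app Application = indexingApp{app: &countingApp{}, indexer: idx}
	app.DeliverTx([]byte("tx-1"))
	app.Commit()
	fmt.Println(idx.txs, idx.commits)
}
```

Because the wrapper satisfies the same interface as the app it wraps, the wrapped app needs no changes, and nodes that leave the indexer unset behave exactly as before.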

Anyway, since this is currently working for us, I'm going to go ahead and close this issue. But please let me know if you think this approach is more generically useful. I know you mentioned a driver approach @jackzampolin where you can specify sets of tags - in this approach all tags are currently indexed, but otherwise I think it's not too far off and I think it would not be a bad idea to support "indexing interceptors" like this directly in BaseApp.

@aaronc aaronc closed this as completed Apr 4, 2019
@aaronc
Member Author

aaronc commented Apr 4, 2019

I'm going to re-open this since after the dev call it sounds like there may be interest in this approach.

@aaronc aaronc reopened this Apr 4, 2019
@tac0turtle
Member

tac0turtle commented May 3, 2020

tendermint/tendermint#4466 partially solves this? But this may need an issue in Tendermint as well.

@aaronc
Member Author

aaronc commented May 4, 2020

@marbar3778 ideally I'd like to have support on the SDK side for listening to changes on the store which is sort of orthogonal to what Tendermint can provide. But upstream improvements to Tendermint are always useful!

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jul 17, 2020
@tac0turtle
Member

In a recent call it was said the best solution for this is to not rely on tendermint indexing and provide a custom indexing solution directly from the sdk.

@tac0turtle tac0turtle removed the stale label Jun 1, 2021
@alexanderbez
Contributor

> In a recent call it was said the best solution for this is to not rely on tendermint indexing and provide a custom indexing solution directly from the sdk.

Yes, I believe we already have an ADR for this too -- https://github.com/cosmos/cosmos-sdk/blob/master/docs/architecture/adr-038-state-listening.md

@i-norden i-norden mentioned this issue Sep 7, 2021
11 tasks
@tac0turtle
Member

closing this in favour of adr038 issue
