Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CosmosSDK State Watching Features - KVStore reading and listening capabilities - Subscription/stream service(s) #7889

Closed
4 tasks done
i-norden opened this issue Nov 11, 2020 · 10 comments
Labels

Comments

@i-norden
Copy link
Contributor

i-norden commented Nov 11, 2020

Summary

State listening external streaming/pub-sub service

Problem Definition

Currently, KVStore data can be remotely accessed through Queries which proceed through Tendermint and the ABCI. In addition to these request/response queries, it would be beneficial to have a means of listening to state changes as they occur by some pub-sub or streaming mechanism.

The changes proposed here are the 2nd step towards achieving that, by exposing the state listeners introduced in #7888 to external consumers.

What problems may be addressed by introducing this feature?

Realtime data availability

What benefits does the SDK stand to gain by including this feature?

Tools for state listening should be beneficial for many cosmos applications, as a fundamental means of improving data availability/accessibility

Are there any disadvantages of including this feature?

If an application developer opts to use these features to expose data, they need to be aware of the ramifications/risks of that data exposure as it pertains to the specifics of their application

Proposal

Question: How should we externally expose KVStore state changes?

In #7888 the idea is that the BaseApp can use io.Writers to listen to specific KVStores. The BaseApp can then pass the io handlers into a streaming/server interface that can route and read out the state changes.

It may be better to use a more concrete type than io.Writer in #7888 since the type will need to support streaming out the data e.g.

type ExampleHandler struct {
	Stream chan []byte // stream the writer output from here
}

func (eh *ExampleHandler) Write(p []byte) (n int, err error) {
	select {
	case eh.Stream <- p:
		return len(p), nil
	default:
		return 0, errors.New("unable to write to channel")
	}
}

I've started to outline some of the potential mechanisms for streaming data out below, hoping to get some feedback on what the first approach should be.

Write to file

Pros:

  • Simple
  • Persistent
  • Durable
  • Multiple consumers
  • Embeddable

Cons:

  • Requires pruning, or can grow to an indefinite size
  • More difficult to access remotely
  • More difficult routing

Simple Pub-Sub (gRPC Streaming, WebSockets)

Pros:

  • Simple
  • Easy routing
  • Multiple consumers
  • No pruning
  • Embeddable

Cons:

  • Not persistent
  • Not durable

Pub-Sub using embedded persistent queue

e.g. https://github.com/joncrlsn/dque

Pros:

  • Easy routing
  • Multiple consumers
  • Minimal bloat
  • Persistent
  • Embeddable

Cons:

  • More complicated
  • Not durable

Pub-Sub using persistent Redis queue

e.g. https://github.com/adjust/redismq
Pros:

  • Easy routing
  • Multiple consumers
  • Persistent

Cons:

  • Depends on Redis
  • More complicated
  • Not durable

GraphQL ontop of Postgres queue (postgraphile) using NOTIFY triggers

Pros:

  • Easy routing
  • Multiple consumers
  • Persistent
  • Custom data transformations/views
  • Custom filtering- GraphQL consumer can define filters to impose on the pushed data

Cons:

  • Depends on Postgres
  • More complicated
  • Without db pruning will grow to indefinite size
  • Not durable (if subscriber isn't there when NOTIFY occurs they won't receive the message; but assuming the data hasn't been removed they could determine where they left off, where they picked back up, and fetch the data in-between)

External message service

e.g. Apache Kafka, RabbitMQ, KubeMQ
Pros:

  • Easy routing
  • Multiple consumers
  • Persistent
  • Durable
  • Scalable

Cons:

  • Depends on external service
  • More complicated

For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@alexanderbez
Copy link
Contributor

Thanks for writing this up @i-norden.

I'm a bit hesitant at the moment to introduce these mechanisms directly into the SDK. Rather, I would prefer to see an interface that potentially each streaming/writing mechanism can implement. The implementations of this interface (e.g. RabbitMQ, ZMQ, Redis, file, etc...) can exist outside of the SDK and we can provide out-of-the-box implementations for users. In addition, the SDK can include a common and simple "base" implementation (e.g. file).

@i-norden
Copy link
Contributor Author

i-norden commented Nov 13, 2020

That makes sense! I'm going to open a new issue with that interface, or should I include that as part of the ADR from #7888?

@alexanderbez
Copy link
Contributor

I would write this all up in a single ADR. I don't think any of these changes are contentious 👍

@amaury1093
Copy link
Contributor

amaury1093 commented Nov 20, 2020

Since the SDK chose to use gRPC as the de facto RPC layer, an alternative to what @alexanderbez proposes is to hardcode the io.Writer from #7888 as a gRPC server stream. If users want to use Redis/file/RabbitMQ, then they listen on the gRPC service that will be baked into the SDK.

Pros:

  • possibility to implement the Redis/file/RabbitMQ infrastructure "as a client", e.g. in other languages
  • all advantages that come with protobuf & gRPC, incl.:
    • versioning guarantees,
    • client side code generation,
    • community for gRPC<->Redis/file/RabbitMQ plugins...

Cons:

  • gRPC is an obligatory intermediate step. With @alexanderbez's solution, app developer's could directly write to Redis/file/RabbitMQ, whereas this solution hardcodes gRPC as a middleman.

@alexanderbez
Copy link
Contributor

I like the idea! We'd need to be careful and examine any and all performance implications of having gRPC sit as a proxy. @i-norden do you think the consumer (e.g. RabbitMQ, ZMQ, file, etc...) cares about throughput or performance in general?

@i-norden
Copy link
Contributor Author

Coming from my experience with eth where it is a struggle to keep pace with the head of the chain while listening to (and indexing) state changes there is some knee-jerk concern about performance but I don't think that is relevant here, in large part because of features such as these which will make listening to state changes a breeze in comparison :)

My gut reaction is that those pros outweigh the overhead of the grpc middleman, but I admittedly don't have a great feel for the performance demands and constraints for cosmos applications yet e.g. how many state updates tend to occur per-block.

If we remain with the io.Writer interface and provide a concrete gRPC server stream implementation as the standard for backing it, people could still implement a more performant/direct io.Writer if need be.

@aaronc
Copy link
Member

aaronc commented Nov 23, 2020

I think gRPC streaming is useful for general purpose streaming. I do not think it is suitable for caching to a database because it is not fault tolerant. I would prefer to support both paths.

@i-norden
Copy link
Contributor Author

@aaronc apologies, I keep overlooking the persistence needs! I will outline both (file and grpc) approaches in the proposal. We can also include a teeing io.Writer type to load any number of destination io.Writers into to allow us to simultaneously write out to both file and grpc stream.

@i-norden
Copy link
Contributor Author

i-norden commented Nov 23, 2020

Sorry don't need to implement anything for that, will just use io.MultiWriter()

@i-norden
Copy link
Contributor Author

i-norden commented Sep 7, 2021

I think this issue/proposal can be closed as it has been formalized as part of the ADR-038 specification: #8012

@i-norden i-norden closed this as completed Sep 7, 2021
@i-norden i-norden mentioned this issue Sep 7, 2021
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants