docs: ADR-038 plugin proposal #10482
Conversation
LGTM. I think this ADR is well written, although it would still be nice to add more example use cases for plugins (indexing state vs. events, uploading to Postgres, etc.).
LGTM.
In the future it would be nice to have a plugin-based modular server along the lines of Caddy. In that scenario, basically everything in app.toml would be a plugin.
Looks good. Left a few comments to consider.
```go
	Start(wg *sync.WaitGroup)

	// Plugin is the base Plugin interface
	Plugin
}
```
Shouldn't the `StateStreaming` plugin have writer methods?
The `StateStreaming` plugin uses the `Register` method to register `StreamingService`(s) with the `BaseApp`, and the `StreamingService` interface exposes `WriteListener`(s) that have the `OnWrite` method. In other words, the writing interface is buried two levels down. Let me see if I can restructure this so that it is clearer where writing is occurring.
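To make that layering concrete, here is a rough Go sketch of the relationship described above; the type names and signatures (e.g. `BaseAppRegistrar`, the `string` store keys) are simplified stand-ins, not the exact interfaces in the ADR:

```go
package streaming

import "sync"

// WriteListener is where writing actually happens: a listening KVStore
// calls OnWrite for every set/delete it performs.
type WriteListener interface {
	OnWrite(storeKey string, key, value []byte, delete bool) error
}

// StreamingService exposes the WriteListener(s) it wants attached to stores.
type StreamingService interface {
	Listeners() map[string][]WriteListener
}

// BaseAppRegistrar is a placeholder for the BaseApp methods a plugin needs.
type BaseAppRegistrar interface {
	SetStreamingService(s StreamingService)
}

// StateStreamingPlugin only registers StreamingService(s) with the BaseApp,
// so the writing interface (OnWrite) sits two levels below the plugin.
type StateStreamingPlugin interface {
	Register(bApp BaseAppRegistrar) error
	Start(wg *sync.WaitGroup)
}
```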
Instead of prescribing a final data format in this ADR, it is left to a specific plugin implementation to define and document this format. We take this approach because flexibility in the final format is necessary to support a wide range of streaming service plugins. For example, the data format for a streaming service that writes the data out to a set of files will differ from the data format that is written to a Kafka topic.
I think we should unify the data format for all external services, especially message queues. Essentially, a client app should be able to connect to RabbitMQ or Redis and expect the same data format.
The API already provides unified data by passing along the marshaled `[]byte` data. Adding a constraint like this might present some difficulties. The API needs to be flexible to allow authors to choose the format in which they want to store data on external systems. I don't see how it can be enforced given that plugin authors have control over how they process, transform, and store the data externally. They may only be interested in specific data stored in a `KVStore`, which can be JSON, Protobuf, or String values.
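As a purely illustrative sketch (not part of the ADR or the Kafka example), a plugin author's listener could take the marshaled bytes it is handed and encode them however its external consumers expect, e.g. JSON lines to a file here, protobuf on a Kafka topic elsewhere:

```go
package example

import (
	"encoding/json"
	"io"
)

// kvWrite is one hypothetical external representation of a state change.
type kvWrite struct {
	StoreKey string `json:"store_key"`
	Key      []byte `json:"key"`
	Value    []byte `json:"value"`
	Delete   bool   `json:"delete"`
}

// jsonFileListener forwards each write it observes as a JSON line; a Kafka
// or Postgres plugin would make a different choice here, which is why the
// ADR leaves the final format to the plugin.
type jsonFileListener struct {
	out io.Writer
}

func (l *jsonFileListener) OnWrite(storeKey string, key, value []byte, delete bool) error {
	return json.NewEncoder(l.out).Encode(kvWrite{
		StoreKey: storeKey,
		Key:      key,
		Value:    value,
		Delete:   delete,
	})
}
```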
I believe this addresses #10337.
Hi all. I have an example in place for Kafka (README.md) following the approach taken in i-norden#1. However, the current ADR-038 design does not guarantee consistent state between the application's internal state and external listeners (i.e. Kafka, PostgreSQL, etc.). Instead, it uses a fire-and-forget approach.
This already occurs: restarting the node will restart at the last committed state, and it will re-process the block it failed to commit.

But I'm not convinced the onus of guaranteeing that an arbitrary external system is properly synced should fall on the state machine; I'm curious what other people with more SDK experience and knowledge think.

My thoughts: I could be thinking about this all wrong or overthinking it, and some of these concerns may not be particularly relevant to Kafka, but the design is intended to support arbitrary external data destinations. Apologies for the scatteredness of these thoughts.
We need to decide how the app behaves when the channel does not signal successful delivery of messages. That behavior could depend on the nature of the external failure (e.g. did a pipe break and we need to re-establish our connection to the external service, or is the network connection fine and we don't need to re-establish a connection?). We need to decide if it waits for a success signal after delivery of each individual ABCI message + state change set (e.g. a different ...).

Working through an example, let's say: ...
For this, we can provide external systems with a choice. Currently, ADR-038 supports multiple listeners per KVStore. Therefore, my thoughts are that there should be a configuration policy in place such that:
1 - commit right away

2 and 3 implicitly say that you care about consistency.
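Purely to illustrate what such a configuration policy could look like (the option names below are hypothetical and not taken from this discussion or the ADR), it might be expressed as something like:

```go
package example

// DeliveryPolicy is a hypothetical per-KVStore setting controlling how the
// node treats delivery to an external listener.
type DeliveryPolicy int

const (
	// FireAndForget: commit right away, without waiting on the external system.
	FireAndForget DeliveryPolicy = iota
	// WaitForAck: block until the listener acknowledges delivery.
	WaitForAck
	// HaltOnFailure: like WaitForAck, but halt the node if delivery fails.
	HaltOnFailure
)
```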
External system downtime is unknown and it should not be up to the SDK to support something like this. This would put a burden on the node to keep track of messages that have failed and then replay them when the external system is started.
In business cases where consistency is more important, back-pressure is unavoidable. This comes down to choosing the right external system to save state to and to performance tuning. In the fire-and-forget approach, the ...
Sending a success before processing would satisfy the ...
We should avoid any caching of undelivered messages. It complicates the design and puts an unnecessary burden on the SDK (node).
My thoughts are this:
It's up to external system implementations to make sure that they are set up to be ...
Thanks @egaxhaj-figure, got it! If we just have the node shut down when it doesn't receive the expected success signal(s) within some threshold amount of time, that is simple enough.
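A minimal sketch of that behaviour, assuming a hypothetical acknowledgement channel and a configurable threshold (neither is in the current ADR):

```go
package example

import (
	"errors"
	"time"
)

var errAckTimeout = errors.New("streaming service did not acknowledge delivery in time")

// waitForAck blocks until the external service acknowledges the delivered
// state-change set or the threshold elapses; on timeout the caller would
// halt the node, as proposed above.
func waitForAck(ack <-chan struct{}, threshold time.Duration) error {
	select {
	case <-ack:
		return nil
	case <-time.After(threshold):
		return errAckTimeout
	}
}
```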
By burden, do you mean code complexity or performance impact? I agree the former would be a significant burden, but the latter would be negligible compared to potential back-pressure from the state streaming and the regular processes of the node. On the other hand, causing the node to fall over and requiring external intervention every time a single message fails to be sent or acknowledged is a burden on the node operator.

All these features will be optional; we could support all three options ("do nothing", "shut everything down", "try to recover programmatically, and then shut everything down if we can't within some amount of time or cache capacity"). The cache replay approach would not be ideal for instances of extended, indefinite external service downtime (although the cache size would be configurable, and as long as you have the hardware to support it, it could support indefinite downtime recovery). But, for example, if we lose our network connection for a short amount of time due to intermittent network failure and not due to any issue with the system we are networking to, the internal caching approach could perform a complete recovery without any external intervention or node downtime. Having an intermediate option between "do nothing" and "shut everything down" doesn't seem unreasonable to me (in an ideal world where code complexity and dev hours aren't a concern).

In any case, for the sake of progressing things, let's move forward with your proposal, and we can debate the merits of this alternative and add it as a feature at a later time if there is interest.
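For illustration only, the intermediate cache-and-replay option might look roughly like this (the structure and limits are hypothetical):

```go
package example

import "errors"

var errCacheFull = errors.New("undelivered-message cache is full; falling back to shutdown")

// replayCache buffers undelivered messages up to a configurable limit so the
// node can keep committing through a short external outage.
type replayCache struct {
	pending [][]byte
	limit   int
}

// buffer stores a message that could not be delivered, failing once the
// configured cache capacity is exhausted.
func (c *replayCache) buffer(msg []byte) error {
	if len(c.pending) >= c.limit {
		return errCacheFull
	}
	c.pending = append(c.pending, msg)
	return nil
}

// replay retries delivery of everything buffered once the connection is
// re-established, keeping whatever still fails for the next attempt.
func (c *replayCache) replay(send func([]byte) error) {
	remaining := c.pending[:0]
	for _, msg := range c.pending {
		if err := send(msg); err != nil {
			remaining = append(remaining, msg)
		}
	}
	c.pending = remaining
}
```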
Sorry, I wasn't clear. Yes, I was speaking to code complexity. I agree, it would be a great feature to have. However, given the code complexity, should this feature be included in the first release or a subsequent release?
Ah, sorry, I need to stop editing my comments. Yeah, you're spot on: for the sake of progressing things, let's move forward with your proposal, and we can debate the merits of this alternative and add it as a feature at a later time if there is actual interest and not just my hypotheticals 😅
Your point about code complexity is especially pertinent given my difficulty in keeping up with to-do's as is 😅 Going to make the updates to the docs to reflect Robert's comments and your proposal, and then I will rework the plugin system implementation branch to match the new state of things and open a PR for that. I'll have that wrapped up before taking off for holiday this Thursday. Thanks again @egaxhaj-figure!
😄 Happy Thanksgiving!
For cosmos#10096

This PR introduces the updates to the ADR-038 spec for the transition to plugin-based streaming services. These updates reflect the implementation approach taken in i-norden#1 (will be rebased, retargeted, and reopened).

### Author Checklist

*All items are required. Please add a note to the item if the item is not applicable and please add links to any relevant follow up issues.*

I have...

- [x] included the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [x] added `!` to the type prefix if API or client breaking change
- [x] targeted the correct branch (see [PR Targeting](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting))
- [x] provided a link to the relevant issue or specification
- [x] followed the guidelines for [building modules](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules)
- [x] included the necessary unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing)
- [x] added a changelog entry to `CHANGELOG.md`
- [x] included comments for [documenting Go code](https://blog.golang.org/godoc)
- [x] updated the relevant documentation or specification
- [ ] reviewed "Files changed" and left comments if necessary
- [ ] confirmed all CI checks have passed

### Reviewers Checklist

*All items are required. Please add a note if the item is not applicable and please add your handle next to the items reviewed if you only reviewed selected items.*

I have...

- [ ] confirmed the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [ ] confirmed `!` in the type prefix if API or client breaking change
- [ ] confirmed all author checklist items have been addressed
- [ ] reviewed state machine logic
- [ ] reviewed API design and naming
- [ ] reviewed documentation is accurate
- [ ] reviewed tests and test coverage
- [ ] manually tested (if applicable)