
consume file streamer output as reliable message queue #13652

Closed
yihuang opened this issue Oct 26, 2022 · 13 comments

@yihuang
Collaborator

yihuang commented Oct 26, 2022

Summary

In #13516, we fixed state listeners issues and refactored file streamer output.
We realized that the file streamer output can be further consumed reliably, like a message queue, either locally or remotely, using inotify and its equivalents1 to provide real-time events.

Problem Definition

Proposal

Provide some utilities to allow consuming the file streamer output conveniently and reliably.

Local

For local clients, one can simply use the fsnotify library1 to monitor file system events in the file streamer output directory and read the local files.
We should implement some utilities to make it convenient.
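
As a minimal sketch of what such a local watcher could look like (the output directory path is illustrative, not the SDK's actual default):

```go
package main

import (
	"log"

	"github.com/fsnotify/fsnotify"
)

func main() {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		log.Fatal(err)
	}
	defer watcher.Close()

	// Watch the file streamer output directory (path is illustrative).
	if err := watcher.Add("data/file_streamer"); err != nil {
		log.Fatal(err)
	}

	for {
		select {
		case event, ok := <-watcher.Events:
			if !ok {
				return
			}
			// A write event means a block file is being produced; the consumer
			// still has to verify the file is complete before consuming it.
			if event.Op&fsnotify.Write != 0 {
				log.Println("modified:", event.Name)
			}
		case err, ok := <-watcher.Errors:
			if !ok {
				return
			}
			log.Println("watch error:", err)
		}
	}
}
```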

Remote

For remote clients, we should provide an HTTP server that serves the static files plus a long-polling endpoint for the new block event:

  • /block-{N}-data, download the data file of block N
  • /block-{N}-meta, download the meta file of block N
  • /newblock, a long-polling endpoint that provides the new block event.
    When the event for block N is emitted, the endpoints /block-{N}-data and /block-{N}-meta are guaranteed to return complete data.

/block-{N}-data and /block-{N}-meta are allowed to return partial data before the new block event is emitted.
The HTTP service setup is flexible; for example, one can set up nginx to serve the static files and reverse-proxy the /newblock endpoint.
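
A rough sketch of such a server (not an existing SDK API; the directory path, the ?after= query parameter, and the 30-second timeout are all assumptions): the block files are served as plain static files, and long-polling clients wait on a channel that is signalled whenever a new block is complete on disk.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

// blockNotifier fans "new block" notifications out to long-polling clients.
type blockNotifier struct {
	mu     sync.Mutex
	latest int64
	wait   chan struct{} // closed and replaced whenever `latest` advances
}

func newBlockNotifier() *blockNotifier {
	return &blockNotifier{wait: make(chan struct{})}
}

// Notify is called once block n's data and meta files are complete on disk.
func (b *blockNotifier) Notify(n int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if n > b.latest {
		b.latest = n
		close(b.wait)
		b.wait = make(chan struct{})
	}
}

// ServeHTTP implements /newblock: it blocks until a block newer than the
// client's `after` parameter exists, or gives up after a timeout so the
// client can re-poll.
func (b *blockNotifier) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	var after int64
	fmt.Sscanf(r.URL.Query().Get("after"), "%d", &after)
	for {
		b.mu.Lock()
		latest, wait := b.latest, b.wait
		b.mu.Unlock()
		if latest > after {
			fmt.Fprintf(w, "%d\n", latest)
			return
		}
		select {
		case <-wait:
		case <-time.After(30 * time.Second):
			w.WriteHeader(http.StatusNoContent) // nothing new; client retries
			return
		case <-r.Context().Done():
			return
		}
	}
}

func main() {
	notifier := newBlockNotifier()
	// Block files are served as plain static files, e.g. /block-123-data.
	http.Handle("/", http.FileServer(http.Dir("./file_streamer")))
	http.Handle("/newblock", notifier)
	// A watcher (e.g. the fsnotify loop above) would call notifier.Notify(n)
	// whenever block n's files pass the completeness check.
	_ = http.ListenAndServe(":8080", nil)
}
```

With nginx in front, the static file part is just a root/alias pointing at the output directory, and only /newblock needs to reach this process.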

Client

The client should record the last block number it has successfully consumed, process new blocks from there, and subscribe to /newblock event notifications once it has caught up.
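
A hypothetical consumer loop matching the server sketch above (the base URL, file names, and ?after= parameter are assumptions; real code should persist `last` durably to get at-least-once delivery across restarts):

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

// fetch downloads one block file; any non-200 response is treated as
// "not ready yet" so the caller can wait for the next /newblock event.
func fetch(base, name string) ([]byte, error) {
	resp, err := http.Get(base + "/" + name)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("not ready: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	base := "http://localhost:8080" // the hypothetical server sketched above
	var last int64                  // in practice, restored from durable storage

	for {
		next := last + 1
		data, err := fetch(base, fmt.Sprintf("block-%d-data", next))
		if err != nil {
			// Caught up: long-poll for the new block event, then try again.
			if resp, err := http.Get(fmt.Sprintf("%s/newblock?after=%d", base, last)); err == nil {
				resp.Body.Close()
			}
			continue
		}
		meta, _ := fetch(base, fmt.Sprintf("block-%d-meta", next))
		log.Printf("block %d: %d data bytes, %d meta bytes", next, len(data), len(meta))
		last = next // persist this offset before moving on
	}
}
```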

File streamer settings

To treat the file streamer output as a reliable message queue, the streamer itself should be configured to be reliable:

  • stopNodeOnErr=true, for eventual consistency.
  • fsync=true, so we don't lose data in the face of a system crash.

Footnotes

  1. https://github.com/fsnotify/fsnotify

@mmsqe
Contributor

mmsqe commented Oct 26, 2022

For the "new block event", do you mean tendermint one?

@yihuang
Collaborator Author

yihuang commented Oct 26, 2022

For the "new block event", do you mean tendermint one?

When the event for block `N` is emitted, the endpoints `/block-{N}-data` and `/block-{N}-meta` are guaranteed to return complete data.

The key is to provide the above guarantee, so the natural implementation would be to watch the new files:

  • watch file write events with fsnotify
  • check whether the file is complete using the 8-byte size at the beginning of the file
  • if the file is complete, emit the new block event (parsing the block number from the filename)
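
A sketch of that completeness check (assuming the first 8 bytes hold the big-endian length of the remaining payload and the file is named like block-{N}-data; the exact encoding is an assumption, check the streamer implementation):

```go
package filewatch

import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// blockIfComplete reports whether a streamer output file is fully written
// and, if so, returns the block number parsed from its name.
func blockIfComplete(path string) (num int64, complete bool, err error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, false, err
	}
	defer f.Close()

	// The leading 8 bytes record how much payload should follow
	// (assumed big-endian here).
	var header [8]byte
	if _, err := io.ReadFull(f, header[:]); err != nil {
		return 0, false, nil // too short: still being written
	}
	want := int64(binary.BigEndian.Uint64(header[:]))

	info, err := f.Stat()
	if err != nil {
		return 0, false, err
	}
	if info.Size() < int64(len(header))+want {
		return 0, false, nil // partial write, wait for the next event
	}

	// Parse N from a name like "block-123-data".
	parts := strings.Split(filepath.Base(path), "-")
	if len(parts) < 2 {
		return 0, false, fmt.Errorf("unexpected file name: %s", path)
	}
	num, err = strconv.ParseInt(parts[1], 10, 64)
	if err != nil {
		return 0, false, err
	}
	return num, true, nil
}
```

When it reports complete, the watcher can emit the new block event, e.g. by calling something like Notify(n) from the server sketch earlier.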

@yihuang
Collaborator Author

yihuang commented Oct 26, 2022

For the "new block event", do you mean tendermint one?

Since the tendermint NewBlockHeader event is fired after the commit event, it can also provide that guarantee.
Actually, the file streamer writing is done inside the commit event, so it happens earlier than the tendermint event. There is an edge case where the commit fails after the file streamer has written, in which case the tendermint event is not fired but the file system one is; but unless there's something wrong with the consensus state machine, the block will eventually be replayed and everything should be fine.

When there is something wrong with the consensus state machine, for example an app hash mismatch, we probably have to manually roll back the file streamer output and downstream consumers.

@mmsqe
Contributor

mmsqe commented Oct 26, 2022

Since the tendermint NewBlockHeader event is fired after the commit event, it can also provide that guarantee. Actually, the file streamer writing is done inside the commit event, so it happens earlier than the tendermint event. There is an edge case where the commit fails after the file streamer has written, in which case the tendermint event is not fired but the file system one is; but unless there's something wrong with the consensus state machine, the block will eventually be replayed and everything should be fine.

Since we check the length when reading, if the file is incomplete we can just retry the fetch.

@yihuang
Collaborator Author

yihuang commented Oct 26, 2022

Since we check the length when reading, if the file is incomplete we can just retry the fetch.

I think it's still better if the event can provide that guarantee, so that when things work properly the client doesn't have to retry at all.
But after a disconnect, the client still needs to poll for the next block file to see whether it has caught up.

@yihuang yihuang changed the title from "consume file streamer output as message queue" to "consume file streamer output as reliable message queue" on Oct 26, 2022
@tac0turtle
Member

With the remote approach, are we making guarantees around delivery and how the response interacts with the node?

I'm more in favour of offering only the file-system approach and then writing a small remote command-line tool that is not part of the node.

@yihuang
Collaborator Author

yihuang commented Oct 28, 2022

With the remote approach, are we making guarantees around delivery and how the response interacts with the node?

It provides offset-based subscription: the client maintains the consumption offset (block height), which gives an at-least-once delivery guarantee.

I'm more in favour of offering only the file-system approach and then writing a small remote command-line tool that is not part of the node.

Yeah, I think the server feature can perfectly well run in a separate process; it may or may not be embedded inside the node.

@aaronc
Member

aaronc commented Oct 28, 2022

In favor of this approach. How would stopNodeOnErr work here?

@yihuang
Collaborator Author

yihuang commented Oct 28, 2022

In favor of this approach. How would stopNodeOnErr work here?

Do you mean stopNodeOnErr of the file streamer itself, or a downstream consumer wanting to report an error to the node?
For the former, if stopNodeOnErr is true, the consensus state machine will stop if any error is returned from the file streamer, so after the issue is resolved the block will be replayed.
For the latter, it can't and doesn't have to, because it can always reprocess the file streamer output, at least before it's pruned away.

@peterbourgon

We realized that the file streamer output can be further consumed reliably, like a message queue, either locally or remotely, using inotify and its equivalents1 to provide real-time events.

The filesystem is a really great buffer/caching layer for an event delivery system! But it decouples producer and consumer in a way that makes it really difficult to achieve reliable delivery.

First, inotify is a best-effort system which doesn't provide reliable delivery guarantees, even locally. Those best-effort guarantees are further weakened by alternative implementations like remote/network adapters (sshfs, NFS, etc.) and non-Linux filesystems (macOS, btrfs, etc.).

Even if you give up on inotify and poll, it's still super difficult to establish that a state on a local FS is equivalent to a state on a [remote] consumer. The FS is by its nature asynchronous — caching, buffering, all sorts of stuff even before you get to the OS interfaces, and that's even before you get to the language adapters! If you need reliable delivery, you more or less have to use direct, synchronous connections between producer and consumer. A filesystem makes that impossible.

Happy to say more on the topic if anyone is interested. But, assuming a standard definition of "reliable delivery", the FS is unfortunately not an option.

@yihuang
Collaborator Author

yihuang commented Oct 30, 2022

The filesystem is a really great buffer/caching layer for an event delivery system! But it decouples producer and consumer in a way that makes it really difficult to achieve reliable delivery. [...] But, assuming a standard definition of "reliable delivery", the FS is unfortunately not an option.

If by reliable delivery we mean at-least-once delivery semantics without strict latency guarantees, I don't see why it won't work; can you show a specific case where it fails?
Yeah, the client should treat inotify as a best-effort mechanism to reduce latency, and fall back to polling when it doesn't work.

@alexanderbez
Contributor

alexanderbez commented Nov 15, 2022

#13516 (comment)

If we're moving towards a plugin-based design, shouldn't all consumers be plugin-based? Including a file consumer?

@tac0turtle
Member

That is the design the PR implements. Closing this for now.
