Provide some standard mechanism for splitting a stream into lines, and other basic protocol tasks #796
I literally just wrote one of these yesterday, with the interface:

@attr.s(auto_attribs=True)
class BufferedReceiveStream(trio.abc.AsyncResource):
transport_stream: trio.abc.ReceiveStream
chunk_size: int = 4096
async def aclose(self) -> None: ...
async def receive(self, num_bytes: int) -> bytes: ...
async def receive_all_or_none(self, num_bytes: int) -> Optional[bytes]: ...
async def receive_exactly(self, num_bytes: int) -> bytes: ...
class TextReceiveStream(trio.abc.AsyncResource):
transport_stream: trio.abc.ReceiveStream
encoding: str
errors: Optional[str]
newlines: Union[str, Tuple[str, ...], None]
chunk_size: int
def __init__(
self,
transport_stream: trio.abc.ReceiveStream,
encoding: Optional[str] = None,
*,
errors: Optional[str] = None,
newline: Optional[str] = "",
chunk_size: int = 8192
): ...
async def aclose(self) -> None: ...
async def __aiter__(self) -> AsyncIterator[str]: ...
async def receive_line(self, max_chars: int = -1) -> str: ...

I haven't tested it yet and I'm not totally sure about the interface, but I'll post code once I'm more convinced it works if people think either of these interfaces would be useful. The three receive methods in BufferedReceiveStream only differ in how they handle EOF; I wrote it to help with "you have a length and then that many bytes of data" type binary protocols.
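To make the "only differ in how they handle EOF" point concrete, here is a minimal sketch (my reconstruction, not @oremanj's actual code) of such a buffered reader. It works with any object exposing `async def receive_some(max_bytes)`; the fake stream and `asyncio.run` driver below are only there to keep the demo dependency-free — with trio you would pass a real `ReceiveStream`.

```python
import asyncio
from typing import Optional

class BufferedReceiveStream:
    """Sketch: buffer a byte stream; the three receive methods differ
    only in what they do when the stream ends early."""

    def __init__(self, transport_stream, chunk_size: int = 4096):
        self.transport_stream = transport_stream
        self.chunk_size = chunk_size
        self._buf = bytearray()

    async def _fill(self) -> bool:
        # Returns False once the underlying stream reports EOF.
        data = await self.transport_stream.receive_some(self.chunk_size)
        self._buf += data
        return bool(data)

    def _take(self, num_bytes: int) -> bytes:
        out = bytes(self._buf[:num_bytes])
        del self._buf[:num_bytes]
        return out

    async def receive(self, num_bytes: int) -> bytes:
        # Up to num_bytes; returns b"" only at EOF.
        while not self._buf:
            if not await self._fill():
                return b""
        return self._take(num_bytes)

    async def receive_all_or_none(self, num_bytes: int) -> Optional[bytes]:
        # None at a clean EOF boundary; error if EOF splits a record.
        while len(self._buf) < num_bytes:
            if not await self._fill():
                if not self._buf:
                    return None
                raise ValueError("EOF in the middle of a record")
        return self._take(num_bytes)

    async def receive_exactly(self, num_bytes: int) -> bytes:
        data = await self.receive_all_or_none(num_bytes)
        if data is None:
            raise ValueError("unexpected EOF")
        return data

# Demo with a fake stream standing in for a trio ReceiveStream:
class FakeStream:
    def __init__(self, chunks):
        self._chunks = list(chunks)
    async def receive_some(self, max_bytes):
        return self._chunks.pop(0) if self._chunks else b""

async def demo():
    buffered = BufferedReceiveStream(FakeStream([b"abc", b"defg"]))
    assert await buffered.receive_exactly(4) == b"abcd"
    assert await buffered.receive_all_or_none(3) == b"efg"
    assert await buffered.receive_all_or_none(2) is None  # clean EOF

asyncio.run(demo())
```

The "length then payload" pattern mentioned above then becomes a `receive_exactly` for the length header followed by a `receive_exactly` for the body.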
This may go without saying, but just in case I'll say it anyway :-): I'm being a bit cautious in the issue because if we add something directly to trio itself, then we want to make sure it's the right thing we can support for a long time. But for helpers and utilities and stuff that's outside trio, that doesn't apply, so everyone should at the least feel free to implement what they find useful, share it, whatever. In fact it can only be useful to see examples of what people come up with...
Speaking of which, @oremanj, what's your use case? (If you can share.)
Any movement on this? I'm looking for an equivalent for asyncio's StreamReader.readuntil.
@pawarren I think this is up to date at the moment. I'm a bit confused about how
I'm currently using asyncio.open_connection() and am referring to https://docs.python.org/3/library/asyncio-stream.html#asyncio.StreamReader.readuntil I connect to 58 web sockets, keep the connections permanently open, and for each web socket continuously loop over the following steps asynchronously:
The responses themselves are typically small XML messages or images. It seems like there's a default EOF for most websockets, which I imagine is why libraries like websockets can do things like "for message in socket". But these messages generally have a separate end-of-message separator, b'\r\n\r\n', and normal practice for dealing with that seems to be maintaining a buffer while repeatedly reading chunks of size ~4096 bytes while looking for the separator. reader.readuntil(b'\r\n\r\n') was a nice way of avoiding that. I'm new to async + websockets, so I might be missing an obvious solution or best practice. I hadn't seen trio-websockets before; thank you for the reference.
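For readers comparing APIs: the asyncio behavior being described can be tried without any network connection by feeding a StreamReader by hand (standard-library calls only; the sample data is made up here):

```python
import asyncio

async def demo():
    reader = asyncio.StreamReader()
    # Simulate two frames arriving, each terminated by b"\r\n\r\n"
    reader.feed_data(b"first\r\n\r\nsecond\r\n\r\n")
    reader.feed_eof()
    # readuntil returns the data *including* the separator
    return await reader.readuntil(b"\r\n\r\n")

assert asyncio.run(demo()) == b"first\r\n\r\n"
```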
Ah, OK, I think you mean regular TCP sockets :-). "Web socket" is a specific somewhat complicated protocol that you can use on top of regular sockets. For some reason the people designing WebSocket decided to give it a really confusing name.

It sounds like you have your own simple ad-hoc protocol on top of sockets. You might say you're transmitting a series of "frames", and each frame is terminated by \r\n\r\n. Which is, indeed, exactly the sort of thing that this issue is about :-).

While you're waiting for us to get our ducks in a row and provide a more comprehensive solution, here's some code you can use:

import trio
_RECEIVE_SIZE = 4096 # pretty arbitrary
class TerminatedFrameReceiver:
"""Parse frames out of a Trio stream, where each frame is terminated by a
fixed byte sequence.
For example, you can parse newline-terminated lines by setting the
terminator to b"\n".
This uses some tricks to protect against denial of service attacks:
- It puts a limit on the maximum frame size, to avoid memory overflow; you
might want to adjust the limit for your situation.
- It uses some algorithmic trickiness to avoid "slow loris" attacks. All
algorithms are amortized O(n) in the length of the input.
"""
def __init__(self, stream, terminator, max_frame_length=16384):
self.stream = stream
self.terminator = terminator
self.max_frame_length = max_frame_length
self._buf = bytearray()
self._next_find_idx = 0
async def receive(self):
while True:
terminator_idx = self._buf.find(
self.terminator, self._next_find_idx
)
if terminator_idx < 0:
# no terminator found
if len(self._buf) > self.max_frame_length:
raise ValueError("frame too long")
# next time, start the search where this one left off
self._next_find_idx = max(0, len(self._buf) - len(self.terminator) + 1)
# add some more data, then loop around
more_data = await self.stream.receive_some(_RECEIVE_SIZE)
if more_data == b"":
if self._buf:
raise ValueError("incomplete frame")
raise trio.EndOfChannel
self._buf += more_data
else:
# terminator found in buf, so extract the frame
frame = self._buf[:terminator_idx]
# Update the buffer in place, to take advantage of bytearray's
# optimized delete-from-beginning feature.
del self._buf[:terminator_idx+len(self.terminator)]
# next time, start the search from the beginning
self._next_find_idx = 0
return frame
def __aiter__(self):
return self
async def __anext__(self):
try:
return await self.receive()
except trio.EndOfChannel:
raise StopAsyncIteration
################################################################
# Example
################################################################
from trio.testing import memory_stream_pair
async def main():
sender_stream, receiver_stream = memory_stream_pair()
async def sender():
await sender_stream.send_all(b"hello\r\n\r\n")
await trio.sleep(1)
await sender_stream.send_all(b"split-up ")
await trio.sleep(1)
await sender_stream.send_all(b"message\r\n\r")
await trio.sleep(1)
await sender_stream.send_all(b"\n")
await trio.sleep(1)
await sender_stream.send_all(b"goodbye\r\n\r\n")
await trio.sleep(1)
await sender_stream.aclose()
async def receiver():
chan = TerminatedFrameReceiver(receiver_stream, b"\r\n\r\n")
async for message in chan:
print(f"Got message: {message!r}")
async with trio.open_nursery() as nursery:
nursery.start_soon(sender)
nursery.start_soon(receiver)
trio.run(main)
Wow, this is fantastic!
I particularly like the use of bytearray's optimized delete-from-beginning
feature.
What do you mean by "It uses some algorithmic trickiness to avoid "slow
loris" attacks."? What's the trickiness?
And thank you for pointing me towards the definition of websocket. That
makes a lot of the docs I've been reading over the past few days make more
sense...
There are two mistakes people often make, that cause this kind of code to have O(n^2) complexity:

- Re-scanning the whole buffer for the terminator every time more data arrives, instead of resuming the search where the previous attempt left off.
- Re-copying the buffer every time data is appended or a frame is removed from the front, instead of mutating a bytearray in place.

If you make either of these mistakes, then it's easy for a peer that's malicious, or just misbehaving, to burn up a ton of your CPU time doing repeated work. For example, say someone sends you a 2,000 byte frame, but they do it one byte at a time. Now a naive implementation ends up scanning through ~1,000,000 bytes looking for the terminator. It's surprisingly subtle!
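To make the amortized-O(n) claim concrete, here's a small stand-alone sketch (my illustration, not code from the thread) of the resume-the-search trick, fed one byte at a time the way a slow-loris peer would send it:

```python
def find_terminator_incremental(chunks, terminator):
    """Return the index of ``terminator`` in the accumulated data, scanning
    each region of the buffer at most once across all chunks."""
    buf = bytearray()
    next_find_idx = 0  # everything before this index is known terminator-free
    for chunk in chunks:
        buf += chunk  # in-place append: no buffer copying
        idx = buf.find(terminator, next_find_idx)
        if idx >= 0:
            return idx
        # Only the last len(terminator) - 1 bytes could start a partial match,
        # so the next search can skip everything before that point.
        next_find_idx = max(0, len(buf) - len(terminator) + 1)
    return -1

# A 2,000-byte frame delivered one byte at a time: each find() call scans
# only a handful of bytes, so total work stays linear instead of quadratic.
frame = b"x" * 2000 + b"\r\n\r\n"
one_byte_at_a_time = [frame[i:i + 1] for i in range(len(frame))]
assert find_terminator_incremental(one_byte_at_a_time, b"\r\n\r\n") == 2000
```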
Here's my hack on TerminatedFrameReceiver that extends the sans I/O idea, possibly too far. It's similar to the sort/cmp split. Just as a single definition of cmp allows lots of sort implementations, a single read & write allows interchangeable frame methods. My trio implementation leaves MUCH to be desired. It's more a POC that shows how a definition like receive_some() is sufficient for framing variations without pestering the runner (nursery) too much. Comments welcomed.
The trio-ish way would not use a
thus no need to start
I think I missed a crucial point about sans I/O; it's synchronous. It needs to have no dependence on trio. It seems the buffer mgmt in wsproto, hyper-h2, et al. is just different enough to require specialized glue code for each to use in trio. A 'standard mechanism' of how to supply bytes and get out objects for simple protocols might also help uniform adoption of these heftier protocols.
The rust
I'm having a crisis of faith about sans-io. Well, at least a partial crisis. Here's what happened: I was looking again at my nascent library for writing sans-io protocols. Let's call it byoio. The core technical foundation is a receive buffer with O(n) algorithms for basic operations like consuming data and searching for delimiters.

But on top of that foundation, we want to be able to write clean, readable, composable protocol code. For example, if we're defining a sans-io implementation of sending/receiving single lines, then (a) we want our implementation to be easy to read and reason about, (b) we want to make it easy to use it as a foundation for building new sans-io libraries on top, for line-oriented protocols like SMTP or FTP.

The basic pattern in defining a sans-io parser is that you have a buffer, and you have some kind of "read more from the buffer" operation that either returns an event description or says that it needs more data (e.g. by returning a NEED_DATA sentinel). The best trick I've seen for making this kind of code readable is to write the parser code itself as a generator, with a little wrapper to convert between the sans-io conventions and the generator conventions: see for example this version of wsproto's parser, which uses the convention that when you run out of data you yield. The main downside of this approach is that it's very convenient to have

So my sketch at this point has a generic sans-io-ish interface for users:

class ByoioParser(Protocol, Generic[EventType]):
@abstractmethod
def next_event(self) -> Union[NEED_DATA, EventType]:
...
@abstractmethod
def feed_bytes(self, new_data: BytesLike) -> None:
    ...

But I'm imagining you wouldn't normally implement this interface by hand, just like you don't normally implement the context manager interface by hand... instead you use a decorator:

@byoio.protocol
def receive_uint32_prefixed_message(receive_buffer):
prefix = yield from receive_exactly(receive_buffer, 4)
(length,) = struct.unpack("<I", prefix)
return (yield from receive_exactly(receive_buffer, length))
proto = receive_uint32_prefixed_message()
assert isinstance(proto, ByoioParser)

And now composing protocols together is easy: either we provide some way for higher-level protocols to use the generator directly, like:

@byoio.protocol
def higher_level_protocol(receive_buffer):
message = yield from receive_uint32_prefixed.gen(receive_buffer)
    # ... do something with 'message' ...

Or better yet, provide some way to convert:

message = yield from read_uint32_prefixed(receive_buffer).asgen()

But there's some awkwardness here... If we want a generator that returns a stream of real results, that's not too hard – that's how my old wsproto parser works. We just declare that you can

@byoio.protocol
def parse_smtp(buffer):
subprotocol = parse_lines(buffer)
while True:
next_line = yield from subprotocol.gen_for_next()
    ...

Except we also need some convention to signal that there is no next item... maybe we should use an exception? Anyway, that's the point where I realized that I'd just reinvented async iterators, except more awkward because generators can't use await. OK, so, new plan: instead of generators, we'll define protocols with async/await:

# One-shot protocol is a regular async function
@byoio.protocol
async def receive_uint32_prefixed_message(receive_buffer: bytearray):
prefix = await receive_exactly(receive_buffer, 4)
(length,) = struct.unpack("<I", prefix)
return await receive_exactly(receive_buffer, length)
# Multi-shot protocol is an async generator
@byoio.protocol
async def receive_lines(receive_buffer):
while True:
        yield await receive_until(receive_buffer, b"\r\n")

Internally, these coroutines can be driven by a hand-rolled coroutine runner, with no event loop involved. Of course we also need some way to send data, so maybe we should have a way to do that too. For convenience maybe we should bundle these primitives together on a little interface:

class ByoioStream:
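The "drive the coroutines yourself" idea is less exotic than it sounds. Here is a toy stand-alone sketch (my illustration, not the hypothetical byoio library) that runs an async-function parser by hand, with no event loop anywhere — awaiting a NeedData object simply suspends the coroutine until the driver has fed more bytes into the shared buffer:

```python
import struct

class NeedData:
    """Awaitable that simply suspends; the driver resumes the coroutine
    once it has appended more bytes to the shared buffer."""
    def __await__(self):
        yield self

async def receive_exactly(buf: bytearray, n: int) -> bytes:
    while len(buf) < n:
        await NeedData()
    out = bytes(buf[:n])
    del buf[:n]
    return out

async def receive_uint32_prefixed(buf: bytearray) -> bytes:
    prefix = await receive_exactly(buf, 4)
    (length,) = struct.unpack("<I", prefix)
    return await receive_exactly(buf, length)

def run_sansio(proto_fn, chunks):
    """Drive the protocol coroutine by hand: pure sans-I/O, no event loop."""
    buf = bytearray()
    coro = proto_fn(buf)
    chunks = iter(chunks)
    try:
        coro.send(None)          # run until the first NeedData
        while True:
            buf += next(chunks)  # feed one chunk of "network" data
            coro.send(None)      # resume the parser
    except StopIteration as exc:
        # Raised either when the parser returns its result, or when we run
        # out of chunks (in which case exc.value is None).
        return exc.value

msg = struct.pack("<I", 3) + b"abc"
assert run_sansio(receive_uint32_prefixed, [msg[:2], msg[2:]]) == b"abc"
```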
async def receive_some(self): ...
    def send_all(self, data): ...

...but OK, that's basically a subset of the trio Stream interface. So it turns out that my grand plan is:
But alternatively, we could take the exact same code we wrote in step 1, pass in any trio Stream, and use it directly.

One of the major advantages of sans-io code is that because it has such a narrow and controllable interface with the outside world, it's easy to test. And of course the other advantage is that the public API is neutral and can be used anywhere, including in sync code. The

But it does suggest that we can feel free to start writing all the obvious primitives –

One nasty issue I don't have a good answer for: the send/receive/bidi split makes everything super annoying to type, in the mypy sense. It's super annoying to have

On another note: as I write
You've described in a lot of words why I didn't bother with sans-io-ing any of my code yet. There are two missing building blocks in sans-io. One is timeout handling, including the requirement to send frequent keepalives. The other is request multiplexing.

WRT put/get: Umm … thinking about it, put/get works for single messages, while send/receive makes more sense for byte/character streams. This is because I associate put/get with things like channels or queues that you can use to put some single thing in on one end and get it out the other end. Calling

Yes,
That's why I find them attractive :-). For single objects we currently have

And
Yeah, one big limitation of the whole

There are a bunch of useful protocols that fit within this constraint – that list, plus basic protocols like line splitting, plus some others like SMTP. But there are also a lot of protocols that have some quirky extra requirement. h2 does request multiplexing... not sure what you're missing there. But yeah, timing is a common one – protocols can specify timeouts, sometimes in odd corners. The ssh protocol doesn't seem like it cares about time, except, ssh servers need to rate-limit password checks. Some protocols need to be able to make new connections. Think SOCKS, or anything with automatic reconnect on failure, or http redirects. Some protocols need UDP. There's an interesting discussion happening at aiortc/aioquic#4 about how to make a sans-io interface for QUIC.

All these could in principle be fit into the sans-io paradigm: all you need to do is be extremely explicit about what your interactions with the environment are and expose them as public API. You can even imagine a library like

There's not going to be a single universal answer here, but I think it's helpful to try to map out the space of options.
I came to some of the same conclusions. recv_exact and recv_until work better as distinct calls. Nesting protocols is hard. With

However, the last one doesn't seem so bad. If I want one

Some of the headaches with nesting are a consequence of the Stream/Channel distinction. Utils for byte->obj can't be easily reused for obj->bigger_obj.
So my prototype so far has the signatures:

async def receive_exactly(rstream: ReceiveStream, rbuf: bytearray, size: int): ...
async def receive_until(rstream: ReceiveStream, rbuf: bytearray, delim: BytesLike, *, max_size: int): ...
class LineChannel(Channel[bytes]):
def __init__(self, stream: Stream, *, max_size: int, eol=b"\r\n", initial_data=b""): ...
async def send(...): ...
async def receive(...): ...
def detach(self) -> Tuple[Stream, BytesLike]: ...
class LengthPrefixedChannel(Channel[bytes]):
def __init__(self, stream: Stream, prefix_format: str, *, max_size: int, initial_data=b""): ...
    # ... rest of API the same as LineChannel ...

I'm assuming #1123 or something like it will be merged, so we don't have to pass in

Please notice the explicit receive-buffer argument:

# Start out with an empty bytearray
rbuf = bytearray()
# Pass it into every function call
header = await receive_exactly(stream, rbuf, 8)
length, flags = struct.unpack("<HH", header)
body = await receive_exactly(stream, rbuf, length)

And the receive part of LineChannel would look like:

class LineChannel(Channel[bytes]):
def __init__(self, stream: Stream, *, max_size: int, eol=b"\r\n", initial_data=b""):
self._stream = stream
self._max_size = max_size
self._eol = eol
# If you already have a receive buffer, you can pass it in as initial_data
self._rbuf = bytearray(initial_data)
async def receive(self):
# TODO: figure out EOF handling (discussed below)
return await receive_until(self._stream, self._rbuf, self._eol, max_size=self._max_size)
def detach(self):
# Pass back out the buffer, so that others can use it
stream, rbuf = self._stream, self._rbuf
self._stream, self._rbuf = None, None
        return stream, rbuf

Another obvious option would be to encapsulate the stream+buffer together in an object, like a
So I'm thinking, if the abstraction is that leaky, then maybe it's better to just get rid of it.

EOF handling is something of an open question, and related to the above. In my draft, if

async def receive(self):
try:
return await receive_until(self._stream, self._rbuf, self._eol, max_size=self._max_size)
except EOFError as exc:
if self._rbuf:
raise trio.BrokenResourceError from exc
else:
raise trio.EndOfChannel

If we had a

There is at least one case where EOF is not inherently persistent: on a TTY, if someone hits control-D, then reading from the tty will report EOF, and also clear the EOF flag, so that future reads go back to reading new keyboard input. So, you generally don't want to rely on being able to call

For h11's receive buffer, it does have to store that flag because someone else is feeding in data and could feed in an EOF at any time, so we have to remember it. Here the reason we get away with potentially not needing it is that we don't call

I also thought about having a
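For concreteness, here is one hedged guess at what the receive_exactly / receive_until free functions could look like with the explicit (stream, rbuf) convention and the EOFError-based reporting sketched above. It is my reconstruction, not the actual prototype; stream is anything with `async def receive_some(max_bytes)`, and the fake stream plus `asyncio.run` are only there to make the demo self-contained:

```python
import asyncio

_RECEIVE_SIZE = 4096

async def receive_exactly(stream, rbuf: bytearray, size: int) -> bytes:
    while len(rbuf) < size:
        data = await stream.receive_some(_RECEIVE_SIZE)
        if data == b"":
            raise EOFError("stream ended mid-record")
        rbuf += data
    out = bytes(rbuf[:size])
    del rbuf[:size]
    return out

async def receive_until(stream, rbuf: bytearray, delim: bytes, *, max_size: int) -> bytes:
    next_find_idx = 0  # resume searches, keeping the cost amortized O(n)
    while True:
        idx = rbuf.find(delim, next_find_idx)
        if idx >= 0:
            out = bytes(rbuf[:idx])
            del rbuf[:idx + len(delim)]
            return out
        if len(rbuf) > max_size:
            raise ValueError("delimiter not found within max_size bytes")
        next_find_idx = max(0, len(rbuf) - len(delim) + 1)
        data = await stream.receive_some(_RECEIVE_SIZE)
        if data == b"":
            raise EOFError("stream ended before delimiter")
        rbuf += data

class FakeStream:  # stands in for a trio ReceiveStream
    def __init__(self, chunks):
        self._chunks = list(chunks)
    async def receive_some(self, max_bytes):
        return self._chunks.pop(0) if self._chunks else b""

async def demo():
    rbuf = bytearray()  # one shared buffer threaded through every call
    stream = FakeStream([b"hello\r\nwor", b"ld"])
    assert await receive_until(stream, rbuf, b"\r\n", max_size=100) == b"hello"
    assert await receive_exactly(stream, rbuf, 5) == b"world"

asyncio.run(demo())
```

A channel class can then translate EOFError into trio.BrokenResourceError or trio.EndOfChannel depending on whether the buffer is empty, as in the draft above.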
(I split off #1125 for the discussion about renaming
Another possibility: just have

# But how do we handle different constructor args?
LineChannel = StapledChannel.new_type(SendLineChannel, ReceiveLineChannel)

or

channel = make_line_channel(stream, ...)

Or maybe I should give up and add this to the very short list of cases where implementation inheritance is OK...
On the receive side we do need a buffer, obviously; equally obviously that buffer needs to be part of the object's interface. In fact we need to be able to share the buffer between "line" objects: some protocols (IMAP?) send a CRLF-delimited line that states "the next bit consists of N opaque bytes". Thus yes it should be accessible, but still be part of the object so that the easy case Just Works. I'd simply use an optional argument to the constructor. This also applies to a (buffered) sender. (The unbuffered case is easy …) Yes, mypy and similar is a challenge, but frankly I'd rather add some restriction-propagation logic to mypy to recognize "when you call
(where "Foo" is the somewhat-generic type of thing you'd transmit/receive). All that "read+write => bidirectional" stapling stuff should be restricted to the few cases where you really have two disparate streams you need to link up. I can think of only one relevant case, namely setting up two pipes for bidirectional communication, and even that is somewhat-obsolete when you have
Part of which object? Which constructor? We don't want to add buffer management to the This is for the lower-level tools like
Hmm. I suppose they do have

@overload
def line_channel(s: Stream, ...) -> LineChannel: ...
@overload
def line_channel(s: ReceiveStream, ...) -> LineReceiveChannel: ...
@overload
def line_channel(s: SendStream, ...) -> LineSendChannel: ...
def line_channel(s, ...):
    # ... actual implementation ...

I think you do still need separate
Right. On the other hand we have channels that definitely need a shareable buffer for the underlying stream (

So, well, interpose a

I don't think we'd need any sort of interim detaching mechanism for that buffer. A "I can use/modify the buffer" lock, or even a ConflictManager-protected accessor, should be sufficient.
Slightly tangential: What would be a good trionic pattern that switches between a line reader and then an N-bytes reader? E.g., for HTTP/SIP you want to parse by lines until Content-Length, then you want N bytes. So I want to mix
In the SO line reader example, the stream bytes are sent into the generator, so after reading a few lines with some data pending in the

I'm thinking of sending a tuple like
(Actual use case: I'm looking to do a trio port (from gevent) of GreenSWITCH (https://github.com/EvoluxBR/greenswitch) which switches between line reader and N bytes reader. The application protocol is that of the PBX software FreeSWITCH, which behaves like HTTP.)
@space88man I have a half-written blog post about adapting @njsmith's TerminatedFrameReceiver for that – in my case to parse RESP3. In short, I ended up with the following:

class TerminatedFrameReceiver:
def __init__(
self,
buffer: bytes = b"",
stream: Optional[trio.abc.ReceiveStream] = None,
terminator: bytes = b"\r\n",
max_frame_length: int = 16384,
):
assert isinstance(buffer, bytes)
assert not stream or isinstance(stream, trio.abc.ReceiveStream)
self.stream = stream
self.terminator = terminator
self.max_frame_length = max_frame_length
self._buf = bytearray(buffer)
self._next_find_idx = 0
def __bool__(self):
return bool(self._buf)
async def receive(self):
while True:
terminator_idx = self._buf.find(self.terminator, self._next_find_idx)
if terminator_idx < 0:
self._next_find_idx = max(0, len(self._buf) - len(self.terminator) + 1)
await self._receive()
else:
return self._frame(terminator_idx + len(self.terminator))
async def receive_exactly(self, n: int) -> bytes:
while len(self._buf) < n:
await self._receive()
return self._frame(n)
async def _receive(self):
if len(self._buf) > self.max_frame_length:
raise ValueError("frame too long")
more_data = await self.stream.receive_some(_RECEIVE_SIZE) if self.stream is not None else b""
if more_data == b"":
if self._buf:
raise ValueError("incomplete frame")
raise trio.EndOfChannel
self._buf += more_data
def _frame(self, idx: int) -> bytes:
frame = self._buf[:idx]
del self._buf[:idx]
self._next_find_idx = 0
        return frame

Essentially, I refactored so that the buffer-filling (_receive) and frame-slicing (_frame) steps are shared between receive() and receive_exactly().
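To show how this shape answers the earlier line-reader-then-N-bytes question, here is a self-contained, buffer-only sketch (illustrative, not part of the class above) of the HTTP/SIP-style switch: parse header lines one at a time, then read exactly Content-Length body bytes:

```python
class FrameBuffer:
    """Toy in-memory version: real code would refill the buffer from a
    stream, as in the TerminatedFrameReceiver above."""
    def __init__(self, data: bytes, terminator: bytes = b"\r\n"):
        self._buf = bytearray(data)
        self.terminator = terminator

    def receive(self) -> bytes:
        # Line mode: return through the next terminator.
        idx = self._buf.find(self.terminator)
        if idx < 0:
            raise ValueError("no complete line in buffer")
        return self._frame(idx + len(self.terminator))

    def receive_exactly(self, n: int) -> bytes:
        # Byte-count mode: return exactly n bytes.
        if len(self._buf) < n:
            raise ValueError("not enough data buffered")
        return self._frame(n)

    def _frame(self, idx: int) -> bytes:
        frame = bytes(self._buf[:idx])
        del self._buf[:idx]
        return frame

raw = b"Content-Length: 5\r\n\r\nhello"
fb = FrameBuffer(raw)
headers = {}
while True:
    line = fb.receive().rstrip(b"\r\n")
    if not line:
        break  # blank line ends the header block
    name, _, value = line.partition(b": ")
    headers[name] = value
body = fb.receive_exactly(int(headers[b"Content-Length"]))
assert body == b"hello"
```

The same two-method switch works verbatim against the stream-backed class above, since it exposes the same receive()/receive_exactly() pair.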
Just to add a data point (or anecdata point): For me, the
Having said that, I don't see wrapping those functions up in a
Vs:
I also considered wrapping

I can imagine you might want a wrapper
Very interesting discussion about the sans-io approach! I'm more of an applied engineer and much less of an architect, so maybe I don't understand the problem from all sides. However, do I understand it right that those sans-io problems could be circumvented by simply making the sans-io interface async by default? In that case, timeouts would not be an issue and no AsyncIterator would need to be reinvented. This is regarding comment #796 (comment)
Most networking libraries provide some standard way to implement basic protocol building blocks like "split a stream into lines", "read exactly N bytes", or "split a stream into length-prefixed frames", e.g.:

- asyncio's StreamReader.readline, StreamReader.readexactly, StreamReader.readuntil
- The classes in twisted.protocols.basic
- The stdlib socket module's makefile method, which lets you get access to the full Python file API, including readline and friends
- Tornado's IOStream.read_until
We don't have anything like this currently, as I was reminded by this StackOverflow question from @basak.
Note: if you're just looking for a quick way to read lines from a trio Stream, then click on that SO link, it has an example.
Use cases
- Twisted's LineReceiver and LineOnlyReceiver have subclasses implementing HTTP, IMAP, POP3, SMTP, Ident, Finger, FTP, Memcache, IRC, ... you get the idea.
- Length-prefixed framing (like Int16Receiver), though sometimes it involves lines, e.g. newline-terminated JSON, or the log parser in linehaul.
- Interacting with subprocesses, where readline and read_until are pretty useful. This particular case can also benefit from more sophisticated tools, like TTY emulation and pexpect-style pattern matching.

Considerations
Our approach shouldn't involve adding new methods to Stream, because the point of the Stream interface is to allow for lots of different implementations, and we don't want to force everyone who implements Stream to have to reimplement their own version of the standard frame-splitting algorithms. So this should be some helper function that acts on a Stream, or wrapper class that has-a Stream, something like that.

For "real" protocols like HTTP, you definitely can implement them on top of explicit (async) blocking I/O operations like readline and read_exactly, but these days I'm pretty convinced that you will be happier using Sans I/O. Some of the arguments for sans-io design are kind of pure and theoretical, like "better modularity" and "higher reusability", but having done this twice now (with h11 and wsproto), I really don't feel like it's an eat-your-vegetables thing – the benefits are super practical: like, you can actually understand your protocol code, and test it, and people with totally different use cases show up to fix bugs for you. It's just a more pleasant way to do things.

OTOH, while trio is generally kind of opinionated and we should give confused users helpful nudges in the best direction we can, we don't want to be elitist. If someone's used to hacking together simple protocols using readline, and is comfortable doing that, we don't want to put up barriers to their using trio. And if the sans-I/O approach is harder to get started with, then for some people that will legitimately outweigh the long-term benefits.

There might be one way to have our cake and eat it too: if we can make the sans-I/O version so simple and easy to get started with that even beginners and folks used to readline don't find it a barrier. If we can pull this off, it'd be pretty sweet, because then we can teach the better approach from the beginning, and when they move on to implementing more complex protocols, or integrating existing libraries like h11/h2/wsproto, they're already prepared to do it right.

Alternatively, if we can't... there is really not a lot of harm in having a lines_from_stream generator, or whatever. But anything more than that is going to require exposing some kind of buffering to the user, which is the core of the sans-I/O pattern, so let's think about sans-I/O for a bit.

Can we make sans-I/O accessible and easy?
The core parts of implementing a high-quality streaming line reader, a streaming length-prefixed string reader, or an HTTP parser, are actually all kind of the same:
h11 internally has a robust implementation of everything here except for specifying delimiters as a regex, and I need to add that anyway to fix python-hyper/h11#7. So I have a plan already to pull that out into a standalone library.
And the API to a sans-I/O line reader, length-prefixed string reader, HTTP parser, or websocket parser for that matter, are also all kind of the same: you wrap them around a Stream, and then call a receive method which tries to pull some "event" out of the internal buffer, while refilling the buffer as necessary.

In fact, if you had sans-I/O versions of any of these, that all followed the same interface conventions, you could even have a single generic wrapper that binds them to a Trio stream, and implements the ReceiveChannel interface! Where the objects being received are lines, or h11.Event objects, or whatever.

So if you really just wanted a way to receive and send lines on a Stream, that might be:

That's maybe a little bit more complicated than I'd want to use in a tutorial, but it's pretty close? Maybe we can slim it down a little more?

This approach is also flexible enough to handle more complex cases, like protocols that switch between line-oriented and bulk data (HTTP), or that enable TLS half-way through (SMTP's STARTTLS command), which in Twisted's LineReceiver requires some special hooks. You can detach the sans-I/O wrapper from the underlying stream and then wrap it again in a different protocol, so long as you have some way to hand off the buffer between them.

But while it is flexible enough for that, and that approach is very elegant for Serious Robust Protocol implementations, it might be a lot to ask when someone really just wants to call readline twice and then read N bytes, or something like that. So maybe we'd also want something that wraps a ReceiveStream and provides read_line, read_exactly, read_until, based on the same buffering code described above but without the fancy sans-I/O event layer in between?