
Consume ByteBuffer payloads sent via WebSocket #7106

Closed
sbordet opened this issue Nov 12, 2021 · 10 comments
Labels
Bug (For general bugs on Jetty side), Stale (For auto-closed stale issues and pull requests)

Comments

@sbordet
Contributor

sbordet commented Nov 12, 2021

Jetty version(s)
10

Description
Following #4341, payloads passed to WebSocket frames are not consumed.

This is surprising as:

  • HttpOutput.send(ByteBuffer) consumes the buffer
  • HttpOutput.write(small) aggregates into the aggregation buffer, which is then consumed when sent
  • sending an HTTP/2 DataFrame consumes the buffer
  • reading from an InputStream consumes it
  • etc.

See also discussion in #7088.

@gregw any reason for not consuming the buffer?
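
For illustration, a minimal sketch of the reported behavior (not code from the issue; it assumes Jetty's org.eclipse.jetty.websocket.api and a blocking sendBytes(ByteBuffer); class and method names are hypothetical):

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.eclipse.jetty.websocket.api.Session;

public class ConsumeCheck
{
    static void sendAndCheck(Session session) throws Exception
    {
        ByteBuffer payload = ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8));
        // Blocking binary send of the whole buffer.
        session.getRemote().sendBytes(payload);
        // With the behavior described in this issue the indexes are untouched,
        // so the buffer still appears full; the issue argues the send should
        // have consumed it (position == limit) instead.
        assert payload.hasRemaining();
    }
}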

@sbordet added the Bug label Nov 12, 2021
@gregw
Contributor

gregw commented Nov 14, 2021

I believe a WebSocket Frame is somewhat a higher level object than a ByteBuffer. It does contain a ByteBuffer, but there is not an is-a relationship to ByteBuffer. Thus I think it is reasonable for the following to be true:

assert frame.getPayload().hasRemaining();
send(frame);
assert frame.getPayload().hasRemaining();

It is somewhat analogous to the HttpContent class we use in the resource service, which combines meta data and the ByteBuffer containing the content. That ByteBuffer is not consumed when the content is sent, because something at a lower level takes a slice of the content's buffer and consumes the slice as it sends it. This way, an HttpContent can be cached and sent multiple times.
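
A minimal sketch of that slicing pattern (hypothetical code, not HttpContent itself): the lower level writes a slice, so the cached buffer's indexes are never moved and it can be sent again.

import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

class SliceAndSend
{
    static void send(WritableByteChannel channel, ByteBuffer cached) throws Exception
    {
        // Only the slice is consumed by the write loop.
        ByteBuffer slice = cached.slice();
        while (slice.hasRemaining())
            channel.write(slice);
        // cached.position() and cached.limit() are unchanged here,
        // so the cached buffer can be sent again later.
    }
}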

I agree the use-case for caching a websocket frame and sending it multiple times is less compelling, and is further complicated by masking..... but I still see value in a Frame instance remaining meaningful after being sent - i.e. you can query its length and content, for example.

Either way, we should strive to be consistent: both HTTP/2 DataFrame and WebSocket Frame should work the same way.

So I'll flip this back the other way: @sbordet, is there any reason for consuming the buffer from a frame? Is the changed position used to signal anything to the sending application? If so, should it? Is that leaking implementation details? Can we avoid slicing by doing it one way or the other? If we do consume buffers in frames, do we need to store the sizes separately so they can be accessed after the send?

@gregw
Contributor

gregw commented Nov 14, 2021

@lachlan-roberts thoughts?

@gregw
Contributor

gregw commented Nov 15, 2021

Note also that the Frame API is used for both sending and receiving, plus it has text frames. So things like frame.setPayload("HelloWorld") can result in a frame whose string payload gets consumed.
Should methods like frame.getPayloadAsUTF8() consume?

Also, I think I'd find it confusing if a Frame's payload was ever partially consumed. Let's say we have a Frame with 32KB of payload. It's very clear what the frame header should be - i.e. a data frame header with a length of 32KB. But let's say that during the send only half the payload is consumed before the transport blocks. We now have a frame with 16KB of payload.... which could look like it should be sent again with a new data frame header of length 16KB. But that's not the case: it is the unsent trailing part of the previously sent 32KB frame, so a header does not need to be generated for the frame again. Such transient states for the frame class are nonsensical, and could only be resolved by making the frame class stateful and aware that it is being sent.

I think it is far simpler for the frame to be immutable, and for the underlying transport to take a slice of the payload buffer that it can pass to a gather write, which can then consume the slice at its leisure without making the frame instance inconsistent.
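
A sketch of what that could look like (hypothetical code, not Jetty's actual frame generation, assuming a blocking channel): the transport gathers a generated header with a slice of the immutable frame's payload, and only the slice is consumed.

import java.nio.ByteBuffer;
import java.nio.channels.GatheringByteChannel;

class GatherWriteSketch
{
    static void writeFrame(GatheringByteChannel channel, ByteBuffer header, ByteBuffer payload) throws Exception
    {
        // The frame's payload buffer is never touched; the write loop consumes
        // the header and the slice, even across partial writes.
        ByteBuffer[] vector = { header, payload.slice() };
        while (vector[0].hasRemaining() || vector[1].hasRemaining())
            channel.write(vector);
    }
}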

So I think I have a firm preference for immutable frames.... but that preference itself is not immutable, and I could be convinced by a compelling efficiency argument.

@lachlan-roberts
Contributor

lachlan-roberts commented Nov 15, 2021

We should also consider this from the Jetty/Javax WebSocket API level, which is what was being discussed in #7088.

@OnMessage
public void onMessage(Session session, ByteBuffer message) throws Exception
{
    session.getBasicRemote().sendBinary(message);
    // Should the message buffer now be fully consumed?
}

I have written a test for this which runs over Jetty, Tomcat & Undertow.

  • Jetty does not modify the buffer.
  • Tomcat does not modify the buffer.
  • Undertow sets the position to 0 and the limit to the capacity, which increases the remaining to the full capacity of the buffer.

Not sure why this is being done by Undertow, but I guess the key point is that these other JSR 356 implementations are not consuming the ByteBuffers that are sent.
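
Not the actual test, but a minimal sketch of the kind of check described (javax.websocket API; the endpoint class is hypothetical):

import java.nio.ByteBuffer;
import javax.websocket.OnMessage;
import javax.websocket.Session;
import javax.websocket.server.ServerEndpoint;

@ServerEndpoint("/echo")
public class BufferCheckEndpoint
{
    @OnMessage
    public void onMessage(Session session, ByteBuffer message) throws Exception
    {
        int positionBefore = message.position();
        int limitBefore = message.limit();
        session.getBasicRemote().sendBinary(message);
        // Log how the container moved the indexes (if at all) during the send,
        // for comparison across implementations.
        System.out.printf("position %d -> %d, limit %d -> %d%n",
            positionBefore, message.position(), limitBefore, message.limit());
    }
}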

@lachlan-roberts
Contributor

Actually it seems both Tomcat and Undertow do the same thing; it just happens that Tomcat had a buffer of exactly the same size as the message, while Undertow was using some pooled buffer whose capacity was larger than the message.

@gregw
Contributor

gregw commented Nov 15, 2021

@lachlan-roberts I think that indicates that both Tomcat & Undertow are doing a buffer.clear(), which puts the buffer back into fill mode. So that's something different yet again.
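
For reference, clear() does not erase anything; it just resets the indexes to fill mode, which matches the observation above of remaining() growing to the buffer's full capacity:

ByteBuffer buffer = ByteBuffer.allocate(1024);
buffer.put(new byte[100]);
buffer.flip();                      // flush mode: position=0, limit=100
buffer.clear();                     // fill mode: position=0, limit=capacity
assert buffer.remaining() == 1024;  // remaining() is now the full capacity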

Either way, that is an API which takes a ByteBuffer directly, which is a little different from an API that takes an entity that has-a ByteBuffer.

@sbordet
Contributor Author

sbordet commented Nov 15, 2021

@gregw I think the buffer should be consumed.

try (FileChannel channel = ...) {
  ByteBuffer buffer = ByteBuffer.allocate(1024);
  while (true) {
    buffer.clear();
    int read = channel.read(buffer);
    if (read < 0)
      break;
    buffer.flip();

    // Either WebSocket, using the buffer directly.
    session.getRemote().sendBinary(buffer);
    // Expect buffer to be consumed here (or when the callback is complete).

    // Or HTTP/2, wrapping the buffer into another object.
    stream.sendData(new DataFrame(stream.getId(), buffer));
    // Expect buffer to be consumed here (or when the callback is complete).
  }
}

From the point of view of the while loop, it should not matter whether I'm using a "direct" API or wrapping the ByteBuffer into another object: I am producing a buffer that should be consumed by a send operation, and it does not matter whether the implementation aggregates it, splits it, slices it, or passes it immediately to a Channel.write(ByteBuffer) that would consume it.

The "don't consume it so that we can resend it" seems a weak argument to me, as we are creating frames on the fly for a one-shot operation, and we don't keep frames around in case we want to resend them, so I don't think we should keep ByteBuffers around either.

It just seems weird that we have to do extra work to keep the ByteBuffer indexes untouched, when it is natural that they get moved because the buffer is consumed.

We never use InputStream.mark() to rewind an InputStream in case we want to resend it.

Caching cannot be performed when ByteBuffers come from external sources. HttpContent, being an implementation class, is therefore not a good example for the behavior of an API exposed to users, like WebSocket's sendBinary(ByteBuffer): in the HttpContent case the implementation controls the ByteBuffer, but in the WebSocket case it does not.

Perhaps moving a couple of indexes is not that much work, but I don't see why we need to do it at all.
Just taking the ByteBuffer and consuming it, whether by aggregating it, compressing it, or writing it directly, seems simpler and more natural to me, and we never have to worry about restoring any index.

@gregw
Contributor

gregw commented Nov 15, 2021

@sbordet
I don't understand the point of your example. Your loop clears the buffer at the start of every iteration, so it doesn't matter whether it is consumed or not. Even if the buffer is consumed, it needs to be cleared to be reused in that example, so there is zero difference in the code either way. I think your example shows that it doesn't really matter in most cases. I agree that there is the extra work of a slice on every send if we don't consume the buffer from a frame, but I think that is a fairly cheap operation.

For me, it's not so much the "don't consume it so that we can resend it" argument, but rather the "a partially or fully consumed frame makes no sense" argument that resonates with me... which might be resolved with better naming.
I think the naming in h2 is OK, as DataFrame has:

    public int remaining()
    {
        return data.remaining();
    }

which makes it clear that it is not the data frame size, but how many bytes have not yet been sent.

However for websocket it has:

    public int getPayloadLength()
    {
        return payload.remaining();
    }

which claims to report the payload length, yet that value would change just because the frame has been sent or is being sent. I'm not sure if we ever call getPayloadLength() after a frame has been sent?

I guess the whole pass-a-buffer-by-reference-through-multiple-layers approach is pretty fragile no matter what.... even if lower layers slice, upper layers can still change the content. The upper layers just have to know that once they call send, any buffers passed can't be messed with until the callback completes or the blocking call returns.

So I can see it a bit both ways. The most important thing is that we clearly understand what we are doing and document it well. We don't want applications slicing when they don't have to, or not slicing when they should. The next most important thing is consistency.... WebSocket and H2 have been done differently, so we have to weigh the cost of changing against the value of being consistent. It's probably not a big deal either way, as I don't think any app can mutate a buffer during a send. If the buffer is not consumed, that allows a constant message to be resent without a slice.... but if the buffer is consumed, then a constant message must be sliced before building the frame.
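
As a sketch of that last trade-off (hypothetical code, assuming Jetty's org.eclipse.jetty.websocket.api): if sends consume their buffer, an application holding a constant message would slice per send to keep the constant intact.

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.eclipse.jetty.websocket.api.Session;

class ConstantMessageSender
{
    // A constant, read-only message shared across sends.
    static final ByteBuffer PING =
        ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)).asReadOnlyBuffer();

    void sendPing(Session session) throws Exception
    {
        // If the send consumes its buffer, pass a slice so PING's indexes survive;
        // if it never consumes, PING could be passed directly.
        session.getRemote().sendBytes(PING.slice());
    }
}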

@github-actions

This issue has been automatically marked as stale because it has been a full year without activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions bot added the Stale label Nov 16, 2022
@github-actions

This issue has been closed due to it having no activity.
