
Consume ByteBuffer payloads sent via WebSocket #7106

Closed
sbordet opened this issue Nov 12, 2021 · 10 comments
Labels
Bug (For general bugs on Jetty side), Stale (For auto-closed stale issues and pull requests)

Comments

@sbordet
Contributor

sbordet commented Nov 12, 2021

Jetty version(s)
10

Description
Following #4341, payloads passed to WebSocket frames are not consumed.

This is surprising as:

  • HttpOutput.send(ByteBuffer) consumes the buffer
  • HttpOutput.write(small) aggregates into the aggregation buffer, which is then consumed when sent
  • sending an HTTP/2 DataFrame consumes the buffer
  • reading from an InputStream consumes it
  • etc.

See also discussion in #7088.

@gregw any reason for not consuming the buffer?
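
For illustration, a minimal sketch of the reported behavior (not code from the issue; it assumes Jetty's org.eclipse.jetty.websocket.api and a blocking sendBytes(ByteBuffer); class and method names are hypothetical):

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.eclipse.jetty.websocket.api.Session;

public class ConsumeCheck
{
    static void sendAndCheck(Session session) throws Exception
    {
        ByteBuffer payload = ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8));
        // Blocking binary send of the whole buffer.
        session.getRemote().sendBytes(payload);
        // With the behavior described in this issue the indexes are untouched,
        // so the buffer still appears full; the issue argues the send should
        // have consumed it (position == limit) instead.
        assert payload.hasRemaining();
    }
}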

@sbordet added the Bug label Nov 12, 2021
@gregw
Contributor

gregw commented Nov 14, 2021

I believe a WebSocket Frame is somewhat a higher level object than a ByteBuffer. It does contain a ByteBuffer, but there is not an is-a relationship to ByteBuffer. Thus I think it is reasonable for the following to be true:

assert frame.getPayload().hasRemaining();
send(frame);
assert frame.getPayload().hasRemaining();

It is somewhat analogous to the HttpContent class we use in the resource service, which combines meta data and the ByteBuffer containing the content. That ByteBuffer is not consumed when the content is sent, because something at a lower level takes a slice of the content's buffer and consumes the slice as it sends it. This way, an HttpContent can be cached and sent multiple times.
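
A minimal sketch of that slicing pattern (hypothetical code, not HttpContent itself): the lower level writes a slice, so the cached buffer's indexes are never moved and it can be sent again.

import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

class SliceAndSend
{
    static void send(WritableByteChannel channel, ByteBuffer cached) throws Exception
    {
        // Only the slice is consumed by the write loop.
        ByteBuffer slice = cached.slice();
        while (slice.hasRemaining())
            channel.write(slice);
        // cached.position() and cached.limit() are unchanged here,
        // so the cached buffer can be sent again later.
    }
}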

I agree the use-case for caching a websocket frame and sending it multiple times is less compelling, and is further complicated by masking..... but I still see value in a Frame instance remaining meaningful after being sent - i.e. you can query its length and content, for example.

Either way, we should strive to be consistent: both HTTP/2 DataFrame and WebSocket Frame should work the same way.

So I'll flip this back the other way: @sbordet, is there any reason for consuming the buffer from a frame? Is the changed position used to signal anything to the sending application? If so, should it? Is that leaking implementation details? Can we avoid slicing by doing it one way or the other? If we do consume buffers in frames, do we need to store the sizes separately so they can be accessed after the send?

@gregw
Contributor

gregw commented Nov 14, 2021

@lachlan-roberts thoughts?

@gregw
Contributor

gregw commented Nov 15, 2021

Note also that the Frame API is used for both sending and receiving, plus it has text frames. So things like frame.setPayload("HelloWorld") can result in a frame whose string payload gets consumed.
Should methods like frame.getPayloadAsUTF8() consume?

Also, I think I'd find it confusing if a Frame's payload was ever partially consumed. Let's say we have a Frame with 32KB of payload. It's very clear what the frame header should be - i.e. a data frame header with a length of 32KB. But let's say that during the send only half the payload is consumed before the transport blocks. We now have a frame with 16KB of payload.... which could look like it should be sent again with a new data frame header of length 16KB. But that's not the case: it is the unsent trailing part of the previously sent 32KB frame, so a header does not need to be generated for the frame again. Such transient states for the frame class are nonsensical, and could only be resolved by making the frame class stateful and aware that it is being sent.

I think it is far simpler for the frame to be immutable, and for the underlying transport to take a slice of the payload buffer that it can pass to a gather write, which can then consume the slice at its leisure without making the frame instance inconsistent.
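
A sketch of what that could look like (hypothetical code, not Jetty's actual frame generation, assuming a blocking channel): the transport gathers a generated header with a slice of the immutable frame's payload, and only the slice is consumed.

import java.nio.ByteBuffer;
import java.nio.channels.GatheringByteChannel;

class GatherWriteSketch
{
    static void writeFrame(GatheringByteChannel channel, ByteBuffer header, ByteBuffer payload) throws Exception
    {
        // The frame's payload buffer is never touched; the write loop consumes
        // the header and the slice, even across partial writes.
        ByteBuffer[] vector = { header, payload.slice() };
        while (vector[0].hasRemaining() || vector[1].hasRemaining())
            channel.write(vector);
    }
}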

So I think I have a firm preference for immutable frames.... but that preference itself is not immutable, and I could be convinced by a compelling efficiency argument.

@lachlan-roberts
Contributor

lachlan-roberts commented Nov 15, 2021

We should also consider this from the Jetty/Javax WebSocket API level, which is what was being discussed in #7088.

@OnMessage
public void onMessage(Session session, ByteBuffer message) throws Exception
{
    session.getBasicRemote().sendBinary(message);
    // Should the message buffer now be fully consumed?
}

I have written a test for this which runs over Jetty, Tomcat & Undertow.

  • Jetty does not modify the buffer.
  • Tomcat does not modify the buffer.
  • Undertow sets the position to 0 and the limit to the capacity, which increases the remaining to the full capacity of the buffer.

Not sure why this is being done by Undertow, but I guess the key point is that these other JSR 356 implementations are not consuming the ByteBuffers that are sent.
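
Not the actual test, but a minimal sketch of the kind of check described (javax.websocket API; the endpoint class is hypothetical):

import java.nio.ByteBuffer;
import javax.websocket.OnMessage;
import javax.websocket.Session;
import javax.websocket.server.ServerEndpoint;

@ServerEndpoint("/echo")
public class BufferCheckEndpoint
{
    @OnMessage
    public void onMessage(Session session, ByteBuffer message) throws Exception
    {
        int positionBefore = message.position();
        int limitBefore = message.limit();
        session.getBasicRemote().sendBinary(message);
        // Log how the container moved the indexes (if at all) during the send,
        // for comparison across implementations.
        System.out.printf("position %d -> %d, limit %d -> %d%n",
            positionBefore, message.position(), limitBefore, message.limit());
    }
}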

@lachlan-roberts
Contributor

Actually it seems both Tomcat and Undertow do the same thing; it just happens that Tomcat had a buffer of exactly the same size as the message, while Undertow was using some pooled buffer whose capacity was larger than the message.

@gregw
Contributor

gregw commented Nov 15, 2021

@lachlan-roberts I think that indicates that both Tomcat & Undertow are doing a buffer.clear(), which puts the buffer back into fill mode. So that's something different yet again.
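
For reference, clear() does not erase anything; it just resets the indexes to fill mode, which matches the observation above of remaining() growing to the buffer's full capacity:

ByteBuffer buffer = ByteBuffer.allocate(1024);
buffer.put(new byte[100]);
buffer.flip();                      // flush mode: position=0, limit=100
buffer.clear();                     // fill mode: position=0, limit=capacity
assert buffer.remaining() == 1024;  // remaining() is now the full capacity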

Either way, that is an API which takes a ByteBuffer directly, which is a little different from an API that takes an entity that has-a ByteBuffer.

@sbordet
Contributor Author

sbordet commented Nov 15, 2021

@gregw I think the buffer should be consumed.

try (FileChannel channel = ...) {
  ByteBuffer buffer = ByteBuffer.allocate(1024);
  while (true) {
    buffer.clear();
    int read = channel.read(buffer);
    if (read < 0)
      break;
    buffer.flip();

    // Either WebSocket, using the buffer directly.
    session.getRemote().sendBinary(buffer);
    // Expect buffer to be consumed here (or when the callback is complete).

    // Or HTTP/2, wrapping the buffer into another object.
    stream.sendData(new DataFrame(stream.getId(), buffer));
    // Expect buffer to be consumed here (or when the callback is complete).
  }
}

From the point of view of the while loop, it should not matter whether I'm using a "direct" API or wrapping the ByteBuffer into another object: I am producing a buffer that should be consumed by a send operation, and it does not matter whether the implementation aggregates it, splits it, slices it, or passes it immediately to a Channel.write(ByteBuffer) that would consume it.

The "don't consume it so that we can resend it" seems a weak argument to me, as we are creating frames on the fly for a one-shot operation, and we don't keep frames around in case we want to resend them, so I don't think we should keep ByteBuffers around either.

It just seems weird that we have to do extra work to keep the ByteBuffer indexes untouched, when it is natural that they get moved because the buffer is consumed.

We never use InputStream.mark() to rewind an InputStream in case we want to resend it.

Caching cannot be performed when ByteBuffers come from external sources. HttpContent, being an implementation class, is therefore not a good example for the behavior of an API exposed to users, like WebSocket's sendBinary(ByteBuffer): in the HttpContent case the implementation controls the ByteBuffer, but in the WebSocket case it does not.

Perhaps moving a couple of indexes is not that much work, but I don't see why we need to do it at all.
Just taking the ByteBuffer and consuming it, whether by aggregating it, compressing it, or writing it directly, seems simpler and more natural to me, and we never have to worry about restoring any index.

@gregw
Contributor

gregw commented Nov 15, 2021

@sbordet
I don't understand the point of your example. Your loop clears the buffer at the start of every iteration, so it doesn't matter whether it is consumed or not. Even if the buffer is consumed, it needs to be cleared to be reused in that example, so there is zero difference in the code either way. I think your example shows that it doesn't really matter in most cases. I agree that there is the extra work of a slice on every send if we don't consume the buffer from a frame, but I think that is a fairly cheap operation.

For me, it's not so much the "don't consume it so that we can resend it" argument, but rather the "a partially or fully consumed frame makes no sense" argument that resonates with me... which might be resolved with better naming.
I think the naming in h2 is OK, as DataFrame has:

    public int remaining()
    {
        return data.remaining();
    }

which makes it clear that it is not the data frame size, but how many bytes have not yet been sent.

However for websocket it has:

    public int getPayloadLength()
    {
        return payload.remaining();
    }

which claims to report the payload length, yet that value would change just because the frame has been sent or is being sent. I'm not sure if we ever call getPayloadLength() after a frame has been sent?

I guess the whole pass-a-buffer-by-reference-through-multiple-layers approach is pretty fragile no matter what.... even if lower layers slice, upper layers can still change the content. The upper layers just have to know that once they call send, any buffers passed can't be messed with until the callback completes or the blocking call returns.

So I can see it a bit both ways. The most important thing is that we clearly understand what we are doing and document it well. We don't want applications slicing when they don't have to, or not slicing when they should. The next most important thing is consistency.... WebSocket and H2 have been done differently, so we have to weigh the cost of changing against the value of being consistent. It's probably not a big deal either way, as I don't think any app can mutate a buffer during a send. If the buffer is not consumed, that allows a constant message to be resent without a slice.... but if the buffer is consumed, then a constant message must be sliced before building the frame.
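
As a sketch of that last trade-off (hypothetical code, assuming Jetty's org.eclipse.jetty.websocket.api): if sends consume their buffer, an application holding a constant message would slice per send to keep the constant intact.

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.eclipse.jetty.websocket.api.Session;

class ConstantMessageSender
{
    // A constant, read-only message shared across sends.
    static final ByteBuffer PING =
        ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)).asReadOnlyBuffer();

    void sendPing(Session session) throws Exception
    {
        // If the send consumes its buffer, pass a slice so PING's indexes survive;
        // if it never consumes, PING could be passed directly.
        session.getRemote().sendBytes(PING.slice());
    }
}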

@github-actions

This issue has been automatically marked as stale because it has been a full year without activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions bot added the Stale label Nov 16, 2022
@github-actions

This issue has been closed due to it having no activity.
