Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data read with end of SSL handshake is discarded #665

Closed
cstratton opened this issue Feb 12, 2018 · 12 comments · Fixed by #943
Closed

Data read with end of SSL handshake is discarded #665

cstratton opened this issue Feb 12, 2018 · 12 comments · Fixed by #943

Comments

@cstratton
Copy link

cstratton commented Feb 12, 2018

Description

Essentially, the bug is that

CONCLUDE SSL HANDSHAKE
PLEASE CAN WE DO WEBSOCKETS

works, while

CONCLUDE SSL HANDSHAKE PLEASE CAN WE DO WEBSOCKETS

does not

Expected Behavior

Data following the conclusion of the SSL handshake should be processed, regardless of how quickly it follows the handshake.

Websockets operating over SSL are ultimately using TCP, which is a streaming protocol, meaning that a given read() call will generally return all unclaimed data which has arrived, regardless of any logical boundaries, origin in distinct write() calls, or breaks between packets.

When operating as a server, the final read() call in processHandshake() in SSLSocketChannel2.java often ends up claiming not only the concluding 45 or so bytes of the SSL handshake exchange, but also 300-something bytes of immediately following HTTP traffic - typically the initial Websocket upgrade request.

Current Behavior

Unfortunately, at least as of Java-WebSocket-1.3.0-334-g105e2e0 and preceding recent versions, any remaining data in the inCrypt buffer is dropped when processHandshake calls createBuffers(). This means that the websocket upgrade request contained within the dropped data buffer is never seen, and application programs on both ends fail to receive a connection callback.

Possible Solution

We've had initial success modifying createBuffers() to not rewind/flip inCrypt if it contains yet unused data. However, it's not clear if this is entirely safe or how it interacts with past bugfixes like that for #190. It would also seem that if the buffer size should change the data could need to be copied from the old one to the new.

We are also unsure if this should be restricted to the createBuffers() call resulting from the SSL handshake's read() operation. We only operate the library in the server role, and suspect that the SSL handshake completing during a write() would be more likely to occur in the complementary client role?

Steps to Reproduce (for bugs)

The problem is most obvious when a client (or the TCP stack of the client's operating system) combines both the final piece of the SSL handshake and the following initial protocol traffic within the same packet. Using daltoniam/Starscream on an iphone for example, initial connections seem to send these in distinct packets and the connection works. But if the websocket is disconnected and new connection attempt made soon thereafter, the final piece of the new connection's SSL handshake and its initial data seem to reliably end up in the same packet, as a result of this bug the portion following the SSL handshake is dropped by createBuffers(), and the connection fails to establish a Websocket. (Packet sniffing shows that the client's source port does increment, so this is not a TCP connection confusion. Additionally, distinct write() calls are actually being used for the SSL handshake vs. application data, but unless a few hundred milliseconds of delay in inserted in between, the operating system/TCP stack is merging them into a single packet for stream efficiency as the design of TCP encourages)

The problem can also theoretically occur if the client manages to fire off two distinct packets in between the server's read() calls, such as might happen if the server is heavily loaded or experiencing scheduler latency, or in the case of network level issues, especially as Nagle encourages sending an additional packet even with the ack of the previous still unreceived.

The situation could probably also be artificially caused by introducing a half-second sleep() before each read() call, such that all of the data the client intends to send at that point in the conversation will probably arrive before the read() can claim any of it.

Ultimately, communication over TCP needs to handle both the case of a read() call retrieving less than a logical chunk of data, and also the case of it retrieving the end of one logical chunk and all or part of another. Because of the buffer clearing, the latter case is not handled at the specific transition which occurs at the end of the SSL handshake.

Context

Android based secure websocket server is unable to reliably accept websocket connection attempts from clients behaving in accordance with the TCP specification.

Your Environment

Android 6.1 with Java-WebSocket-1.3.0-334-g105e2e0 operating in server role
Secure Websockets
Client using https://github.com/daltoniam/Starscream v 3.0.4 on iOS 11.0.2
both server and client on the same local wifi network

@marci4
Copy link
Collaborator

marci4 commented Feb 12, 2018

Hello @cstratton,

thank you very much for this detailed bug report.

The issue you are describing is new to me, but I am more than welcome to take a look a this.

As it sounds, you have a patch for this issue ready. Could you also add it to this issue? Or is it just reverting #190?
Is it also possible to reproduce this issue in another way, e.g. in a desktop only environment?

Does this also occur with the latest version 1.3.7?

Greetings
marci4

@cstratton
Copy link
Author

cstratton commented Feb 14, 2018

Hi marci4, thanks for taking a look at this.

What we tried this against, https://github.com/TooTallNate/Java-WebSocket/commits/105e2e0 is actually newer than 1.3.7, for some reason that is unclear git-describe counted it from an earlier tag making it look older than it is, anyway I think the range of applicability is very wide.

What we did to verify this was the problem was essentially to not rewind and flip the input buffer if it it still contained unused data.

issue_665_crudefix2.txt
(new version - original was mixed up)

However I don't think that's really safe or sufficient, both as createBuffers() can be called from multiple places, and can create new buffers rather than just rewind the existing ones.

Connection to #190 is more theoretical essentially a fear that fixing this might reintroduce issues in that related code. Or maybe not.

Essentially I think what is needed is a more sophisticated and safe way of accomplishing the preservation of the unused buffer contents, in the situation where it is needed and appropriate.

Reproducing the problem in a portable way may be challenging - we've only tried the library so far in the context of our Android server APK and iOS app. My guess is that the easiest way to get it to occur in a different context could be the artificial one of putting in sleep delays before the reads. Is there an existing test case we could maybe fork and extend with that in an attempt to make a somewhat repeatable demonstration?

@marci4
Copy link
Collaborator

marci4 commented Feb 16, 2018

Hello @cstratton,

Yeah I got confused with the tag there!

Well I fear that I have to rewrite the whole SSLSocketChannel to have a more maintainable solution here, since using your fix causes other problems e.g. with firefox.

I will however be on holidays till next friday so I will not be able to do this in the next week!

If I have a somehow working solution, can I get back to you and let you test this?

Greetings
marci4

@cstratton
Copy link
Author

cstratton commented Feb 16, 2018

Thanks. While I didn't think the simple approaching of not rewinding the buffer would be safe or effective without additional wrapping logic to figure out when and how it should apply (or when other means might be needed, for example to copy contents from a buffer being replaced to the replacement), I'd be curious as to how it is failing with firefox.

@marci4
Copy link
Collaborator

marci4 commented Feb 24, 2018

Well the problem is that for me the SSLSocketChannel is just a mess. The author is not active any more and it has some known issues.
That is the main reason why I wanna rewrite it...

@a-ross-cohen
Copy link

Hey @marci4 I seem to be running into the same issue as @cstratton. Is there any update on a timeline to a 'correct' fix for this?

@marci4
Copy link
Collaborator

marci4 commented Mar 7, 2018

Hello @a-ross-cohen,

well I do not have the time for such a big project right now! Being busy with my master thesis and my full time job.

In simple words, we "just" need a class that implements a ByteChannel interface over a TLS connection using a SSLEngine.

Any help is much appreciated!

Greetins
marci4

@doyledavidson
Copy link
Contributor

doyledavidson commented Sep 6, 2019

I just stumbled into this bug with TLSv1.3 (using JDK 11) and sslParameters.setNeedClientAuth(true);
I am using a simple EchoClient / EchoServer.
Most of the time, but not always, the HTTP GET is pulled in with the "SSLEngineInputRecord.java:177|Raw read", and since it is discarded, the client hangs and the server never reads the data.

I hacked createBuffers() with the following to confirm but do not fully yet understand the whole context of calling this a second time. Let me study this bug and the code some more and see if I can suggest a better solution.
if ( inCrypt.remaining() == 0 || bufferallocations == 0)
{
inCrypt.rewind();
inCrypt.flip();
}

Interestingly I don't hit this with TLSv1.2 mode and all of the other same code.

@doyledavidson
Copy link
Contributor

I think I have a tactical fix for this that I have tested in my own branch and my test code and real application.

I am new to "git" but think I have done this mostly right. My change is shown in my fork at https://github.com/doyledavidson/Java-WebSocket

I can create a pull request from that if you like.

Feedback welcome!

@da-als
Copy link
Contributor

da-als commented Nov 14, 2019

I think I have a tactical fix for this that I have tested in my own branch and my test code and real application.

I am new to "git" but think I have done this mostly right. My change is shown in my fork at https://github.com/doyledavidson/Java-WebSocket

I can create a pull request from that if you like.

Feedback welcome!

I am currently in the process of testing this fix on my application. Before the fix, several Google Chrome users were having problems connecting to the websocket server over WSS; the changes in @doyledavidson's fork seem to correct that. We're gradually deploying the fix to more clients over the next few weeks to ensure there are no side effects and will likely be using a fork with these changes while we wait for an official fix to hit the main Java-WebSocket repository.

@marci4
Copy link
Collaborator

marci4 commented Nov 15, 2019

I think I have a tactical fix for this that I have tested in my own branch and my test code and real application.
I am new to "git" but think I have done this mostly right. My change is shown in my fork at doyledavidson/Java-WebSocket
I can create a pull request from that if you like.
Feedback welcome!

I am currently in the process of testing this fix on my application. Before the fix, several Google Chrome users were having problems connecting to the websocket server over WSS; the changes in @doyledavidson's fork seem to correct that. We're gradually deploying the fix to more clients over the next few weeks to ensure there are no side effects and will likely be using a fork with these changes while we wait for an official fix to hit the main Java-WebSocket repository.

@da-als if it works, feel free to open a pull request

@da-als
Copy link
Contributor

da-als commented Nov 15, 2019

@da-als if it works, feel free to open a pull request

I have zero knowledge and experience in collaborating on these GitHub projects but I have created a pull request with as much information as I can come up with.

@marci4 marci4 added this to the Release 1.4.1 milestone Jan 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants