Skip to content

Commit

Permalink
Fix "double socket close" issue with Windows version of TCPConnection (
Browse files Browse the repository at this point in the history
…#4437)

When fixing a number of "smaller" Windows TCP networking issues a couple years
back, in addition to fixing those issues, we introduced a new bug. That bug
lingered for two years. It lingered in large part because it would only become
apparent in a low resource environment.

When we recently switched our Windows CI from CirrusCI to GitHub Actions, we
went from a high-resource environment to a low-resource environment and started
getting a ton of "random" Windows TCP test failures.

The problem that was fairly easy to recreate in a test environment would be
fairly unlikely in most applications but existed nontheless. The scenario in
our test environment was like this:

- Test 1 runs and completes but hasn't done test teardown yet
- Test 2 starts up
- Test 1 runs the "buggy" line of code and closes the socket it has been using
    with OS, but doesn't reset its own internal record of the file descriptor
    for the socket.
- Test 2 is gets a socket from the OS with the file descriptor for the socket
    just closed by Test 1
- Test 1 still has a "valid" file descriptor and as part of full shutdown,
    closes the socket associated with "its" file descriptor. When Test 1 does
    this, test 2's socket closes and the test fails to complete successfully.

The problem would appear "in the wild" if a Windows application was quickly
closing and opening TCP sockets in a manner similiar to the Pony standard
library TCP tests.

Fixes #4413
Fixes #4435
  • Loading branch information
SeanTAllen committed Sep 12, 2023
1 parent 9d7360b commit 9cb6b8b
Show file tree
Hide file tree
Showing 2 changed files with 16 additions and 5 deletions.
15 changes: 15 additions & 0 deletions .release-notes/4437.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
## Fix "double socket close" issue with Windows version of TCPConnection

When fixing a number of "smaller" Windows TCP networking issues a couple years back, in addition to fixing those issues, we introduced a new bug. That bug lingered for two years. It lingered in large part because it would only become apparent in a low resource environment.

When we recently switched our Windows CI from CirrusCI to GitHub Actions, we went from a high-resource environment to a low-resource environment and started getting a ton of "random" Windows TCP test failures.

The problem that was fairly easy to recreate in a test environment would be fairly unlikely in most applications but existed nontheless. The scenario in our test environment was like this:

- Test 1 runs and completes but hasn't done test teardown yet
- Test 2 starts up
- Test 1 runs the "buggy" line of code and closes the socket it has been using with OS, but doesn't reset its own internal record of the file descriptor for the socket.
- Test 2 is gets a socket from the OS with the file descriptor for the socket just closed by Test 1
- Test 1 still has a "valid" file descriptor and as part of full shutdown, closes the socket associated with "its" file descriptor. When Test 1 does this, test 2's socket closes and the test fails to complete successfully.

The problem would appear "in the wild" if a Windows application was quickly closing and opening TCP sockets in a manner similiar to the Pony standard library TCP tests.
6 changes: 1 addition & 5 deletions packages/net/tcp_connection.pony
Original file line number Diff line number Diff line change
Expand Up @@ -1059,11 +1059,7 @@ actor TCPConnection is AsioEventNotify
_shutdown = true

if _connected then
ifdef windows then
@pony_os_socket_close(_fd)
else
@pony_os_socket_shutdown(_fd)
end
@pony_os_socket_shutdown(_fd)
else
_shutdown_peer = true
end
Expand Down

0 comments on commit 9cb6b8b

Please sign in to comment.