Fix "double socket close" issue with Windows version of TCPConnection #4437

Merged: 1 commit, Sep 12, 2023

@SeanTAllen SeanTAllen commented Sep 5, 2023

When fixing a number of "smaller" Windows TCP networking issues a couple of
years back, we fixed those issues but also introduced a new bug. That bug
lingered for two years, in large part because it only becomes apparent in a
low-resource environment.

When we recently switched our Windows CI from CirrusCI to GitHub Actions, we
went from a high-resource environment to a low-resource environment and started
getting a ton of "random" Windows TCP test failures.

The problem was fairly easy to recreate in a test environment. It would be
fairly unlikely in most applications, but it existed nonetheless. The scenario
in our test environment was like this:

  • Test 1 runs and completes but hasn't done test teardown yet
  • Test 2 starts up
  • Test 1 runs the "buggy" line of code and closes the socket it has been using
    with the OS, but doesn't reset its own internal record of the file
    descriptor for the socket.
  • Test 2 gets a socket from the OS with the same file descriptor as the socket
    just closed by Test 1
  • Test 1 still has a "valid" file descriptor and, as part of full shutdown,
    closes the socket associated with "its" file descriptor. When Test 1 does
    this, Test 2's socket closes and Test 2 fails to complete successfully.

The problem would appear "in the wild" if a Windows application were quickly
closing and opening TCP sockets in a manner similar to the Pony standard
library TCP tests.
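
To make the sequence above concrete, here is a minimal sketch of the failure
mode in Python rather than Pony. It is a toy model, not the actual
TCPConnection code: `FakeOS`, `Connection`, and the `reset_fd_on_close` flag
are hypothetical names invented for illustration, and the handle table simply
assumes the OS hands out the lowest free descriptor. The "fixed" variant
assumes the remedy is to reset the stored descriptor after closing, which is
what the description above implies but does not show.

```python
class FakeOS:
    """Toy handle table (hypothetical): reuses the lowest free descriptor."""

    def __init__(self):
        self.open_fds = {}  # fd -> name of the connection that owns it

    def socket(self, owner):
        fd = 0
        while fd in self.open_fds:
            fd += 1
        self.open_fds[fd] = owner
        return fd

    def close(self, fd):
        # Closes whatever socket currently sits behind this descriptor,
        # regardless of who is asking. Unknown descriptors are ignored.
        self.open_fds.pop(fd, None)


class Connection:
    INVALID = -1

    def __init__(self, os_, name, reset_fd_on_close):
        self.os = os_
        self.name = name
        self.reset_fd_on_close = reset_fd_on_close
        self.fd = os_.socket(name)

    def close(self):
        if self.fd == self.INVALID:
            return  # already closed, nothing to do
        self.os.close(self.fd)
        if self.reset_fd_on_close:
            # Assumed fix: forget the descriptor once it has been closed.
            self.fd = self.INVALID

    def is_open(self):
        return self.os.open_fds.get(self.fd) == self.name


def run_scenario(reset_fd_on_close):
    os_ = FakeOS()
    test1 = Connection(os_, "test1", reset_fd_on_close)
    test1.close()  # first close; the buggy variant keeps the stale fd
    test2 = Connection(os_, "test2", reset_fd_on_close)  # gets the reused fd
    test1.close()  # teardown: second close on the same descriptor number
    return test2.is_open()


buggy_result = run_scenario(reset_fd_on_close=False)
fixed_result = run_scenario(reset_fd_on_close=True)
```

In the buggy run, Test 1's teardown closes the descriptor that now belongs to
Test 2, so `buggy_result` is `False`. With the descriptor reset after the
first close, the second close is a no-op and `fixed_result` is `True`.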

Fixes #4413
Fixes #4435

@ponylang-main ponylang-main added the discuss during sync Should be discussed during an upcoming sync label Sep 5, 2023
@SeanTAllen SeanTAllen removed the discuss during sync Should be discussed during an upcoming sync label Sep 5, 2023
@ponylang-main ponylang-main added the discuss during sync Should be discussed during an upcoming sync label Sep 6, 2023
@SeanTAllen SeanTAllen added the changelog - fixed Automatically add "Fixed" CHANGELOG entry on merge label Sep 7, 2023
@SeanTAllen SeanTAllen requested a review from a team September 7, 2023 13:11
@SeanTAllen SeanTAllen marked this pull request as ready for review September 7, 2023 13:11
@SeanTAllen SeanTAllen merged commit 9cb6b8b into main Sep 12, 2023
23 checks passed
@SeanTAllen SeanTAllen deleted the minimal-windows-tcp-fix branch September 12, 2023 18:11
@ponylang-main ponylang-main removed the discuss during sync Should be discussed during an upcoming sync label Sep 12, 2023
github-actions bot pushed a commit that referenced this pull request Sep 12, 2023
Successfully merging this pull request may close these issues:

  • Windows CI sometimes hangs
  • Windows TCP tests are very flakey
3 participants