Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strengthen the ordering for some atomic operations #4083

Merged
merged 6 commits into from
Apr 2, 2022
Merged

Conversation

SeanTAllen
Copy link
Member

@SeanTAllen SeanTAllen commented Apr 2, 2022

This commit undoes the changes introduced in 2016 in
#1209.

While we don't know that every change in #1209, we do know that a number of the
changes are the source of incorrect operations on Arm where we see evidence of
"bad things" like actors being run by more than 1 scheduler at a time.

Matt Parkinson from MSR Cambridge identified a number of issues with compare
and swap on Arm that Joe Eli McIlvain then found were introduced in #1209. The
issues would not impact x86, only CPUs with weaker memory models like Arm.

Given that we know that some of the changes in #1209 are wrong, Joe and I
decided that the best course of action is to roll them all back. While all
might not need to be rollback, we are following the Pony philosophy and valuing
correctness over performance.

We have much better testing now then we did 5 years ago and can put any future
relaxing of atomic order operations through a considerably more vigorous testing
process than we did years ago for #1209.

With these sorts of bugs, we can never say, they have been fixed. What we can
say is that we ran with Matt's suggested changes for over 48 hours without
encountering any crashes or other signs of incorrect behavior. As such, we
believe that these changes are an improvement but our nightly runtime stress
tests might find additional problems in the future.

It should be noted, that due to changes in the last 5 years, this is not an
exact reversion. I've done my best to make sure that all changes were reverted
as they would be seen in their current form today.

Closes #4069

@SeanTAllen SeanTAllen requested a review from a team April 2, 2022 12:25
@SeanTAllen SeanTAllen added the changelog - fixed Automatically add "Fixed" CHANGELOG entry on merge label Apr 2, 2022
@ponylang-main ponylang-main added the discuss during sync Should be discussed during an upcoming sync label Apr 2, 2022
@SeanTAllen
Copy link
Member Author

When this is merged, the top comment should be used as the commit message when squashing.

@SeanTAllen
Copy link
Member Author

SeanTAllen commented Apr 2, 2022

The changelog and release notes failures can be ignored. I opened this and then pushed a few changes quickly and we hit the secondary rate limits that GH has. Quicker than normal, but that is the cause, and we don't have any changelog or release notes changes in this to run.

@SeanTAllen SeanTAllen merged commit f94c1ec into main Apr 2, 2022
@SeanTAllen SeanTAllen deleted the mjp-idea branch April 2, 2022 16:04
@ponylang-main ponylang-main removed the discuss during sync Should be discussed during an upcoming sync label Apr 2, 2022
github-actions bot pushed a commit that referenced this pull request Apr 2, 2022
github-actions bot pushed a commit that referenced this pull request Apr 2, 2022
SeanTAllen added a commit that referenced this pull request Oct 28, 2022
In PR #4083, I reversed some relaxing of atomic orderings that were
introduced in 2016 in #1209. When I was doing that change, I missed
an additional strengthing that should have been done in the IOCP
subsystem.

If you note the original commit, I made a corresponding change in
the EPoll subsystem, but missed doing it in the IOCP system.
SeanTAllen added a commit that referenced this pull request Oct 28, 2022
In PR #4083, I reversed some relaxing of atomic orderings that were
introduced in 2016 in #1209. When I was doing that change, I missed
an additional strengthing that should have been done in the IOCP
subsystem.

If you note the original commit, I made a corresponding change in
the EPoll subsystem, but missed doing it in the IOCP system.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog - fixed Automatically add "Fixed" CHANGELOG entry on merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

aarch64 stress test failure
3 participants