-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiple games miss wakeups from PulseEvent (Crysis (1) is very slow to play intro videos, Sonic Generations and Pro Evolution Soccer have audio distortion) #10
Comments
There's no reason these should be global, and in particular, this means that esync_pulse_event() might end up writing 0, which raises the likelihood of a missed wakeup from "probable" to "certain". Fixes #10.
Fixed by a7faa85. |
sadly not fixed for me, using the 32 bit crysis executable. |
This isn't fixed (and I'm not sure why we thought it was). I can reproduce it even after a7faa85. The game seems to use PulseEvent() as a makeshift timer or semaphore; not quite sure. In esync this is implemented as a write() + read(). I think what's happening here is that both of these go through to the kernel before the polling thread has time to return success (i.e. it wakes up the polling thread, but the count is 0 by the time we get to eventfd_poll()). But this is a guess, since I don't understand very well how the kernel-side code actually works, and unfortunately I can't reproduce the bug with strace to rule out that possibility. In any case, if that's what's going wrong, this is a WONTFIX, and the correct solution is "just disable esync". It would kind of surprise me if the game benefits greatly from it anyway. |
I'm taking back my statement. It's indeed not fixed. I'm not sure what happened during testing yesterday as I can reproduce the issue with supposedly the exact same wine version today (even though it's been rebuilt in the meantime). |
It does indeed look like this is a kernel insufficiency: https://gist.github.com/zfigura/6f55688732728d7e1e452188014ec523 On my system the thread never wakes up from poll(), unless I add some sort of delay between the write() and read() calls. |
Ok that's interesting.. That also explains my results (was running an experimental kernel at the time). I'll try to use the exact same configuration next time :D |
May help with #10, although the real fix there is just not to use esync.
For what it's worth, I think all of these games are going through timeSetEvent(). Not sure that this helps us, though. |
There's no reason these should be global, and in particular, this means that esync_pulse_event() might end up writing 0, which raises the likelihood of a missed wakeup from "probable" to "certain". Fixes #10.
May help with #10, although the real fix there is just not to use esync.
There's no reason these should be global, and in particular, this means that esync_pulse_event() might end up writing 0, which raises the likelihood of a missed wakeup from "probable" to "certain". Fixes #10.
May help with #10, although the real fix there is just not to use esync.
I think the only way to correctly implement PulseEvent is to have one eventfd per client, and making it the client's responsibility to read the data. However, if that is too hard, I think you can bodge it by making it the client's responsibility to read the data, even if there are multiple clients. If you make the eventfd non-blocking and ignore EAGAIN, I reckon it'll work much better than the existing system. Even better, it'll work perfectly if there happens to be only one waiter. |
I'm not sure what's meant by this. One eventfd per process? Per handle? That can't work, we need events to be able to be woken up from different processes and handles.
Waking up multiple waiters on an auto-reset event seems extremely risky. Programs could easily be using them as a stupid semaphore or mutex. |
One eventfd per waiter process per handle.
Oh, you didn't specify auto-reset. Then just set EFD_SEMAPHORE and use eventfd in the regular way? i.e. write the data, then instead of immediately reading it, let the client consume it? |
How would that work, then? How would you wake up handle A by signaling handle B to the same object, if they are using different eventfds?
To have PulseEvent() behave like SetEvent()? That seems too risky. |
Yeah, that doesn't actually fix PulseEvent at all, my mistake.
Ehm... I thought too hard about this issue and mixed up PulseEvent with SetEvent. I think it is possible to implement if there is only WaitForSingleObject. First, define an atomic counter on each auto-reset esync primitive, call it W. PulseEvent:
WaitForSingleObject:
It is possible to lose PulseEvent if there are multiple waiters and PulseEvent is used in quick succession. I think it is probably possible to fix this by checking the value of W, or adding another counter. I tried to think about it but couldn't find any way to make mixed WaitForMultipleObjects work; I think you need two atomic counters, or possibly to lock the event with a mutex (maybe only if PulseEvent is in use)? Is it common to use WaitForMultipleObjects with PulseEvent? |
Your idea is essentially to signal the event if and only if there are waiters, and not to unsignal it in that case. This could work, but would be difficult to synchronize, and it would break with wait-all. I'll have to give it some thought. |
I think if it's possible to get it to work with wait-multiple, the same tactic should apply with wait-all. Personally, I don't really care about this issue, but if you say it's blocking upstreaming then I'm interested in helping. |
PulseEvent() is tricky, but the biggest problem with upstreaming is, I think, wait-all. That's not something that's exposed by Linux polling APIs, and I think it would probably need either kernel support or excessive locking on the Wine side. The other problem is use of shared memory (reading object state, reading object state atomically with releasing a semaphore). |
I don't see the problem with the existing wait-all algorithm, except that it potentially breaks ordering in some specific cases. Shared memory is also highly portable, even Windows has it. I don't understand what you mean by "volatile state of semaphores and mutexes", but if you need a shared memory across all processes on the same WINEPREFIX, just open a shared file in that prefix? |
I'm not sure about ordering, but I guess the main problem is that we can grab objects incorrectly, and then they're unsignaled for a period of time until we release them.
As in the owner thread, recursion count, etc. The problem is not portability but rather safety; we can't let one process corrupt the state of another process's objects, even by accident. I think the proper way forward is to get support for these things in the kernel. It just needs a lot of work. |
I thought about this and I think you want a RW lock, where W is wait-all and R is clearing a single event (whether by WaitForSingleObject or single WaitForMultipleObjects). So it would look like: wait-any:
wait-all:
reset:
I think this is probably the most efficient possible implementation. I'm pretty sure it's not possible to implement in a completely lock-free manner. I tried reading the ReactOS code, but I didn't really understand it. |
May help with zfigura/wine#10, although the real fix there is just not to use esync.
There's no reason these should be global, and in particular, this means that esync_pulse_event() might end up writing 0, which raises the likelihood of a missed wakeup from "probable" to "certain". Fixes: zfigura/wine#10
May help with zfigura/wine#10, although the real fix there is just not to use esync.
May help with #10, although the real fix there is just not to use esync.
Just stumbled over this. I guess it can't hurt to report that this is still an issue with both esync and fsync, Crysis 1 demo freezes after starting (wine-staging-tkg ff10ae6e74a8f090f89a217e0ff6da862b6b022b ). |
With esync enabled, Crysis seemingly hangs on a black screen. Sometimes, after a while, the first frame of the opening video will show. 30-60 seconds later the second frame may or may not appear, ultimately ending up in what looks like a freeze.
Here's a log : https://drive.google.com/file/d/1vPyCKEb-FwH9otxYMFJWguwj64z9RPPq/view?usp=sharing
The text was updated successfully, but these errors were encountered: