Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple games miss wakeups from PulseEvent (Crysis (1) is very slow to play intro videos, Sonic Generations and Pro Evolution Soccer have audio distortion) #10

Closed
Tk-Glitch opened this issue Aug 14, 2018 · 19 comments
Labels
wontfix This will not be worked on

Comments

@Tk-Glitch
Copy link

With esync enabled, Crysis seemingly hangs on a black screen. Sometimes, after a while, the first frame of the opening video will show. 30-60 seconds later the second frame may or may not appear, ultimately ending up in what looks like a freeze.

Here's a log : https://drive.google.com/file/d/1vPyCKEb-FwH9otxYMFJWguwj64z9RPPq/view?usp=sharing

zfigura added a commit that referenced this issue Aug 15, 2018
There's no reason these should be global, and in particular, this means that esync_pulse_event() might end up writing 0, which raises the likelihood of a missed wakeup from "probable" to "certain".

Fixes #10.
@zfigura
Copy link
Owner

zfigura commented Aug 15, 2018

Fixed by a7faa85.

@zfigura zfigura closed this as completed Aug 15, 2018
@pingubot
Copy link

pingubot commented Aug 15, 2018

sadly not fixed for me, using the 32 bit crysis executable.

@zfigura zfigura reopened this Aug 15, 2018
@zfigura zfigura added the wontfix This will not be worked on label Aug 15, 2018
@zfigura
Copy link
Owner

zfigura commented Aug 15, 2018

This isn't fixed (and I'm not sure why we thought it was). I can reproduce it even after a7faa85.

The game seems to use PulseEvent() as a makeshift timer or semaphore; not quite sure. In esync this is implemented as a write() + read(). I think what's happening here is that both of these go through to the kernel before the polling thread has time to return success (i.e. it wakes up the polling thread, but the count is 0 by the time we get to eventfd_poll()). But this is a guess, since I don't understand very well how the kernel-side code actually works, and unfortunately I can't reproduce the bug with strace to rule out that possibility.

In any case, if that's what's going wrong, this is a WONTFIX, and the correct solution is "just disable esync". It would kind of surprise me if the game benefits greatly from it anyway.

@zfigura zfigura changed the title Crysis (1) hanging Crysis (1) is very slow to play intro videos [missed wakeups from PulseEvent()] Aug 15, 2018
@Tk-Glitch
Copy link
Author

I'm taking back my statement. It's indeed not fixed. I'm not sure what happened during testing yesterday as I can reproduce the issue with supposedly the exact same wine version today (even though it's been rebuilt in the meantime).

@zfigura
Copy link
Owner

zfigura commented Aug 15, 2018

It does indeed look like this is a kernel insufficiency: https://gist.github.com/zfigura/6f55688732728d7e1e452188014ec523

On my system the thread never wakes up from poll(), unless I add some sort of delay between the write() and read() calls.

@Tk-Glitch
Copy link
Author

Ok that's interesting.. That also explains my results (was running an experimental kernel at the time). I'll try to use the exact same configuration next time :D

zfigura added a commit that referenced this issue Aug 19, 2018
May help with #10, although the real fix there is just not to use esync.
@zfigura zfigura changed the title Crysis (1) is very slow to play intro videos [missed wakeups from PulseEvent()] Multiple games miss wakeups from PulseEvent (Crysis (1) is very slow to play intro videos, Sonic Generations and Pro Evolution Soccer have audio distortion) Sep 5, 2018
@zfigura
Copy link
Owner

zfigura commented Sep 5, 2018

For what it's worth, I think all of these games are going through timeSetEvent(). Not sure that this helps us, though.

zfigura added a commit that referenced this issue Oct 7, 2018
There's no reason these should be global, and in particular, this means that esync_pulse_event() might end up writing 0, which raises the likelihood of a missed wakeup from "probable" to "certain".

Fixes #10.
zfigura added a commit that referenced this issue Oct 7, 2018
May help with #10, although the real fix there is just not to use esync.
zfigura added a commit that referenced this issue Nov 1, 2018
There's no reason these should be global, and in particular, this means that esync_pulse_event() might end up writing 0, which raises the likelihood of a missed wakeup from "probable" to "certain".

Fixes #10.
zfigura added a commit that referenced this issue Nov 1, 2018
May help with #10, although the real fix there is just not to use esync.
@Hello71
Copy link

Hello71 commented Dec 12, 2018

I think the only way to correctly implement PulseEvent is to have one eventfd per client, and making it the client's responsibility to read the data. However, if that is too hard, I think you can bodge it by making it the client's responsibility to read the data, even if there are multiple clients. If you make the eventfd non-blocking and ignore EAGAIN, I reckon it'll work much better than the existing system. Even better, it'll work perfectly if there happens to be only one waiter.

@zfigura
Copy link
Owner

zfigura commented Dec 12, 2018

I think the only way to correctly implement PulseEvent is to have one eventfd per client, and making it the client's responsibility to read the data.

I'm not sure what's meant by this. One eventfd per process? Per handle? That can't work, we need events to be able to be woken up from different processes and handles.

However, if that is too hard, I think you can bodge it by making it the client's responsibility to read the data, even if there are multiple clients. If you make the eventfd non-blocking and ignore EAGAIN, I reckon it'll work much better than the existing system. Even better, it'll work perfectly if there happens to be only one waiter.

Waking up multiple waiters on an auto-reset event seems extremely risky. Programs could easily be using them as a stupid semaphore or mutex.

@Hello71
Copy link

Hello71 commented Dec 12, 2018

I think the only way to correctly implement PulseEvent is to have one eventfd per client, and making it the client's responsibility to read the data.

I'm not sure what's meant by this. One eventfd per process? Per handle? That can't work, we need events to be able to be woken up from different processes and handles.

One eventfd per waiter process per handle.

However, if that is too hard, I think you can bodge it by making it the client's responsibility to read the data, even if there are multiple clients. If you make the eventfd non-blocking and ignore EAGAIN, I reckon it'll work much better than the existing system. Even better, it'll work perfectly if there happens to be only one waiter.

Waking up multiple waiters on an auto-reset event seems extremely risky. Programs could easily be using them as a stupid semaphore or mutex.

Oh, you didn't specify auto-reset. Then just set EFD_SEMAPHORE and use eventfd in the regular way? i.e. write the data, then instead of immediately reading it, let the client consume it?

@zfigura
Copy link
Owner

zfigura commented Dec 12, 2018

I think the only way to correctly implement PulseEvent is to have one eventfd per client, and making it the client's responsibility to read the data.

I'm not sure what's meant by this. One eventfd per process? Per handle? That can't work, we need events to be able to be woken up from different processes and handles.

One eventfd per waiter process per handle.

How would that work, then? How would you wake up handle A by signaling handle B to the same object, if they are using different eventfds?

However, if that is too hard, I think you can bodge it by making it the client's responsibility to read the data, even if there are multiple clients. If you make the eventfd non-blocking and ignore EAGAIN, I reckon it'll work much better than the existing system. Even better, it'll work perfectly if there happens to be only one waiter.

Waking up multiple waiters on an auto-reset event seems extremely risky. Programs could easily be using them as a stupid semaphore or mutex.

Oh, you didn't specify auto-reset. Then just set EFD_SEMAPHORE and use eventfd in the regular way? i.e. write the data, then instead of immediately reading it, let the client consume it?

To have PulseEvent() behave like SetEvent()? That seems too risky.

@Hello71
Copy link

Hello71 commented Dec 12, 2018

How would that work, then? How would you wake up handle A by signaling handle B to the same object, if they are using different eventfds?

Yeah, that doesn't actually fix PulseEvent at all, my mistake.

To have PulseEvent() behave like SetEvent()? That seems too risky.

Ehm... I thought too hard about this issue and mixed up PulseEvent with SetEvent. I think it is possible to implement if there is only WaitForSingleObject. First, define an atomic counter on each auto-reset esync primitive, call it W.

PulseEvent:

  1. if E is an auto-reset event and E.W > 0, then write(E, 1).
  2. if E is a manual-reset event, then write(E, 1).

WaitForSingleObject:

  1. if E is a manual-reset event, do as before. otherwise:
  2. increment E.W.
  3. poll on E.
  4. decrement E.W.
  5. non-blocking read(E). if successful, return. else, go to 2.

It is possible to lose PulseEvent if there are multiple waiters and PulseEvent is used in quick succession. I think it is probably possible to fix this by checking the value of W, or adding another counter.

I tried to think about it but couldn't find any way to make mixed WaitForMultipleObjects work; I think you need two atomic counters, or possibly to lock the event with a mutex (maybe only if PulseEvent is in use)? Is it common to use WaitForMultipleObjects with PulseEvent?

@zfigura
Copy link
Owner

zfigura commented Dec 12, 2018

Your idea is essentially to signal the event if and only if there are waiters, and not to unsignal it in that case. This could work, but would be difficult to synchronize, and it would break with wait-all. I'll have to give it some thought.

@Hello71
Copy link

Hello71 commented Dec 13, 2018

I think if it's possible to get it to work with wait-multiple, the same tactic should apply with wait-all. Personally, I don't really care about this issue, but if you say it's blocking upstreaming then I'm interested in helping.

@zfigura
Copy link
Owner

zfigura commented Dec 13, 2018

PulseEvent() is tricky, but the biggest problem with upstreaming is, I think, wait-all. That's not something that's exposed by Linux polling APIs, and I think it would probably need either kernel support or excessive locking on the Wine side. The other problem is use of shared memory (reading object state, reading object state atomically with releasing a semaphore).

@Hello71
Copy link

Hello71 commented Dec 13, 2018

I don't see the problem with the existing wait-all algorithm, except that it potentially breaks ordering in some specific cases. Shared memory is also highly portable, even Windows has it. I don't understand what you mean by "volatile state of semaphores and mutexes", but if you need a shared memory across all processes on the same WINEPREFIX, just open a shared file in that prefix?

@zfigura
Copy link
Owner

zfigura commented Dec 13, 2018

I don't see the problem with the existing wait-all algorithm, except that it potentially breaks ordering in some specific cases.

I'm not sure about ordering, but I guess the main problem is that we can grab objects incorrectly, and then they're unsignaled for a period of time until we release them.

Shared memory is also highly portable, even Windows has it. I don't understand what you mean by "volatile state of semaphores and mutexes", but if you need a shared memory across all processes on the same WINEPREFIX, just open a shared file in that prefix?

As in the owner thread, recursion count, etc. The problem is not portability but rather safety; we can't let one process corrupt the state of another process's objects, even by accident.

I think the proper way forward is to get support for these things in the kernel. It just needs a lot of work.

@Hello71
Copy link

Hello71 commented Dec 14, 2018

I don't see the problem with the existing wait-all algorithm, except that it potentially breaks ordering in some specific cases.

I'm not sure about ordering, but I guess the main problem is that we can grab objects incorrectly, and then they're unsignaled for a period of time until we release them.

I thought about this and I think you want a RW lock, where W is wait-all and R is clearing a single event (whether by WaitForSingleObject or single WaitForMultipleObjects). So it would look like:

wait-any:

  1. poll
  2. take rdlock
  3. do the needful
  4. release rdlock

wait-all:

  1. poll
  2. take wrlock
  3. poll again with timeout=0. if not all readable, release wrlock and go to 1.
  4. do the needful
  5. release wrlock

reset:

  1. take rdlock
  2. do the needful
  3. release rdlock

I think this is probably the most efficient possible implementation. I'm pretty sure it's not possible to implement in a completely lock-free manner. I tried reading the ReactOS code, but I didn't really understand it.

kakra pushed a commit to kakra/wine-proton that referenced this issue Mar 9, 2019
May help with zfigura/wine#10, although the real
fix there is just not to use esync.
kakra pushed a commit to kakra/wine-proton that referenced this issue Mar 17, 2019
There's no reason these should be global, and in particular, this means that
esync_pulse_event() might end up writing 0, which raises the likelihood of
a missed wakeup from "probable" to "certain".

Fixes: zfigura/wine#10
kakra pushed a commit to kakra/wine-proton that referenced this issue Mar 17, 2019
May help with zfigura/wine#10, although the real
fix there is just not to use esync.
zfigura added a commit that referenced this issue Jul 29, 2019
May help with #10, although the real fix there is just not to use esync.
@aufkrawall
Copy link

Just stumbled over this. I guess it can't hurt to report that this is still an issue with both esync and fsync, Crysis 1 demo freezes after starting (wine-staging-tkg ff10ae6e74a8f090f89a217e0ff6da862b6b022b ).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

5 participants