
fix possible segfault with CachingReader::~CachingReader(); #24

Merged
merged 10 commits on Jul 3, 2013

Conversation

daschuer
Member

Found this while investigating Bug #1193853. This is not a fix for that bug, but it does fix a potential segfault when closing Mixxx.

@@ -905,6 +905,7 @@ void EngineBuffer::slotEjectTrack(double v) {
}
}

/*
Member

If these lines are no longer needed, they should be removed.

Member

This method is used by the test code.

Member

Or not! Maybe this was from a test I never committed. It allows us to insert a mock CachingReader without having to pass CachingReader in to the EngineBuffer constructor. That's why I wrote it at least.
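In other words, something like this (a hypothetical sketch; names and signatures are illustrative, not the actual EngineBuffer API):

    // Hypothetical sketch of test-only reader injection; not the real API.
    class CachingReader;

    class EngineBuffer {
      public:
        // Lets a test swap in a mock reader after construction, so the
        // production constructor does not need a CachingReader parameter.
        void setReader(CachingReader* pReader) {
            m_pReader = pReader;
        }
      private:
        CachingReader* m_pReader;
    };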

Member Author

This code is currently completely unused, and it adds noise to my race-condition hunt for Bug #1193853.
Should we remove it or not? If it should remain, is there a macro or another convention to mark such code as "for testing purposes only"?

@rryan
Member

rryan commented Jun 24, 2013

Can't we just delete the EngineWorkerScheduler (which blocks on the thread pool being done) before deleting the EngineChannels? Since we are shutting down anyway it would be fine for the EngineWorkers to hook the EngineWorkerScheduler destroyed() signal that is sent before the scheduler is deleted. That way they can clear their pointers to it.
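In sketch form (illustrative only; assumes EngineWorker is a QObject holding a plain pointer to the scheduler):

    #include <QObject>

    class EngineWorkerScheduler;

    // Clear the worker's scheduler pointer before the scheduler is
    // deleted, so shutdown cannot dereference a dangling pointer.
    class EngineWorker : public QObject {
        Q_OBJECT
      public slots:
        // Connected to the scheduler's destroyed() signal, which Qt
        // emits just before the scheduler object is deleted:
        //   connect(pScheduler, SIGNAL(destroyed()),
        //           pWorker, SLOT(slotSchedulerDestroyed()));
        void slotSchedulerDestroyed() {
            m_pScheduler = NULL; // no further scheduling during shutdown
        }
      private:
        EngineWorkerScheduler* m_pScheduler;
    };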

@rryan
Member

rryan commented Jun 24, 2013

Also, I think the EngineWorkerScheduler shutdown sequence of waitForDone / wait should be swapped so that the threadpool wait-for-done happens after the EWS thread shuts down (so there's no chance another runnable is queued after we've waited for the thread pool to be empty).

EDIT: Nevermind, QThreadPool::waitForDone removes all threads from the pool.

@daschuer
Member Author

Thank you for your fast response!

Of course this is only a band-aid for an architectural issue. It would certainly be better to fix the real issue, but that will easily become a medium-sized project :-(

Here are some hints:

  • CachingReader is quite big. It is hard to distinguish the setup functions from the scheduled job functions. It would be nice if we could divide it and finally take advantage of autoDelete in QRunnable.
  • The whole CachingReader/EngineWorkerScheduler/EngineWorker machinery is written very generically, but most of that generality is currently unused. I would prefer to merge some classes for better maintainability.
  • Get rid of the EngineWorkerScheduler thread.
  • I had issues with parts of silence on my weak hardware (Atom netbook and external USB HDD). I have tried to simulate it with a random sleep in CachingReader::run(). It would be nice if we could at least track those underflows and have a less noticeable band-aid. (Bug #1194021)

I would be really glad if we could somehow fix Bug #1193853 as well, because it has actually ruined my party.

Any ideas?

@daschuer
Member Author

It took me longer than I first thought to split up CachingReader, but here it is. The last commit should not change the behavior. Now we have a good base for fixing other issues, but for now I am done. I am satisfied with my mutex solution, so IMHO this is ready to merge. @rryan: I have not fully understood your idea, but I hope it is now easier for you to code it in.

@@ -500,6 +500,7 @@ def sources(self, build):
"engine/clockcontrol.cpp",
"engine/readaheadmanager.cpp",
"cachingreader.cpp",
"cachingreaderworker.cpp",
Member

Something's funky here w/ the indentation.

@rryan
Member

rryan commented Jun 26, 2013

@daschuer -- thanks a ton for splitting these. I should have done that when I first wrote the EngineWorkerScheduler and switched CachingReader to doing its work in an engineworker instead of in the callback.

But I really don't like adding a mutex lock per deck/sampler/previewdeck per callback to the engine. The QAtomicInt method was working fine before. The fix for the race condition you saw is trivial -- just stop EngineWorkerScheduler first in the EngineMaster destructor. By definition that stops the thread pool and so no CachingReaderWorker can be running when the EngineChannels are destroyed.
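Schematically (member names assumed, not copied from the source):

    // Sketch of the proposed shutdown order in ~EngineMaster().
    EngineMaster::~EngineMaster() {
        // 1. Stop the scheduler first. Its shutdown blocks until the
        //    thread pool is drained, so no CachingReaderWorker can
        //    still be running afterwards.
        delete m_pWorkerScheduler;
        m_pWorkerScheduler = NULL;

        // 2. Only now tear down the channels (and with them the
        //    CachingReaders) that the workers pointed into.
        qDeleteAll(m_channels);
        m_channels.clear();
    }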

BTW, I think the future of CachingReader is consolidating it into a single CachingReaderWorker for all CachingReaders. That way, decks with the same track loaded can share the same cache for that track rather than duplicating the audio decoding and caching work. This will be quite nice when we get fancier with sample decks with things like instant doubles or sending a loop to a sampler.

@@ -43,8 +39,7 @@ void EngineWorkerScheduler::run() {
while (!m_bQuit) {
EngineWorker* pWorker = NULL;
while (m_scheduleFIFO.read(&pWorker, 1) == 1) {
if (pWorker && !pWorker->isActive()) {
pWorker->setActive(true);
if (pWorker) {
Member

The EngineWorkerScheduler previously would not schedule a job that had been set as active. Now that it does not have a way to check if the worker is active, it will queue the worker after EngineWorker::workReady() is called regardless of whether one is running or not. This means that the implementation of the EngineWorker has to enforce that workReady() only be called when the EngineWorker is not running already. You've done this in CachingReaderWorker with a mutex.
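Schematically, that guard looks something like this (reconstructed from the mutex names used in this thread; the actual commit may differ in detail):

    // tryLock() succeeds only when no run() is in flight; run()
    // releases the lock when it is done.
    void CachingReaderWorker::workReady() {
        if (m_mutexRun.tryLock()) {
            // No run() active: schedule ourselves. The lock stays
            // held until run() finishes.
            m_pWorkerScheduler->workerReady(this);
        }
        // Otherwise a run() is active and will drain the FIFO itself.
    }

    void CachingReaderWorker::run() {
        ChunkReadRequest request;
        while (m_chunkReadRequestFIFO.read(&request, 1) == 1) {
            processChunkReadRequest(&request);
        }
        m_mutexRun.unlock(); // reopens scheduling
    }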

Member Author

Yes, correct. Is there an issue with this?
If there are no objections, the next logical step is to get rid of EngineWorker and merge it into CachingReaderWorker, since there is only one kind of worker and it is unlikely that we will have another kind of worker in the future.

Member

RE: issues with this -- please see the other comments I made.

RE: consolidating EngineWorker and CachingReaderWorker -- it's nice to have a general mechanism for queuing work to be run after the callback and it comes at no performance cost.

@rryan
Member

rryan commented Jun 26, 2013

This change has a potential impact on reader scheduling for slow machines.

The time an EngineWorker is scheduled is different from the time that it is run. On a slow machine, it's possible these two events will be far apart.

This change makes it so that CachingReaderWorker cannot schedule itself until it is done working. In 1.11, you could imagine a chain of events like this:

  1. CachingReaderWorker (still running from when it was scheduled by the previous callback) begins decoding chunk of audio.
  2. Audio callback begins.
  3. EngineBuffer hints reader to read more chunks.
  4. CachingReader schedules itself.
  5. CachingReaderWorker finishes decoding audio, terminates.
  6. Audio callback terminates. Wakes EngineWorkerScheduler
  7. EngineWorkerScheduler runs CachingReaderWorker

But now, an audio callback cannot schedule CachingReaderWorker until CachingReaderWorker is complete. This means that we will go a full latency period before scheduling CachingReaderWorker again. The end result is that on slow hardware, CachingReaderWorker scheduling may go from being scheduled once per callback to once every 2 callbacks. This could cause CachingReaderWorker to do a lot of work in bursts.

I wonder if this chain of events happens in reality. I'll try some tests with my netbook.

@rryan
Member

rryan commented Jun 26, 2013

I've been on a quest to get rid of mutex locks in the engine for a year or so now. On some platforms they require a syscall regardless of whether they would block (and syscalls should be avoided in the audio callback). That's why I don't like the new mutex in CachingReaderWorker.

@daschuer
Member Author

But I really don't like adding a mutex lock per deck/sampler/previewdeck per
callback to the engine. The QAtomicInt method was working fine before. The
fix for the race condition you saw is trivial -- just stop EngineWorkerScheduler
first in the EngineMaster destructor. By definition that stops the thread pool
and so no CachingReaderWorker can be running when the EngineChannels are
destroyed.

I still don't get your point. Does this kill the thread? It seems you have the
solution ready for coding, so could you perhaps fix it on top of these commits?

This change has a potential impact on reader scheduling for slow machines.

I already have dropouts with the current solution on my netbook. I am not sure
what exactly is going on, but the PA underflow counter remains at 0.

This change makes it so that CachingReaderWorker cannot schedule itself until
it is done working.

At first I thought there was no problem, but I missed one thing:

  • CachingReaderWorker loops over m_chunkReadRequestFIFO until it is empty.
  • The FIFO is filled by the engine callback, on every cycle with new hints.
  • If CachingReaderWorker is still running, those new requests are simply
    picked up by the loop.
  • But if m_chunkReadRequestFIFO is filled after the while loop exits but
    before m_mutexRun.unlock(), the worker will not be scheduled again.

So we are talking about the tiny moment between leaving the while loop and
calling m_mutexRun.unlock(), right?
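As a timeline (illustrative sketch of the window):

    // CachingReaderWorker::run()        engine callback / workReady()
    // --------------------------        -----------------------------
    // while (fifo.read(...)) { ... }
    // // FIFO empty, loop exits
    //                                   fifo.write(&request, 1);
    //                                   if (m_mutexRun.tryLock()) // fails,
    //                                       ...                   // still locked
    //                                   // request queued, but no wake
    // m_mutexRun.unlock();
    // // run() returns without seeing the new request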

By the way, the current solution suffers from the same problem:

while (m_scheduleFIFO.read(&pWorker, 1) == 1) {
    if (pWorker && !pWorker->isActive()) {
        pWorker->setActive(true);
    }
} 

The schedule request is lost if this check occurs in the same tiny moment as above.

So we must find a solution where CachingReaderWorker can detect this and
schedule itself again.

Maybe we can solve all the other issues this way as well.

I do not like the atomic-flags solution, because currently a dedicated thread
is scheduled every audio cycle only to check these flags. A mutex has the
benefit that a thread can sleep on it. And I am sure there are many mutexes
under the hood in the thread pool implementation.

Do you have a clue what the timing difference is between checking an atomic
flag, m_mutexRun.tryLock(), and m_pWorkerScheduler->runWorkers()?

What is the benefit of limiting the count of EngineWorkerThread to 4? I know this
is a good method for balancing CPU core usage against thread-switch overhead for
CPU-intensive tasks. But in our case a CachingReaderWorker does both: CPU-intensive
decoding and waiting for disk IO. Since storage devices can read only one byte at a
time, I am afraid that if we schedule 4 CachingReaderWorkers they will spend a lot
of time waiting at the disk_io_mutex.

So what about having one thread per EngineBuffer and counting the requested chunks
via a semaphore? That gives us a simple model: the CachingReaderWorker sleeps on the
semaphore until there is a job to do. Later we can move the CachingReaderWorker into
a "TrackBuffer" object, so we have one thread per loaded track.
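A minimal sketch of that model (hypothetical names, just to illustrate the idea):

    #include <QSemaphore>
    #include <QThread>

    // One hypothetical reader thread per EngineBuffer.
    class TrackReaderThread : public QThread {
      public:
        // Called by the engine callback: one release() per requested chunk.
        void requestChunk() { m_requestedChunks.release(1); }

      protected:
        void run() {
            for (;;) {
                m_requestedChunks.acquire(1); // sleep until work arrives
                readOneChunk();               // decode one chunk into the cache
            }
        }

      private:
        void readOneChunk() { /* decode and cache a single chunk */ }
        QSemaphore m_requestedChunks;
    };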

There might also be a problem with outdated hints. The requested hints are freshly
calculated every audio cycle. If CachingReaderWorker is still running, it is still
reading chunks from the previous audio cycle; the most wanted new hints are processed
later, but within the current CachingReaderWorker run. Is that right?
So in case of a dropout, CachingReaderWorker has to catch up instead of throwing
the outdated hints away.

@rryan
Member

rryan commented Jun 27, 2013

Hey Daniel -- I should explain why the EngineWorkerScheduler exists.

The point of it is to provide a mechanism for scheduling work after the callback is complete -- after is the key. Before this existed, CachingReader was its own thread and it communicated with EngineBuffer via semaphore (as you propose). This scheme regularly caused xruns during engine mixing because we have multiple EngineBuffers but the moment one EngineBuffer was finished, it woke its CachingReader because it had work to do. The CachingReader would then process while the other EngineBuffers were processing. As each EB woke its reader, we had more and more CachingReaders schedule-able by the OS while the callback was still running.

This was quite bad for underpowered hardware. That is why I created EngineWorkerScheduler back in 2010 -- queueing up requests via the EngineWorkerScheduler FIFO and then only waking the EWS at the end of EngineMaster processing. Beta testers of the 1.8.0 betas reported huge gains in performance on low-powered hardware when I switched CachingReader work from running during the callback to running after it.

So, even with one thread per EngineBuffer they still should wait until after EngineMaster is finished.
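In sketch form (simplified; method names are illustrative):

    // During the callback each EngineBuffer only enqueues its worker.
    // The FIFO write is lock-free, so nothing is woken mid-callback.
    void EngineWorkerScheduler::workerReady(EngineWorker* pWorker) {
        m_scheduleFIFO.write(&pWorker, 1);
    }

    // After EngineMaster has processed ALL buffers: one wake per callback.
    void EngineWorkerScheduler::runWorkers() {
        m_waitCondition.wakeAll(); // the EWS thread drains the FIFO and
                                   // hands the workers to the thread pool
    }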

@rryan
Member

rryan commented Jun 27, 2013

And I am sure there are many mutexes under the hood in the thread pool implementation.

This doesn't matter because the engine never touches the thread pool -- that's all done in the EngineWorkerScheduler thread. The Engine talks with the EWS thread via FIFO.

(Or maybe by "thread pool implementation" you meant EngineWorkerScheduler? I interpreted it as you talking about QThreadPool which is never touched by the callback thread)

@rryan
Member

rryan commented Jun 27, 2013

I still did not get your point. Does this kill the thread? It seems you have the solution ready for coding. So can you probably fix it on top of these?

Yea, I like the splitting of CachingReader and CachingReaderWorker. I don't like other aspects which are what I'm discussing with you.

@rryan
Member

rryan commented Jun 27, 2013

I already have dropouts with the current solution on my netbook. I am not sure what exactly is going on, but the PA underflow counter remains at 0.

Is this due to caching reader cache misses? Can you add a util/counter.h counter for caching reader cache misses in CachingReader::read() to check?
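Something along these lines, perhaps (assuming util/counter.h provides a Counter with an increment() method; treat the exact interface as a guess):

    #include "util/counter.h"

    // Sketch: count misses at the point in CachingReader::read() where
    // a requested chunk turns out not to be cached.
    static Counter s_cacheMisses("CachingReader::read cache misses");

    // ... inside CachingReader::read():
    //     if (pChunk == NULL) {       // chunk was not in the cache
    //         s_cacheMisses.increment();
    //     }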

@rryan
Member

rryan commented Jun 27, 2013

Do you have a clue what the timing difference is between checking an atomic flag, m_mutexRun.tryLock(), and m_pWorkerScheduler->runWorkers()?

It's not comparing apples to apples to compare runWorkers() and tryLock() because your tryLock()-based implementation depends on runWorkers() to work (and so did the old QAtomicInt implementation).

  • runWorkers() signals a wait condition once per callback... on UNIX (using pthreads) and Windows this involves locking/unlocking a mutex.
  • m_mutexRun.tryLock() tryLock()s a mutex N times per callback, where N = the number of tracks loaded to decks, preview decks, and samplers.

tryLock() is implemented as testAndSet()'ing a QAtomicInt so it's identical to the QAtomicInt it replaces. But QMutex::unlock() is insidious because it has to wake up threads that are blocked on the mutex. To do that, it locks an internal mutex -- so QMutex::unlock() can block on UNIX! It's unclear from the Windows implementation if it can block since it makes Windows API calls.

That's just the problem -- QMutex is a black box. To understand whether your code will block you have to read the source -- and there is a different implementation per-platform. And Qt could change the implementation. As a general rule they have no business being in the engine (and that includes Qt signals and slots!).
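For comparison, the QAtomicInt pattern in question (Qt 4 API; a sketch of what tryLock()/unlock() reduce to on the fast path):

    #include <QAtomicInt>

    QAtomicInt m_active(0); // 0 = idle, 1 = scheduled-or-running

    // Equivalent of the uncontended QMutex::tryLock() fast path.
    bool tryActivate() {
        return m_active.testAndSetAcquire(0, 1);
    }

    // Unlike QMutex::unlock(), this plain atomic store can never block:
    // there are no sleeping waiters to wake.
    void deactivate() {
        m_active.fetchAndStoreRelease(0);
    }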

@rryan
Member

rryan commented Jun 27, 2013

What is the benefit of limiting the count of EngineWorkerThread to 4?

It's totally arbitrary and really should be increased now that we can have 16 sample decks, 2 decks, and a preview deck! Good point -- we should raise it.

@rryan
Member

rryan commented Jun 27, 2013

Since storage devices can read only one byte at a time, I am afraid that if we schedule 4 CachingReaderWorkers they will spend a lot of time waiting at the disk_io_mutex.

This is a very simplified view of IO. IO schedulers in modern operating systems do tons of quite fancy things automatically -- optimistic pre-fetching, aggressive caching, and almost every filesystem reads blocks from a disk, not bytes. Not every read() from the filesystem will result in blocking on a spinning piece of metal.

@rryan
Member

rryan commented Jun 27, 2013

I do not like the atomic-flags solution, because currently a dedicated thread is scheduled every audio cycle only to check these flags.

But this pull request doesn't change the fact that the EngineWorkerScheduler is scheduled after each callback to schedule the EngineWorkers -- and that's what you don't like.

Edited to add: The QAtomicInt flags are identical to what you are doing with the QMutex since under the hood tryLock() / unlock() is the same as a QAtomicInt with the added disadvantage that QMutex::unlock() can block (though I realize you don't unlock() from the callback unless there is something wrong like the EWS is not created)

@rryan
Member

rryan commented Jun 27, 2013

A mutex has the benefit that a thread can sleep on it.

But your patch doesn't use this benefit. If you were to switch to using a QWaitCondition to wake a sleeping mutex on a per-EngineBuffer basis then that's even worse than the tryLocks -- you're locking (actually locking, not tryLock'ing) a mutex N times per callback instead of one time per callback (with EWS).

@rryan
Member

rryan commented Jun 27, 2013

Later we can move the CachingReaderWorker into a "TrackBuffer" object. So we have one thread per loaded track.

Having an EngineWorker per track loaded is a great idea and definitely where we should be going with our reader infrastructure. As we add more and more features that cause users to duplicate tracks or load the same track into multiple decks we will be wasting CPU and RAM double-decoding and double-caching the file.

@daschuer
Member Author

Ah, thank you for your explanations. Now things are quite clear. Will you adopt this branch?

@daschuer
Member Author

I have tested around and found out that even when there are small cache misses, they are barely audible. So my dropouts must be another issue. A lost wake does not lower the sound quality either.

But there is a serious problem when a wake is lost after changing the track: in this case "Loading track" is displayed but nothing happens. That can easily be worked around by loading the track again, but it breaks Auto DJ, because Auto DJ is not able to do so.

@daschuer
Member Author

The last changes should save some time in the engine thread :-)

@daschuer daschuer merged commit fd0a392 into mixxxdj:master Jul 3, 2013
@daschuer daschuer mentioned this pull request Dec 28, 2014
@daschuer daschuer mentioned this pull request Aug 13, 2017
uklotzde added a commit that referenced this pull request Dec 10, 2017
Even if it is only a test the rules for maintainable code still apply.
@daschuer daschuer mentioned this pull request Jun 11, 2018
ronso0 referenced this pull request in ronso0/mixxx Oct 11, 2019
Limited Intro Outro Start Transition mode.
m0dB pushed a commit to m0dB/mixxx that referenced this pull request Jan 21, 2024