Fix debug assert in CachingReader #2309

daschuer · 2019-10-01T23:59:48Z

I have finally managed to dig to the ground of the assertion and created a solution with a lower impact on the way. Sorry for the concurrent PR, but IMHO this is a best way the get a clear view on the issue for both of us. The consecutive track load fix of your PR is not included here.

The issue was that the engine stops calling process when the deck is ejected. This is why the m_stateFIFO is not purged as originally assumed after unloading the old track but after loading the new track. This does not matter in the track change case, but is an issue when ejecting the track first.
in this case the idle signal is delayed and overwrites the loading state leading to the assert violation.

The fix now uses TRACK_LOADED and TRACK_UNLOADED as a border in the m_stateFIFO fifo to distinguish old chunks from new.

I have also moved the freeAllChunks to the engine thread and call it whenever such a border appears.

…for state

uklotzde · 2019-10-02T06:25:08Z

How about emitting and forwarding signals through direct connections between different threads???

Be-ing · 2019-10-02T07:21:04Z

FWIW, I just played a 3.5 hour set with this and the track not having sound output bug never occurred.

uklotzde · 2019-10-02T07:23:28Z

So you prefer a hacky solution for 2.2 relying on some undefined behavior?

Btw, the Xenial build is broken.

daschuer · 2019-10-02T08:09:55Z

How about emitting and forwarding signals through direct connections between different threads???

This is an unrelated detail here.

I agree that these callbacks, called form a different thread than other function in a class can be supprising and should be avoided if posdible. We need to make sure that it is at least well documented but for now I did not want to touch it.

daschuer · 2019-10-02T08:15:19Z

The Xenial error is interesting, because I have successfully build it here on Xenial.

A solution would be to replace the m_status with a qAtomicInt

src/engine/cachingreader.cpp:4:
/usr/include/c++/4.8/atomic: In member function ‘virtual void CachingReader::newTrack(TrackPointer)’:
/usr/include/c++/4.8/atomic:199:9: error: invalid memory model for ‘__atomic_store’
       { __atomic_store(&_M_i, &__i, _m); }

uklotzde · 2019-10-02T08:29:06Z

I will not make any compromises regarding quality and won't take responsibility for this hack.

daschuer · 2019-10-02T08:42:02Z

I think we have no other chance to do so here IMHO, without starting a major refactoring of the engine.

So you prefer a hacky solution for 2.2 relying on some undefined behavior?

At least the behaviour id well defined.

The direct connection was never part of the issue we have experienced and hopefully solve here. And IMHO callbscks from different threads to change atomic member variables is a normal pattern in a lock free solutions and have sometimes less hassle than a cued connection from the main thread like you propose.

uklotzde · 2019-10-02T09:11:03Z

The queued connection goes from a worker thread (without an event queue) to the main thread and not vice versa.

What's even worse is that we forward directly connected signals from CachingReaderWorker (running an engine worker thread) through CachingReader into the unknown.

daschuer · 2019-10-02T09:47:46Z

Not into the unknown, into the engine.

The signal forwarding, in this case a simple callback with some Qt suggar, is part of the legacy where caching reader and worker where one class.

I think there is some room for improvments, but the point here is that the design was not recently introduced and is not part of the issue we solve here.

As explained, the original issue was, that the engine stops calling process after eject.

The second issue you have discovered with the consequtive track loads is sill pending, though, but a new, hopeful no real live issue. But worth to solve since we are aware now.

Be-ing · 2019-10-02T11:58:42Z

I would prefer a hacky solution to going down a rabbit hole of refactoring on a stable branch.

uklotzde · 2019-10-02T12:06:49Z

I now agree that invoking freeAllChunks() in newTrack() was wrong.

We should annotate all functions with the thread they are supposed to be called and document the restrictions carefully. This is class is a dangerous beast.

uklotzde · 2019-10-02T12:08:36Z

We should replace the signals with regular function calls to avoid further confusion. Then separate the functions into groups associated with threads.

uklotzde · 2019-10-02T12:12:47Z

src/engine/cachingreader.cpp

        auto pChunk = update.takeFromWorker();
        if (pChunk) {
+            DEBUG_ASSERT(m_state != State::Idle);


~~You don't need to move the debug assertion. The condition must be valid while looping, independent of the update message.~~ Ignore my comment. But I still don't understand when and why this might happen.

At the original position it happen when you have ejected the track.
In this case the "idle" state has overwritten the "loading" state because a null track was loaded. After loading a new non null track we receive the "loaded" signal while in idle state.

We should receive TRACK_LOADED/UNLOADED messages only while the state is Loading. The Idle state is set after we have received TRACK_UNLOADED. We should not receive any messages from the worker until the state has been changed from Idle to Loading again.

The issue is that the cue is not processed after ejecting the track. That is why the idle message is still in the queue and overwrite the loading message.
This is not much of an issue, but I see how it is surprising.
I will look how to solve this.

It is not an issue to overwrite the loading message because the very next message is the loaded message and the cue was purged until the loading state was there. Purging is stopped by the idle message.

Can we solve this by renaming?

daschuer · 2019-10-02T16:36:58Z

hmm odd. std::atomic m_state; works on my xenial system but not on Travis.
QAtomicInteger m_state; dos not work because std::is_integral fails for enums.
even if i put ": int" to the definition.

uklotzde · 2019-10-03T09:38:21Z

src/engine/cachingreader.cpp

+                << "Cannot load new track while loading a track";
+        return;
+    }
+    m_worker.newTrack(std::move(pTrack));


~~The final line m_worker.workReady(); got lost along the way~~ Ok, I see you moved it.

from your template :-)

…e a new track has been loaded.

uklotzde · 2019-10-03T09:58:30Z

I'm convinced that the code now works as expected! My previous approach didn't consider all the intended side effects and was wrong. Thanks for stopping me ;)

The design is still a mess, but I'm not going to rewrite it. At least not in C++. This code could be written much easier, safer, and cleaner in Rust. Maybe an option for the future.

daschuer · 2019-10-03T10:09:46Z

src/engine/cachingreader.cpp

-    freeAllChunks();
+    auto newState = pTrack ? STATE_TRACK_LOADING : STATE_TRACK_UNLOADING;
+    auto oldState = m_state.fetchAndStoreAcquire(newState);
+    VERIFY_OR_DEBUG_ASSERT(


I guess this can currently actually happen in case the worker is stalled for some reasons.
We need to prevent it in the engine that this case is really impossible.

Wait, from the caching reader perspective this is not longer an issue, since CachingReaderWorker::loadTrack handles the loading and is not interrupted. It is an engine issue, however we cannot judge from m_state, because this is delayed due to the fifos.

Ups, the issue is the slotLoadingTrack(TrackPointer pNewTrack, TrackPointer pOldTrack) everywhere.
New Track is only a wild guess if we allow another new track loading. In most cases pNewTrack is not used but not in every case. We need to clean up this which is risky due to the string based signal slot connections.

So lets stick with this assertion, until we manage to make it happen, but not as critical.
I will add a comment.

daschuer · 2019-10-03T12:11:25Z

OK, we must not assert possible things, so I replaced it with a warning().
For the caching reader it is no problem, because no chunks are cached between two loads.
I have improved the assertion for that.

It "only" lead to inconsistency GUI, no reason for quit Mixxx and stop the party.

I have manually tested it a bit with a sleep in loadTrack() and nothing "critical" happens.

We should target this in 2.3 if it turns out to be a real live issue.

I am ready here.

Be-ing · 2019-10-03T14:26:14Z

src/engine/cachingreader.h

-        Idle,
-        TrackLoading,
-        TrackLoaded,
+    enum State {


Why was this changed from an enum class to an enum? Please change it back to an enum class.

Because we need integers for the atomic. I didn't check if this still works with an enum class.

So cast it to an int and back when needed.

I actually had a version with a static_cast() one every use. But @uklotzde version with a plain enum conviced me, because it is less distracting. In this case we have no namespace pollution issue, because the enum is class private.

In this special case the enum literals are just private integer constants and a classical enum is a good fit.

uklotzde · 2019-10-03T18:05:02Z

src/engine/cachingreader.cpp

+    // but the newTrack may change if we load a new track while the previous one
+    // is still loading. This leads to inconsistent states for example a different
+    // track in the Mixx Title and the Deck label.
+    if (oldState != STATE_TRACK_LOADING ||


Condition needs to be inverted: == && ==

Ups .. yes!

uklotzde · 2019-10-03T18:05:39Z

src/engine/cachingreader.cpp

@@ -218,6 +223,7 @@ void CachingReader::process() {
    ReaderStatusUpdate update;
    while (m_readerStatusUpdateFIFO.read(&update, 1) == 1) {
        auto pChunk = update.takeFromWorker();
+        qDebug() << "CachingReader::process()" << update.status;


This debug log will be printed too frequently, not a good idea.

Ups a leftover.

uklotzde · 2019-10-03T18:06:37Z

src/engine/cachingreaderworker.cpp

@@ -143,6 +143,9 @@ void CachingReaderWorker::loadTrack(const TrackPointer& pTrack) {
    // Emit that a new track is loading, stops the current track
    emit trackLoading();

+    // Slows down track loading for debug


Either be more specific "for debugging race conditions while loading multiple tracks in a row" or delete the comment.

daschuer · 2019-10-03T19:38:24Z

Done.

uklotzde · 2019-10-03T22:49:07Z

I'm confident that we ironed out all the undefined behavior while avoiding trapdoors. LGTM.

Be-ing · 2019-10-03T22:52:05Z

Great, thank you both for your hard work on this critical issue. I think we should wait at least a week or two before releasing 2.2.3 to see if anyone reports bugs from these changes.

I just merged the 2.2 branch to master.

move freeAllChunks() to the engine thread and use a std::atomic enum …

ca6c414

…for state

uklotzde reviewed Oct 2, 2019

View reviewed changes

uklotzde added 2 commits October 2, 2019 18:22

Revert renaming of member

bbf2d17

Rename factory function

5cbab1e

uklotzde added 3 commits October 2, 2019 18:43

Add lost line

0e7cbf9

Use a QAtomicInt for storing the CachingReader state

f2802da

Remove obsolete virtual functions

ff61eeb

daschuer force-pushed the 2.2_cachingreader branch 2 times, most recently from 2dcaef6 to ff61eeb Compare October 3, 2019 09:22

move workReady call to CachingReaderWorker::newTrack()

d02ae6f

uklotzde reviewed Oct 3, 2019

View reviewed changes

Reorder calls for a better documentation. Don't clean up chunks befor…

26dd7ce

…e a new track has been loaded.

daschuer commented Oct 3, 2019

View reviewed changes

Fix critical() in case of two consecutive track loads

b893fe6

Be-ing reviewed Oct 3, 2019

View reviewed changes

uklotzde added this to the 2.2.3 milestone Oct 3, 2019

uklotzde reviewed Oct 3, 2019

View reviewed changes

daschuer added 2 commits October 3, 2019 21:33

invert warning condition and remove debug leftover.

c1bab77

Remove commented debugging code

d7737df

uklotzde merged commit b9de27a into mixxxdj:2.2 Oct 3, 2019

daschuer deleted the 2.2_cachingreader branch September 26, 2021 17:37

Fix debug assert in CachingReader #2309

Fix debug assert in CachingReader #2309

Conversation

daschuer commented Oct 1, 2019

uklotzde commented Oct 2, 2019

Be-ing commented Oct 2, 2019

uklotzde commented Oct 2, 2019

daschuer commented Oct 2, 2019

daschuer commented Oct 2, 2019

uklotzde commented Oct 2, 2019

daschuer commented Oct 2, 2019

uklotzde commented Oct 2, 2019

daschuer commented Oct 2, 2019

Be-ing commented Oct 2, 2019

uklotzde commented Oct 2, 2019

uklotzde commented Oct 2, 2019

uklotzde Oct 2, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

daschuer commented Oct 2, 2019

uklotzde Oct 3, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

uklotzde commented Oct 3, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

daschuer commented Oct 3, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

daschuer commented Oct 3, 2019

uklotzde commented Oct 3, 2019

Be-ing commented Oct 3, 2019

uklotzde Oct 2, 2019 •

edited

Loading

uklotzde Oct 3, 2019 •

edited

Loading