-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix debug assert in CachingReader #2309
Conversation
How about emitting and forwarding signals through direct connections between different threads??? |
FWIW, I just played a 3.5 hour set with this and the track not having sound output bug never occurred. |
So you prefer a hacky solution for 2.2 relying on some undefined behavior? Btw, the Xenial build is broken. |
This is an unrelated detail here. I agree that these callbacks, called form a different thread than other function in a class can be supprising and should be avoided if posdible. We need to make sure that it is at least well documented but for now I did not want to touch it. |
The Xenial error is interesting, because I have successfully build it here on Xenial. A solution would be to replace the m_status with a qAtomicInt
|
I will not make any compromises regarding quality and won't take responsibility for this hack. |
I think we have no other chance to do so here IMHO, without starting a major refactoring of the engine.
At least the behaviour id well defined. The direct connection was never part of the issue we have experienced and hopefully solve here. And IMHO callbscks from different threads to change atomic member variables is a normal pattern in a lock free solutions and have sometimes less hassle than a cued connection from the main thread like you propose. |
The queued connection goes from a worker thread (without an event queue) to the main thread and not vice versa. What's even worse is that we forward directly connected signals from |
Not into the unknown, into the engine. The signal forwarding, in this case a simple callback with some Qt suggar, is part of the legacy where caching reader and worker where one class. I think there is some room for improvments, but the point here is that the design was not recently introduced and is not part of the issue we solve here. As explained, the original issue was, that the engine stops calling process after eject. The second issue you have discovered with the consequtive track loads is sill pending, though, but a new, hopeful no real live issue. But worth to solve since we are aware now. |
I would prefer a hacky solution to going down a rabbit hole of refactoring on a stable branch. |
I now agree that invoking freeAllChunks() in newTrack() was wrong. We should annotate all functions with the thread they are supposed to be called and document the restrictions carefully. This is class is a dangerous beast. |
We should replace the signals with regular function calls to avoid further confusion. Then separate the functions into groups associated with threads. |
src/engine/cachingreader.cpp
Outdated
auto pChunk = update.takeFromWorker(); | ||
if (pChunk) { | ||
DEBUG_ASSERT(m_state != State::Idle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to move the debug assertion. The condition must be valid while looping, independent of the update message. Ignore my comment. But I still don't understand when and why this might happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the original position it happen when you have ejected the track.
In this case the "idle" state has overwritten the "loading" state because a null track was loaded. After loading a new non null track we receive the "loaded" signal while in idle state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should receive TRACK_LOADED/UNLOADED messages only while the state is Loading. The Idle state is set after we have received TRACK_UNLOADED. We should not receive any messages from the worker until the state has been changed from Idle to Loading again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that the cue is not processed after ejecting the track. That is why the idle message is still in the queue and overwrite the loading message.
This is not much of an issue, but I see how it is surprising.
I will look how to solve this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not an issue to overwrite the loading message because the very next message is the loaded message and the cue was purged until the loading state was there. Purging is stopped by the idle message.
Can we solve this by renaming?
hmm odd. std::atomic m_state; works on my xenial system but not on Travis. |
2dcaef6
to
ff61eeb
Compare
<< "Cannot load new track while loading a track"; | ||
return; | ||
} | ||
m_worker.newTrack(std::move(pTrack)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The final line Ok, I see you moved it.m_worker.workReady();
got lost along the way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from your template :-)
…e a new track has been loaded.
I'm convinced that the code now works as expected! My previous approach didn't consider all the intended side effects and was wrong. Thanks for stopping me ;) The design is still a mess, but I'm not going to rewrite it. At least not in C++. This code could be written much easier, safer, and cleaner in Rust. Maybe an option for the future. |
src/engine/cachingreader.cpp
Outdated
freeAllChunks(); | ||
auto newState = pTrack ? STATE_TRACK_LOADING : STATE_TRACK_UNLOADING; | ||
auto oldState = m_state.fetchAndStoreAcquire(newState); | ||
VERIFY_OR_DEBUG_ASSERT( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this can currently actually happen in case the worker is stalled for some reasons.
We need to prevent it in the engine that this case is really impossible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, from the caching reader perspective this is not longer an issue, since CachingReaderWorker::loadTrack handles the loading and is not interrupted. It is an engine issue, however we cannot judge from m_state, because this is delayed due to the fifos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ups, the issue is the slotLoadingTrack(TrackPointer pNewTrack, TrackPointer pOldTrack) everywhere.
New Track is only a wild guess if we allow another new track loading. In most cases pNewTrack is not used but not in every case. We need to clean up this which is risky due to the string based signal slot connections.
So lets stick with this assertion, until we manage to make it happen, but not as critical.
I will add a comment.
OK, we must not assert possible things, so I replaced it with a warning(). It "only" lead to inconsistency GUI, no reason for quit Mixxx and stop the party. I have manually tested it a bit with a sleep in loadTrack() and nothing "critical" happens. We should target this in 2.3 if it turns out to be a real live issue. I am ready here. |
Idle, | ||
TrackLoading, | ||
TrackLoaded, | ||
enum State { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this changed from an enum class to an enum? Please change it back to an enum class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we need integers for the atomic. I didn't check if this still works with an enum class
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So cast it to an int and back when needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually had a version with a static_cast() one every use. But @uklotzde version with a plain enum conviced me, because it is less distracting. In this case we have no namespace pollution issue, because the enum is class private.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this special case the enum literals are just private integer constants and a classical enum is a good fit.
src/engine/cachingreader.cpp
Outdated
// but the newTrack may change if we load a new track while the previous one | ||
// is still loading. This leads to inconsistent states for example a different | ||
// track in the Mixx Title and the Deck label. | ||
if (oldState != STATE_TRACK_LOADING || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Condition needs to be inverted: == && ==
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ups .. yes!
src/engine/cachingreader.cpp
Outdated
@@ -218,6 +223,7 @@ void CachingReader::process() { | |||
ReaderStatusUpdate update; | |||
while (m_readerStatusUpdateFIFO.read(&update, 1) == 1) { | |||
auto pChunk = update.takeFromWorker(); | |||
qDebug() << "CachingReader::process()" << update.status; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This debug log will be printed too frequently, not a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ups a leftover.
src/engine/cachingreaderworker.cpp
Outdated
@@ -143,6 +143,9 @@ void CachingReaderWorker::loadTrack(const TrackPointer& pTrack) { | |||
// Emit that a new track is loading, stops the current track | |||
emit trackLoading(); | |||
|
|||
// Slows down track loading for debug |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either be more specific "for debugging race conditions while loading multiple tracks in a row" or delete the comment.
Done. |
I'm confident that we ironed out all the undefined behavior while avoiding trapdoors. LGTM. |
Great, thank you both for your hard work on this critical issue. I think we should wait at least a week or two before releasing 2.2.3 to see if anyone reports bugs from these changes. I just merged the 2.2 branch to master. |
I have finally managed to dig to the ground of the assertion and created a solution with a lower impact on the way. Sorry for the concurrent PR, but IMHO this is a best way the get a clear view on the issue for both of us. The consecutive track load fix of your PR is not included here.
The issue was that the engine stops calling process when the deck is ejected. This is why the m_stateFIFO is not purged as originally assumed after unloading the old track but after loading the new track. This does not matter in the track change case, but is an issue when ejecting the track first.
in this case the idle signal is delayed and overwrites the loading state leading to the assert violation.
The fix now uses TRACK_LOADED and TRACK_UNLOADED as a border in the m_stateFIFO fifo to distinguish old chunks from new.
I have also moved the freeAllChunks to the engine thread and call it whenever such a border appears.