
Responsive and thread-safe track analysis (ad-hoc + batch) #1413

Closed
wants to merge 146 commits into from

Conversation

@uklotzde (Contributor) commented Dec 13, 2017

...that might prematurely abort a batch analysis. The blocking operation must be enclosed in a loop.

Related bug reports:

Existing implementation:

  • Summary: Hardly usable and unsafe, inappropriate for a release
  • Awkward API design causes unresponsiveness of the UI when starting and during an analysis
  • Unneeded and inefficient prioritization of scheduled tracks
  • Progress update gets confused and displays invalid or inconsistent values when adding tracks during an analysis (you might already have seen negative values)
  • Race conditions between scheduler (= queue) and worker thread

New implementation:

  • Re-designed scheduler API (formerly known as "AnalyzerQueue") for efficiently scheduling multiple tracks for analysis
  • Lock-free worker thread design that does not block the scheduler/UI, either when scheduling tracks or during progress updates
  • No more inconsistencies during progress updates caused by inappropriate parameterization
  • Thread-safe creation and isolation of fidlib filters
  • Thread-safe isolation of Vamp plugins after experiencing spurious crashes when running multiple worker threads concurrently
  • Enabling multi-threaded analysis is completely transparent and does not require any changes in the remaining code base

The PR got too big because, after detecting some minor issues, we discovered new and even worse ones. Thank you all for your valuable input 👍 In the end a complete re-design was needed to get the whole analysis code into a stable, safe, and usable state. Had we known this in advance, we could have started by fixing single issues in isolation. Unfortunately I don't have the time to decompose this PR again into smaller parts that could be reviewed one by one; that would cost me at least another day, if not more.

@Be-ing (Contributor) commented Dec 14, 2017

This has not fixed my issue with batch analysis :/ I still see the same stuff in my log.

@Be-ing (Contributor) commented Dec 14, 2017

I tried renaming my mixxxdb.sqlite file and that didn't fix it either.

@uklotzde uklotzde changed the title Fix potential race condition in AnalyzerQueue... Fix potential race condition in AnalyzerQueue Dec 14, 2017
@JosepMaJAZ (Contributor) commented Dec 14, 2017

And... what about my multithreaded branch? In there, the isLoadedTrackWaiting method is not present. I removed it because I couldn't really understand what it tried to do. It seems as if it tries to remove a track from the list if it is about to analyze it.
It's not really something that needs special treatment, except, at most, to avoid analyzing the same track on two threads at the same time (although I'm not 100% sure it even checks for that scenario).

@uklotzde (Contributor, Author) commented:

The AnalyzerQueue still seems to have serious issues. The more tracks are added to the queue, the more sluggish the interface becomes. I have not yet finished analyzing why this happens. Maybe it is an unexpected side-effect of the recently added TrackCache.

Here I just fixed some minor issues that should not cause any more regressions. Reducing lock contention on the TrackCache is work in progress and a different aspect.

@uklotzde (Contributor, Author) commented:

@JosepMaJAZ I plan to review your multi-threaded analysis work! But please don't expect that we are able to finish it very soon. If we don't grasp and tame the dynamics of a single concurrent task within the boundaries of the current architecture, I doubt that we are able to get it right for multiple threads.

First we need a rock solid solution for a single concurrent thread that never fails, neither in real life nor in theory. Then we may generalize that for multiple threads.

@daschuer (Member) commented:

It does not hurt to use atomics here.
But plain ints are atomic as well, so the latest commits should not make a difference.

I still have not got the point: where was the race condition in the first commit? Could you explain it a bit more?

For me this PR is, up to now, (only) a nice refactoring. Can we merge it as it is, to keep a possible final solution PR readable?

@daschuer (Member) commented:

Suspicion:
We have a pending race condition with a global Qt mutex protecting the GUI thread's signal queue.
If I recall correctly, the engine locks it whenever writing a CO (control object). This low-priority analyzer also uses Qt signals to the GUI thread. So maybe we are seeing a priority inversion caused by this mutex.
One idea to solve it is to pass all engine signals (CO valueChanged) to a second thread in a non-locking way. This thread can then signal the GUI thread and hold the mutex without blocking the engine.

This is of course no solution for priority inversion between the analysis and other threads signaling the GUI, but at least it will prevent sound dropouts.

Could this be your issue?

@uklotzde (Contributor, Author) commented:

👍 Even after reducing the lock contention on TrackCache (will be published soon) the analysis is still very "sluggish". I had no idea why, but this could be the reason!!

@uklotzde (Contributor, Author) commented Dec 15, 2017

Race condition: The queue might already be empty again by the time the thread is actually unblocked. You should always enclose those lock/wait/check patterns in a loop!

Rethinking this special case: The worker thread is currently the only consumer that is able to take items from the queue. But even then we should apply the common pattern now. In the future the same queue might be consumed by multiple threads. Better to get this correct right from the start!
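The lock/wait/check-in-a-loop pattern can be sketched with standard C++ primitives. This is purely illustrative, not the actual AnalyzerQueue code (which uses Qt classes); the `TrackIdQueue` name and plain `int` track ids are assumptions for the sketch.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

// Illustrative sketch of the wait-in-a-loop pattern. A consumer must
// re-check the queue condition after waking, because another consumer
// may have emptied the queue in the meantime, and condition variables
// are allowed to wake up spuriously.
class TrackIdQueue {
  public:
    void push(int trackId) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push_back(trackId);
        }
        m_condition.notify_one();
    }

    // Blocks until an item is available. The predicate overload of
    // wait() is exactly the required loop: it re-evaluates the check
    // every time the thread wakes up while holding the lock.
    int pop() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_condition.wait(lock, [this] { return !m_queue.empty(); });
        int trackId = m_queue.front();
        m_queue.pop_front();
        return trackId;
    }

  private:
    std::mutex m_mutex;
    std::condition_variable m_condition;
    std::deque<int> m_queue;
};
```

With this form, the pattern remains correct even if multiple consumer threads are added later, which is exactly the generalization discussed here.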

@daschuer (Member) commented:

This explains a one-time dropout, but it does not explain why the GUI starts to become sluggish. Is there anything new that is continuously sending signals?

@daschuer (Member) commented:

I did not read your latest comment before writing mine.
Thank you for explaining. Does this mean we can merge this now? LGTM, by the way.

@uklotzde (Contributor, Author) commented:

Those atomic variables are read and written from different threads and should use atomic types with memory fences/barriers to avoid race conditions. It is at least good style: it prevents undefined behaviour, and the atomic type documents that the variable is shared between threads.
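A minimal sketch of this point using `std::atomic` from the C++ standard library. The `AnalysisProgress` name and counter are illustrative assumptions, not the actual Mixxx code; the point is that a plain `int` shared like this would be a data race (undefined behavior) even on hardware where word-sized loads and stores happen to be atomic.

```cpp
#include <atomic>

// Illustrative sketch: a progress counter written by a worker thread
// and read by the UI thread. std::atomic makes the sharing explicit
// in the type and guarantees race-free access.
struct AnalysisProgress {
    std::atomic<int> framesProcessed{0};

    // Called from the worker thread.
    void advance(int frames) {
        // Relaxed ordering suffices for a monotonic counter that
        // carries no other data; the access is still race-free.
        framesProcessed.fetch_add(frames, std::memory_order_relaxed);
    }

    // Called from the UI thread.
    int current() const {
        return framesProcessed.load(std::memory_order_relaxed);
    }
};
```

Even where relaxed ordering is enough, the atomic type itself serves as documentation that the variable crosses thread boundaries.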

@uklotzde (Contributor, Author) commented Dec 15, 2017

The analyzer thread continuously sends progress updates, at least every 60 ms. But this didn't change, so I wonder why it now seems to be an issue.

@JosepMaJAZ (Contributor) commented:

I still don't see what you are all trying to address with this PR.

If the problem is that the UI blocks when starting a new analysis with hundreds (if not thousands) of tracks, then that has two causes:
First, we still load the full tracks instead of only their IDs.
Second, we use a simple queue in which we test whether a track has previously been added. The more tracks there are, the longer it takes to add one more track.

If you are trying to reduce glitches while the analysis is running (I wonder which ones), then what you need to redo is how the UI updates are done. I didn't modify that part in my multithreaded PR, but I described that the mechanism in place is a bit stupid, because what it blocks are the emitted messages from the analyzer thread to another thread (supposedly the main thread), which in turn emits another message for the UI update.

The UI thread (maybe tied to the refresh rate) should report whether it has updated the UI or not, and only after that has been done allow another update. But if the 60 ms wait is in place, there isn't any reason to worry about UI updates (that's around 16 Hz).

So, really... what is being fixed here?

@uklotzde uklotzde changed the title Fix potential race condition in AnalyzerQueue Fix ~~potential~~ race condition in AnalyzerQueue Dec 16, 2017
@uklotzde uklotzde changed the title Fix ~~potential~~ race condition in AnalyzerQueue Fix race condition in AnalyzerQueue Dec 16, 2017
@uklotzde (Contributor, Author) commented:

@Be-ing Does the last commit fix your issue with tracks not being analyzed? Enqueuing was broken since ???, so this is no regression.

@uklotzde (Contributor, Author) commented:

@JosepMaJAZ I added a TODO with a hint for the O(n²) complexity when adding multiple tracks in a row. But this is currently not the main issue.

There is already a semaphore in place that tells the analyzer if the UI is ready for the next update. I don't know if this is appropriate or not. One step after the other.
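The O(n²) complexity mentioned above arises when every enqueue performs a linear membership scan over the queue: n insertions then cost O(n²) in total. One possible fix is to maintain an auxiliary hash set of enqueued ids alongside the queue, making each membership test amortized O(1). The `DedupTrackQueue` name and `int` track ids below are illustrative assumptions, not the actual implementation.

```cpp
#include <deque>
#include <unordered_set>

// Illustrative sketch: a FIFO queue that rejects duplicate track ids.
// The hash set mirrors the queue's contents, so the duplicate check
// is amortized O(1) instead of an O(n) scan of the queue.
class DedupTrackQueue {
  public:
    // Returns false if the track was already enqueued.
    bool enqueue(int trackId) {
        if (!m_enqueuedIds.insert(trackId).second) {
            return false; // already present, skip
        }
        m_queue.push_back(trackId);
        return true;
    }

    bool empty() const { return m_queue.empty(); }

    int dequeue() {
        int trackId = m_queue.front();
        m_queue.pop_front();
        m_enqueuedIds.erase(trackId);
        return trackId;
    }

  private:
    std::deque<int> m_queue;
    std::unordered_set<int> m_enqueuedIds;
};
```

The trade-off is a second container to keep consistent with the queue, which is why this was left as a TODO rather than folded into the bug-fix PR.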

@Be-ing (Contributor) commented Dec 16, 2017

> @Be-ing Does the last commit fix your issue with tracks not being analyzed? Enqueuing was broken since ???, this is no regression.

No :/

@uklotzde uklotzde added this to the 2.1.0 milestone Dec 16, 2017
@JosepMaJAZ (Contributor) commented:

@Be-ing: can you add more information about what this PR tries to fix?
I built master on both Windows and Linux, used different skins, and analyzed a bunch of tracks, and I haven't noticed anything strange. (Linux was run under VirtualBox, so it isn't exactly a good test environment, but I didn't see anything that wasn't working as expected.)

@Be-ing (Contributor) commented Dec 16, 2017

@JosepMaJAZ: I renamed my mixxxdb.sqlite because the schema got messed up somewhere along the way testing various PRs. Upon doing batch analysis for my whole library, I found a serious bug that @uklotzde is trying to fix here.

@uklotzde (Contributor, Author) commented:

The "sluggish" response is actually caused by the way the worker thread communicates with the analyzer component in the main UI thread. I need to decide how to decouple them to avoid blocking on the semaphore. Shared state between threads with different priorities is a bad idea. The semaphore always seemed odd to me, and I should have noticed earlier!

This might take some time, also for testing, so I'm not sure if it still fits into this PR, which already contains some bug fixes and refactorings to make the code comprehensible and maintainable.

@Be-ing (Contributor) commented Dec 16, 2017

Hanging the GUI is a problem, but not as big of a problem as skipping analysis of all tracks. Let's prioritize that.

@uklotzde uklotzde changed the title Fix race condition in AnalyzerQueue [WiP] Fix race condition in AnalyzerQueue Dec 17, 2017
@uklotzde (Contributor, Author) commented Dec 17, 2017

We also need to consider the fact that tracks are still saved (metadata export + database update) from the low-priority analysis thread when it owns the last track pointer reference. All clear: save actions are executed within the [Main] thread. Nevertheless, this needs to be analyzed and documented thoroughly.

@uklotzde (Contributor, Author) commented:

Bulk updates for many tracks are still an issue in the main thread, but the lock-free implementation avoids blocking the main thread from a low-priority background thread. I also reduced the amount of inter-thread communication.

Updates during analysis don't all need to be consumed by the main thread, since each subsequent update replaces and includes its predecessor. The only crucial point is that the last update from the analysis thread must finally reach the main thread. I need to check that this requirement is still met.
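Because each update supersedes its predecessor, the worker can overwrite a single shared slot instead of queueing every update; the UI thread then reads the latest value at its own pace and neither side ever blocks the other. A minimal sketch under that assumption (`ProgressSlot` is an illustrative name, not the actual Mixxx code):

```cpp
#include <atomic>

// Illustrative sketch of a "latest value wins" progress channel.
// The worker overwrites the slot on every update; intermediate
// values may be skipped by the reader, which is fine because each
// update replaces and includes its predecessor.
class ProgressSlot {
  public:
    // Called from the worker thread for every progress update.
    void publish(int percent) {
        m_latest.store(percent, std::memory_order_release);
    }

    // Called from the UI thread, e.g. on a periodic timer tick.
    int latest() const {
        return m_latest.load(std::memory_order_acquire);
    }

  private:
    std::atomic<int> m_latest{0};
};
```

Delivering the final update still needs a separate guarantee (e.g. one last read after the worker signals completion), which is the requirement mentioned above.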

Note: During testing with massive loads I experienced one or two crashes within the Qt event loop that could not be backtraced into Mixxx code. Those crashes could be caused by any code that is not respecting the rules, or even by a bug in Qt itself.

@Be-ing (Contributor) commented Mar 19, 2018

Since the immediate issue of the GUI freezing during batch analysis was fixed in #1485, I do not think we should rush this for 2.1.

@uklotzde (Contributor, Author) commented:

I'll keep this PR aligned with and targeted to 2.1 rather than master. It contains so many fixes that we might need at some point for a 2.1.x fix release 😉

@Be-ing (Contributor) commented Apr 18, 2018

I'm not comfortable merging a +2,100 −1,078 pull request into a released branch unless it solves a critical bug that makes Mixxx unusable. There will be bugs from this and it needs wider testing before going into a release. Let's target it to master.

@Be-ing Be-ing mentioned this pull request Apr 18, 2018
@Be-ing Be-ing changed the base branch from 2.1 to master April 18, 2018 05:18
@uklotzde (Contributor, Author) commented:

This will not be an issue until serious merge conflicts arise.

I would like to reap the benefits of this work for my personal, customized, and stable 2.1 version. Anyone is free to fork this branch at any time and maintain it by themselves. But whoever decides to do so takes all the responsibilities. There is no such thing as a free lunch 😉

@uklotzde (Contributor, Author) commented:

I'm thinking about creating a new branch without all the intermediate changes and dead ends. The change set is much bigger than it ought to be, although the history of commits and comments gives some insight into how we finally got here.

@Be-ing (Contributor) commented Apr 18, 2018

I would like to reap the benefits of this work for my personal, customized, and stable 2.1 version.

Of course, you can do whatever you want on your own computer. But I am not comfortable merging such a big branch into the upstream 2.1 branch. Splitting this into smaller branches would be nice.

9 participants