
lp1842679: Fix caching of audio data #2265

Merged
merged 13 commits into from
Sep 15, 2019

Conversation

uklotzde (Contributor) commented Sep 4, 2019

https://bugs.launchpad.net/mixxx/+bug/1842679

Trying to mitigate the issue by reserving more chunks; the current number of chunks seems to be insufficient when using multiple decks, including sample decks! Reverted: my initial guess was wrong. But I found issues with the doubly-linked MRU/LRU list, where at least some assertions failed. Fixed.

The comment was outdated, I updated it.

@uklotzde uklotzde added this to the 2.2.3 milestone Sep 4, 2019
daschuer (Member) commented Sep 5, 2019

I have some doubts whether this fixes the linked bug. Do we have an idea what needs to happen so that none of the 80 chunks can be freed?

uklotzde (Contributor, Author) commented Sep 5, 2019

It will not fix the bug if something else is blocking the processing of chunks and they get stuck in one of the FIFOs.

Nevertheless, it doesn't hurt to increase the number; 16 MB should be acceptable. And the size of the FIFOs should match exactly this number! This is fixed now.

uklotzde (Contributor, Author) commented Sep 5, 2019

The logs attached to the bug report clearly show a strong temporal correlation between enabling recording and running out of chunks. Everything went back to normal after disabling recording. Something in the recording path of the engine seems to block and cause a chain reaction.

daschuer (Member) commented Sep 5, 2019

https://github.com/uklotzde/mixxx/blob/0c663d1729c0b0c3ddb6fdf4309389c36740ae5a/src/engine/cachingreader.cpp#L134

Why can this ever be true? If I understand the code correctly, lru and mru are the tail and head of the linked list in use. If all 80 chunks are in use, lru and mru must have a value and hold all chunks.

So maybe the recording triggers a bug that loses the lru?

uklotzde (Contributor, Author) commented Sep 6, 2019

> https://github.com/uklotzde/mixxx/blob/0c663d1729c0b0c3ddb6fdf4309389c36740ae5a/src/engine/cachingreader.cpp#L134
>
> Why can this ever be true? If I understand the code correctly, lru and mru are the tail and head of the linked list in use. If all 80 chunks are in use, lru and mru must have a value and hold all chunks.
>
> So maybe the recording triggers a bug that loses the lru?

If all allocated chunks are on their way between CachingReaderWorker and CachingReader, both the head and tail pointers of the MRU/LRU list become nullptr. See CachingReaderChunk::removeFromList()/insertIntoListBefore() and the FIFOs in CachingReader that serve as channels.

uklotzde (Contributor, Author) commented Sep 6, 2019

If CachingReaderWorker got stuck, probably due to some file system I/O, then we will eventually run out of chunks if CachingReader keeps receiving requests but doesn't get any chunks back from the worker, and all chunks end up in the state READ_PENDING.

This should only happen for files > 5 MB, our current limit.

@uklotzde uklotzde changed the title lp1842679: Increase number of cached chunks [WiP] lp1842679: Increase number of cached chunks Sep 6, 2019
@uklotzde uklotzde changed the title [WiP] lp1842679: Increase number of cached chunks lp1842679: Increase number of cached chunks Sep 6, 2019
uklotzde (Contributor, Author) commented Sep 6, 2019

Looks like there were indeed some(?) bugs in the doubly-linked list management! I experienced assertion failures when reducing the total number of chunks to just 1, 2, or 3 to test all edge cases.

Now it should be super-safe. The code has been cleaned up and is guarded by lots of assertions. I did not experience a single issue (apart from massive drop-outs and warning logs) when shrinking the cache to its minimum capacity (see the comment in the last commit).

I have not analyzed if and how blocking could occur during recording.

@uklotzde uklotzde changed the title lp1842679: Increase number of cached chunks lp1842679: Increase number of cached chunks / fix caching Sep 6, 2019
daschuer (Member) commented Sep 6, 2019

Thank you for double-checking the list functions.

I think the audio dropouts themselves are caused by the blocking debug messages.
Maybe we can add a flag to issue only one per Mixxx run?
We did this with other engine debug messages as well.

Due to all the sample decks, the increased number of chunks raises the memory footprint from
345 MB to ~1 GB. If this leads to memory being pushed to the swap partition on low-end systems, all the caching efforts are for nothing.

Without having a closer look at why the CachingReaderWorker is blocked, I guess this is related to blocking disk I/O, which we cannot avoid. So it is probably a good idea to deal with it. In case the worker is blocked, all chunks are stored in the FIFO until it is full. After unblocking, the worker processes probably outdated chunks first.

Normally ~6 chunks are requested per callback, and normally only one chunk changes. All the other 72 chunks are not interesting, except for random seeks, which we cannot predict anyway.

It will take a while until the reader recovers from an overflow. If this happens with two decks, they will likely block each other again.

From this point of view it would be better to limit the worker FIFO to a reasonably low number and flush it in case of overflow. This way, likely required chunks are read after recovering instead of old, unwanted chunks. We may also reduce the overall number of chunks.

What do you think?

uklotzde (Contributor, Author) commented Sep 6, 2019

Cache size
Oh, I forgot that we have one reader per deck 🙈 Then we should indeed reduce the number of chunks to the original or an even lower amount.

Debug logs
Thanks for the hint! Leaving the debug logs enabled in production was never a good idea; it causes all kinds of issues, as we see here once again. Adding a flag for every single debug log is error-prone, so I'd prefer to delete them entirely. I will check how many are affected.

FIFOs
Messing around with the FIFOs is difficult; the whole design is still fragile. We must take the ownership between reader and worker into account and ensure that no chunks get lost. Let's leave this as it is. I'm not keen on spending hours or days on this again.

@uklotzde uklotzde changed the title lp1842679: Increase number of cached chunks / fix caching lp1842679: Fix caching of audio data Sep 6, 2019
daschuer (Member) commented Sep 6, 2019

> Leaving the debug logs enabled in production was never a good idea.

We should use them wisely. Imagine if we did not have this message here; we would probably have a hard time solving the problem.

> Messing around with the FIFOs ...

What if we just reduce the FIFO size to a smaller number? ~20?

uklotzde (Contributor, Author) commented Sep 6, 2019

We must not reduce the size of the back channel m_readerStatusFIFO, because the worker uses writeBlocking()! Reducing the size of the request FIFO would drop incoming requests when it is full.

uklotzde (Contributor, Author) commented Sep 6, 2019

Done with very detailed comments.

uklotzde (Contributor, Author) commented Sep 6, 2019

OMG, FIFO::writeBlocking() only works for count = 1!!

uklotzde (Contributor, Author) commented Sep 6, 2019

boost::lockfree::queue is not ideal, but it would work. After a failed bounded_push(), the CachingReader could invoke pop() (multi-consumer) to free the slot holding the oldest request and retry. The 2nd bounded_push() attempt must then succeed.

uklotzde (Contributor, Author) commented Sep 6, 2019

But I'm not planning to integrate boost.

daschuer (Member) commented

Works well now. Hopefully it actually fixes the user's issue.
LGTM, thank you.

@daschuer daschuer merged commit 4000c66 into mixxxdj:2.2 Sep 15, 2019
uklotzde (Contributor, Author) commented

@daschuer Thanks for merging these fixes! I was really concerned about the findings.
