Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't capture synchronization context in DiskWriterQueue #2447

Merged

Conversation

alexbereznikov
Copy link

This addresses #2446

Task.Factory.StartNew tries to use task scheduler from the current task, so TaskScheduler.Default needs to be passed here.

Copy link
Member

@pictos pictos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see the value of not capture the syncronization context, but this will be a breaking change. From the linked issue, it's caused because your project uses a custom SyncronziationContext, but that doesn't happen on UI frameworks, that implements its own syncContext, so I that doesn't prove that the issue is in your SyncContext implementation and not on LiteDb?

@alexbereznikov
Copy link
Author

@pictos As I described in related issue (#2446), breaking change was introduced after 5.0.17 - Task.Run (which uses default task scheduler) has been changed to Task.Factory.StartNew (that uses a current task scheduler by default). More precisely, breaking changes were made in this commit - f21cd84

So this PR aims to revert this breaking change, not to introduce one.

Regardless of whether our custom context is implemented correctly, I don't think it is a good idea to execute writer queue on a captured context (e g UI context). I would say, it rather highlights the issue.

@pictos
Copy link
Member

pictos commented Jun 6, 2024

So, looking more closely to the implementation I strongly believe that the issue is more than having it capturing the current task's TaskScheduler.
The task is being created with LongRunning flag, and uses an async delegate, and you may know that those two doesn't fit togheter.

So the first thing we should change here is to make the ExecuteQueue a void method, and block the created thread on the async operations or rethink all this engine, which I believe will not be done on this PR if we choose this path.

edit: Did a PoC using the main implementation and it will not flow the UI context, adding the TaskScheduler or not will not change the behavior, will just go throw UI thread if you pass TaskScheduler.FromCurrentSynchronizationContext(), so I would say that your PR will not fix the issue at the end

Copy link
Member

@pictos pictos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After talking with @mbdavid, I could understand the idea behind this member. So the fix is something completely different from your implementation.

No matter if you pass the TaskScheduler as parameter or not, it will not run on the UI Thread. The fix is making the ExecuteQueue method a void method and all the implementation should be a block code, that's not use any asynchronous API.

Also, for improvement, remove the lock object on the task instantiation and use an interlockedExchange method to verify if there's a need to create a task.

With that the code will behave as expected, it will run on a deticated thread and will not touch any kind of syncContext, taskScheduler or threadPool Thread.

Let me know if you want to perform these changes, otherwise I'll do it. And thanks for your hard work so far❣️

@alexbereznikov
Copy link
Author

@pictos I saw this discussion (until I was removed from #moderators channel :)). Yes this could solve the issue as well, but what are real benefits of having a separate thread for this? I mean, this is not a like windows message loop thread, where we have to keep the same thread id. From my point of view, having a separate thread here have some drawbacks:

  • Having to manage this thread
  • Additional resources are required - at least 1Mb for the thread stack
  • Loosing ability to use async primitives - roughly half of the time the thread will do nothing while waiting for event

I agree that LongRunning probably is redundant here, we can just as well revert to Task.Run.

Regarding PoC - I couldn't reproduce the issue in a small PoC either, but I'm 100% sure that our integration tests started to fail because of this regression, and this fix fixed this for us (we have a fork with some fixes applied).

@pictos
Copy link
Member

pictos commented Jun 7, 2024

@alexbereznikov, let me answer your comments in pieces.

Yes this could solve the issue as well, but what are real benefits of having a separate thread for this? I mean, this is not a like windows message loop thread, where we have to keep the same thread id

First of all, we don't need to manage the dedicated thread, that BCL will do it for us, since we're using the Task.Factory.StartNew. So, we don't need to manage the thread, we are handling with the Task, which is a high-level abstraction.

Loosing ability to use async primitives - roughly half of the time the thread will do nothing while waiting for event

We don't need the async primitives in this specific case. We want to have a dedicated thread to handle the DiskWriter work.

I agree that LongRunning probably is redundant here, we can just as well revert to Task.Run.

I didn't say the LongRunning is redundant, I said that passing the TaskScheduler as parameter is redundant. And we can't use Task.Run here, because that way we will block a ThreadPool thread to do a hard work, and that's wrong by design. The threads on the ThreadPool aren't designed to do a heavy work like that. That said, the usage of the Task.Factory.StartNew is a good approach here.

Regarding PoC - I couldn't reproduce the issue in a small PoC either, but I'm 100% sure that our integration tests started to fail because of this regression, and this fix fixed this for us (we have a fork with some fixes applied).

Friendly saying. And still there you're confident that the custom SyncronizationContext implemented on your tests doesn't have an issue?

@alexbereznikov
Copy link
Author

alexbereznikov commented Jun 8, 2024

@pictos Ok, we do use sync I/O operations here and thread pool is really not the best choice in this case.

But, what if we change WritePageToStream to use WriteAsync instead of Write, and FlushToDisk extension to use FlushAsync instead of Flush? In this case, we won't block thread pool thread to do a hard work - its responsibility will be pass pages from queue to _stream.WriteAsync. Also, thread won't wait on WaitAsync.

Regarding custom SyncronizationContext - it works perfectly fine, it is not a "test" context, we do use it in the main app. The only issue it has is that it is disposable, and can perform only one operation at a time. So the issue we had initially was when continuation from ExecuteQueue got posted to already disposed synchronization context.

@pictos
Copy link
Member

pictos commented Jun 8, 2024

@alexbereznikov, why are so against to use a dedicated thread for it? It's the right tool this job.

For this PR at this time we have plan to not change how it works in core, so I'll follow the Mauricio's recommendations. And feel free to open an issue proposing this change, if it's approved we can merge it in the next major release.

@alexbereznikov
Copy link
Author

@pictos I just feel that it is a step in a backwards direction. I don't think that changing Write to WriteAsync can break anything.

Regarding custom sync context and PoC - please take a look at this - https://github.com/alexbereznikov/LiteDB/tree/2446-poc

Here I added a new console project called Poc, which uses a custom synchronization context and a task scheduler created from it. It outputs all the stack traces for each call to Send/Post, and it is clear that ExecuteQueue calls got redirected to this context.

Context itself does not flow, for sure

@pictos
Copy link
Member

pictos commented Jun 10, 2024

@alexbereznikov I've a feeling that you aren't listening to me and just want to make your point pass. What you did on your PoC just confirms what I said here, what in resume is "the only way that your custom SyncContext will flow to the custom Task is if you call TaskScheduler.FromSyncronizationContext(). Passing TaskScheduler.Default or omitting the call will not touch on the SyncContext, no matter what's".

As peer discussion with @mbdavid, for now we will not change the approach. But feel free to open an issue where you can put more info and we can discuss another approach for this, but remember it must be ThreadSafe.

@alexbereznikov
Copy link
Author

@pictos But in this case Task.Factory.StartNew uses a current task scheduler, which in turn uses that context. I don't think it is a correct behavior, because in this PoC I didn't change anything in LiteDB sources, i e haven't passed this TaskScheduler.FromSyncronizationContext() when starting ExecuteQueue. And passing TaskScheduler.Default fixes this issue.

@alexbereznikov
Copy link
Author

I'm honestly surprised a lot that such a simple one-line regression fix spawned that much discussion. If you insist I'll do in the way you proposed because I have no other options :) But I'm still 100% sure that this is a step in a backwards direction - instead of fixing the issue and moving on we're rolling back to ways it was done a while ago.

@pictos
Copy link
Member

pictos commented Jun 10, 2024

@alexbereznikov, let's try one more time... Today that API is sensitive to the SyncContext, so can be devs that rely on that. Changing this behavior is a breaking change that we can't do in a SR.

So, for this PR, that will be released in a SR we need to fix the fact that right now the task is leaking to the ThreadPool. For changing what you want, we will need to do a new PR for that and also verify on the code place other places that are sensitive to SyncContexts and change that, hope that now you understand.

Thanks for your help and bringing this to light. I did the necessary changes for created #2505 to follow up the next changes. If you want to take a lead on that and change all places on our code base to be SyncContext agnostisc let us now and we will assign the issue to you.

LiteDB/Engine/Disk/DiskWriterQueue.cs Show resolved Hide resolved
@pictos pictos dismissed JKamsker’s stale review June 12, 2024 15:16

The changes pointed will be done in another PR

@pictos pictos merged commit e9cff85 into litedb-org:master Jun 12, 2024
1 check passed
@alexbereznikov alexbereznikov deleted the 2446-fix-disk-writer-queue-context branch June 13, 2024 10:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants