Another way to handle blocking in streaming.rs #155

Closed (7 commits, not merged)

Conversation

izderadicka (Contributor)

Hi,

This relates to #153.

block_in_place - I've tried it here, and it's relatively straightforward: main...izderadicka:naive_streaming_fix

I also found a problem with the e2e_streaming test - at least on my Linux box it tried to create a directory in the root (/does-not-matter), which failed due to permissions.

Here I tried an approach with spawn_blocking, which I feel is more "appropriate". With blocking in the event loop I think there can be problems if tokio runs on the current-thread executor. On the other hand, I understand that reading from a local file would be fairly quick (taking into account the Linux kernel's fs caches etc.), so this might not be faster, as an additional allocation is involved when the task runs on a different thread. This can still be optimized a bit - here is a first rough attempt.
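For context, a minimal sketch contrasting the two approaches (illustrative only - read_chunk is a hypothetical stand-in for the blocking storage read, not the PR's code):

```rust
use tokio::task;

// Hypothetical stand-in for the blocking pread against local storage.
fn read_chunk(buf: &mut [u8]) -> std::io::Result<usize> {
    Ok(buf.len())
}

// block_in_place: runs the blocking call right on the current worker thread.
// Note: this panics on tokio's current-thread runtime.
async fn read_with_block_in_place(buf: &mut [u8]) -> std::io::Result<usize> {
    task::block_in_place(|| read_chunk(buf))
}

// spawn_blocking: ships the call to the blocking thread pool. Works on any
// runtime flavor, but the closure must be 'static, so the buffer has to be
// owned - that's where the extra allocation comes from.
async fn read_with_spawn_blocking(len: usize) -> std::io::Result<Vec<u8>> {
    task::spawn_blocking(move || {
        let mut buf = vec![0u8; len];
        let n = read_chunk(&mut buf)?;
        buf.truncate(n);
        Ok(buf)
    })
    .await
    .expect("blocking task panicked")
}
```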

}));
let have = poll_try_io!(self
    .torrent
    .with_chunk_tracker(|ct| ct.get_have_pieces()[current.id.get() as usize]));
ikatson (Owner)

The reason the waker was registered in the callback is that it happens while the chunk_tracker lock is taken. Doing it outside the lock is harder to reason about - it's not obvious that this has no bugs, and it needs to be proven - e.g. what happens in this race condition:

  1. you check for have and it's false
  2. another thread adds the piece and notifies all existing wakers
  3. you register a waker and it's never notified

Thus it's a race, and the waker registration needs to move back inside the lock.
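For illustration, a minimal sketch of the pattern, assuming a simplified tracker guarded by a Mutex (not the actual chunk_tracker code):

```rust
use std::sync::Mutex;
use std::task::{Context, Poll, Waker};

struct Tracker {
    have: Vec<bool>,
    wakers: Vec<Waker>,
}

// Check `have` and register the waker under the SAME lock, so a writer
// cannot slip in between the check and the registration.
fn poll_have(t: &Mutex<Tracker>, piece: usize, cx: &mut Context<'_>) -> Poll<()> {
    let mut t = t.lock().unwrap();
    if t.have[piece] {
        return Poll::Ready(());
    }
    t.wakers.push(cx.waker().clone());
    Poll::Pending
}

// The writer side: mark the piece and notify while still holding the lock,
// so every waker registered so far is guaranteed to be woken.
fn mark_have(t: &Mutex<Tracker>, piece: usize) {
    let mut t = t.lock().unwrap();
    t.have[piece] = true;
    for w in t.wakers.drain(..) {
        w.wake();
    }
}
```

Holding the lock across both the check and the registration closes the window in step 2 above.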

    file_id = self.file_id,
    "will write bytes"
);
loop {
ikatson (Owner)

Overall, I think the previous implementation was much simpler to read and reason about. Not even saying it was simple! But I don't see how it's worth it in the end - the code is much harder to follow, and it requires allocations for the buffer, while previously it didn't.

The added complexity is a price we pay today for solving an issue that is purely theoretical.

izderadicka (Contributor, Author)

Hi,
thanks for the comments. I put register_waker back in the callback - I did not realize it might be an issue, but now I think I understand the pattern.

Concerning the overall usefulness of the change, I'm not sure. I have done some quick and dirty measurements (streaming a 1 GB video with curl) and have not seen any notable performance difference.

The only difference is that if block_in_place is used as a simple workaround (as mentioned above), it does not work on the current-thread executor (-s option). The proposed fix behaves OK on the current-thread executor, same as the original code.

The code of this PR is more complex, but that's the nature of async code - I think there is no way around it: if you're composing futures "manually" you have to do all the low-level stuff - polling etc.

On the other hand, it can be useful in the future - I think that ideally Storage should be an async trait (async fn in traits got stabilized, but I think there is still an issue with using an async trait as a trait object - this will also be solved in the future). If storage becomes async for reading, then the streaming code should look similar to the proposed change.
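For illustration, a sketch of the trait-object limitation being referred to, using a hypothetical simplified trait (not librqbit's actual Storage):

```rust
// Hypothetical simplified trait, for illustration only.
trait AsyncReadStorage {
    // Compiles for concrete implementors since Rust 1.75 ("async fn in traits")...
    async fn pread(&self, offset: u64, buf: &mut [u8]) -> std::io::Result<usize>;
}

// ...but the trait is not dyn-compatible: each implementor returns its own
// anonymous future type, so a trait object is rejected:
// fn use_it(_s: Box<dyn AsyncReadStorage>) {}
// error[E0038]: `AsyncReadStorage` cannot be made into an object
```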

So I leave it to your discretion whether this PR could be useful now.

ikatson (Owner) commented Jul 24, 2024

Looking at this again more closely, I'm still hesitant to merge it because of the unnecessary complexity (at least at this time, while Storage isn't async).

Dealing with the buffer / position etc. was a bit tricky already, but this brings it to a whole new level - there's a lot of control flow and branching, and just plain more code. Looking at it, it all seems reasonable; most of the complexity comes from how painful it is to write Poll implementations manually (albeit an enjoyable brain exercise).

As far as I understand, this is the main argument:

> The only difference is that if block_in_place is used as a simple workaround (as mentioned above), it does not work on the current-thread executor (-s option).

I should have expanded on this: it was already encountered and worked around elsewhere in the codebase. You can just run the callback directly (without block_in_place) in single-threaded mode - this was done a few times in the past, hence the BlockingSpawner helper in spawn_utils.rs. The spawner already gets created at session initialization, and you can just copy it into the stream from ManagedTorrentInfo.spawner. Then instead of calling tokio::block_in_place you call spawner.spawn_block_in_place.
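For illustration, a sketch of what such a helper can look like (this may not match the exact BlockingSpawner API in spawn_utils.rs):

```rust
// Remember at construction time whether block_in_place is allowed (i.e.
// whether we're on a multi-threaded runtime), and fall back to running the
// closure directly on a current-thread runtime, where block_in_place panics.
#[derive(Clone, Copy)]
struct BlockingSpawner {
    allow_block_in_place: bool,
}

impl BlockingSpawner {
    fn new(allow_block_in_place: bool) -> Self {
        Self { allow_block_in_place }
    }

    fn spawn_block_in_place<F, R>(&self, f: F) -> R
    where
        F: FnOnce() -> R,
    {
        if self.allow_block_in_place {
            tokio::task::block_in_place(f)
        } else {
            f()
        }
    }
}
```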

> I think that ideally Storage should be an async trait

That would be nice to try, but I fear it'll also make many things quite a bit more complicated (as this PR shows) and slower.

Hence I'm leaning towards keeping things simple and performant (both in streaming.rs and by not making storage async), with a couple of strategic workarounds until (if ever) it becomes an issue.

izderadicka (Contributor, Author)

Hi,
I do understand your arguments; a few comments:

  • async is slower - I'd say it depends. I have done some toy benchmarks (mentioned above) and have not seen any slowdown. Even in the Tokio issue you mentioned, they were finally able to achieve good performance by utilizing more parallelization in Tokio. But it's true that the difference was small, so it probably does not justify the effort :-)
  • I'm not a big expert on systems programming, so I do not have a gut feeling for which potentially blocking calls are still OK in an async context - that's why I tend to follow the "formal" rules here, and why I raised this issue. Now I'm a bit more enlightened - thanks for your patience and explanations.

So here is what I propose: I will open a new PR with BlockingSpawner (I had already seen that helper) and the fixed test - because that one was broken on my Linux box. I already have a branch for it. And this PR can stay parked for some future day. Agreed?

ikatson (Owner) commented Jul 25, 2024

Thanks @izderadicka for your patience! This sounds good.

As to async file IO being slower: if you look at what it's doing under the hood, it's plainly more code (instructions, allocations, inter-thread communication) than one syscall in blocking mode.

So taken in isolation it will be slower by definition, unless you use something like io_uring (which we don't).

Of course, if it blocks the executor and other unrelated tasks, that isn't great and may slow down the application as a whole. But in reality, if those reads are small, the disk is fast, and we work around it using block_in_place to mark those places, it should be OK.
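To make the comparison concrete, a sketch of the two read paths (assuming Unix, where the sync path is a single pread(2) via FileExt::read_at):

```rust
use std::os::unix::fs::FileExt;
use tokio::io::AsyncReadExt;

// Sync: one syscall straight into the caller's buffer, no allocation.
fn read_sync(f: &std::fs::File, off: u64, buf: &mut [u8]) -> std::io::Result<usize> {
    f.read_at(buf, off)
}

// Async via tokio::fs: under the hood the read is shipped to a blocking-pool
// thread with an owned intermediate buffer, and the result has to cross
// threads to wake the task - strictly more work than the syscall above.
async fn read_async(f: &mut tokio::fs::File, buf: &mut [u8]) -> std::io::Result<usize> {
    f.read(buf).await
}
```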

I'd say complexity is my primary concern, performance secondary, but still important.

All that said, if someone finds a real problem with it, e.g. rqbit performing horribly with slow disks, then it might warrant refactoring IO to be async everywhere. I tested simulated slow disks recently though, and it was fine (obviously, only with the multi-threaded executor).
Another real problem might be someone needing to use librqbit with a single-threaded executor for some reason, and it performing badly because of slow disks. But that is still hypothetical :)


Another thing with async: after the recent refactor we need the storage trait, and all pread_all / pwrite_all calls would need to return boxed futures (because we use Box<dyn Storage>, there's no other way).

That means allocating a future for every single read/write, plus allocating the buffer for tokio::fs to work! Whereas now there are no allocations and it's as simple as it can get.
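A sketch of what the boxed-future signatures would look like (hypothetical names, not librqbit's current trait):

```rust
use std::future::Future;
use std::pin::Pin;

// Every call returns a heap-allocated future - one Box per read/write.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

// Boxing the futures makes the trait dyn-compatible, which Box<dyn ...>
// requires - at the cost of an allocation on every call.
trait AsyncStorage: Send + Sync {
    fn pread_all<'a>(&'a self, offset: u64, buf: &'a mut [u8])
        -> BoxFuture<'a, std::io::Result<()>>;
    fn pwrite_all<'a>(&'a self, offset: u64, buf: &'a [u8])
        -> BoxFuture<'a, std::io::Result<()>>;
}
```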


All that said, if someone finds an elegant way to address all these and not make a mess out of already convoluted code, I'm not against async file IO in principle.

izderadicka (Contributor, Author)

I'm closing this one - see #157 instead. This one might still be useful at some point in the future, when storage becomes async.
