Another way to handle blocking in streaming.rs #155

Closed (7 commits, not merged)

Conversation

izderadicka (Contributor)

Hi,

This relates to #153.

block_in_place - I've tried it here, and it's relatively straightforward: main...izderadicka:naive_streaming_fix

I also found a problem with the e2e_streaming test - at least on my Linux box it tried to create a directory in the root (/does-not-matter), which failed due to permissions.

Here I tried an approach with spawn_blocking, which I feel is more "appropriate". With blocking in the event loop I think there can be problems if tokio runs on the current-thread executor. On the other hand, I understand that reading from a local file would be fairly quick (taking into account the Linux kernel's fs caches etc.), so this might not be faster, as an additional allocation is involved when the task runs on a different thread. This can still be optimized a bit - here is a first rough attempt.
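For context, a minimal sketch contrasting the two approaches (illustrative only - read_chunk is a hypothetical stand-in for the blocking storage read, not the PR's code):

```rust
use tokio::task;

// Hypothetical stand-in for the blocking pread against local storage.
fn read_chunk(buf: &mut [u8]) -> std::io::Result<usize> {
    Ok(buf.len())
}

// block_in_place: runs the blocking call right on the current worker thread.
// Note: this panics on tokio's current-thread runtime.
async fn read_with_block_in_place(buf: &mut [u8]) -> std::io::Result<usize> {
    task::block_in_place(|| read_chunk(buf))
}

// spawn_blocking: ships the call to the blocking thread pool. Works on any
// runtime flavor, but the closure must be 'static, so the buffer has to be
// owned - that's where the extra allocation comes from.
async fn read_with_spawn_blocking(len: usize) -> std::io::Result<Vec<u8>> {
    task::spawn_blocking(move || {
        let mut buf = vec![0u8; len];
        let n = read_chunk(&mut buf)?;
        buf.truncate(n);
        Ok(buf)
    })
    .await
    .expect("blocking task panicked")
}
```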

}));
let have = poll_try_io!(self
    .torrent
    .with_chunk_tracker(|ct| ct.get_have_pieces()[current.id.get() as usize]));
ikatson (Owner)

The reason the waker was registered in the callback is that it happens while the chunk_tracker lock is taken. Doing it outside the lock is harder to reason about - it's not obvious that this has no bugs, and it needs to be proven - e.g. what happens in this race condition:

  1. you check for have and it's false
  2. another thread adds the piece and notifies all existing wakers
  3. you register a waker and it's never notified

Thus it's a race, and the waker registration needs to move back inside the lock.
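For illustration, a minimal sketch of the pattern, assuming a simplified tracker guarded by a Mutex (not the actual chunk_tracker code):

```rust
use std::sync::Mutex;
use std::task::{Context, Poll, Waker};

struct Tracker {
    have: Vec<bool>,
    wakers: Vec<Waker>,
}

// Check `have` and register the waker under the SAME lock, so a writer
// cannot slip in between the check and the registration.
fn poll_have(t: &Mutex<Tracker>, piece: usize, cx: &mut Context<'_>) -> Poll<()> {
    let mut t = t.lock().unwrap();
    if t.have[piece] {
        return Poll::Ready(());
    }
    t.wakers.push(cx.waker().clone());
    Poll::Pending
}

// The writer side: mark the piece and notify while still holding the lock,
// so every waker registered so far is guaranteed to be woken.
fn mark_have(t: &Mutex<Tracker>, piece: usize) {
    let mut t = t.lock().unwrap();
    t.have[piece] = true;
    for w in t.wakers.drain(..) {
        w.wake();
    }
}
```

Holding the lock across both the check and the registration closes the window in step 2 above.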

    file_id = self.file_id,
    "will write bytes"
);
loop {
ikatson (Owner)

Overall, I think the previous implementation was much simpler to read and reason about. Not even saying it was simple! But I don't see how it's worth it in the end - the code is much harder to follow, and it requires allocations for the buffer, while previously it didn't.

The added complexity is a price we pay today for solving an issue that is purely theoretical.

izderadicka (Contributor, Author)

Hi,
thanks for the comments. I put register_waker back in the callback - I did not realize it might be an issue, but now I think I understand the pattern.

Concerning the overall usefulness of the change, I'm not sure. I have done some quick and dirty measurements (streaming a 1 GB video with curl) and have not seen any notable performance difference.

The only difference is that if block_in_place is used as a simple workaround (as mentioned above), it does not work on the current-thread executor (-s option). The proposed fix behaves OK on the current-thread executor, same as the original code.

The code of this PR is more complex, but that's the nature of async code - I think there is no way around it: if you're composing futures "manually" you have to do all the low-level stuff - polling etc.

On the other hand, it can be useful in the future - I think that ideally Storage should be an async trait (async fn in traits got stabilized, but I think there is still an issue with using an async trait as a trait object - this will also be solved in the future). If storage becomes async for reading, then the streaming code should look similar to the proposed change.
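For illustration, a sketch of the trait-object limitation being referred to, using a hypothetical simplified trait (not librqbit's actual Storage):

```rust
// Hypothetical simplified trait, for illustration only.
trait AsyncReadStorage {
    // Compiles for concrete implementors since Rust 1.75 ("async fn in traits")...
    async fn pread(&self, offset: u64, buf: &mut [u8]) -> std::io::Result<usize>;
}

// ...but the trait is not dyn-compatible: each implementor returns its own
// anonymous future type, so a trait object is rejected:
// fn use_it(_s: Box<dyn AsyncReadStorage>) {}
// error[E0038]: `AsyncReadStorage` cannot be made into an object
```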

So I leave it to your discretion whether this PR could be useful now.

ikatson (Owner) commented Jul 24, 2024

Looking at this again more closely, I'm still hesitant to merge it because of the unnecessary complexity (at least at this time, while Storage isn't async).

Dealing with the buffer / position etc. was a bit tricky already, but this brings it to a whole new level - there's a lot of control flow and branching, and just plain more code. Looking at it, it all seems reasonable; most of the complexity comes from how painful it is to write Poll implementations manually (albeit an enjoyable brain exercise).

As far as I understand, this is the main argument:

> The only difference is that if block_in_place is used as a simple workaround (as mentioned above), it does not work on the current-thread executor (-s option).

I should have expanded on this: it was already encountered and worked around elsewhere in the codebase. You can just run the callback directly (without block_in_place) in single-threaded mode - this was done a few times in the past, hence the BlockingSpawner helper in spawn_utils.rs. The spawner already gets created at session initialization, and you can just copy it into the stream from ManagedTorrentInfo.spawner. Then instead of calling tokio::block_in_place you call spawner.spawn_block_in_place.
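For illustration, a sketch of what such a helper can look like (this may not match the exact BlockingSpawner API in spawn_utils.rs):

```rust
// Remember at construction time whether block_in_place is allowed (i.e.
// whether we're on a multi-threaded runtime), and fall back to running the
// closure directly on a current-thread runtime, where block_in_place panics.
#[derive(Clone, Copy)]
struct BlockingSpawner {
    allow_block_in_place: bool,
}

impl BlockingSpawner {
    fn new(allow_block_in_place: bool) -> Self {
        Self { allow_block_in_place }
    }

    fn spawn_block_in_place<F, R>(&self, f: F) -> R
    where
        F: FnOnce() -> R,
    {
        if self.allow_block_in_place {
            tokio::task::block_in_place(f)
        } else {
            f()
        }
    }
}
```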

> I think that ideally Storage should be an async trait

That would be nice to try, but I fear it'll also make many things quite a bit more complicated (as this PR shows) and slower.

Hence I'm leaning towards keeping things simple and performant (both in streaming.rs and by not making storage async), with a couple of strategic workarounds until (if ever) it becomes an issue.

izderadicka (Contributor, Author)

Hi,
I do understand your arguments; a few comments:

  • async is slower - I'd say it depends. I have done some toy benchmarks (mentioned above) and have not seen any slowdown. Even in the Tokio issue you mentioned, they were finally able to achieve good performance by utilizing more parallelization in Tokio. But it's true that the difference was small, so it probably does not justify the effort :-)
  • I'm not a big expert on systems programming, so I do not have a gut feeling for which potentially blocking calls are still OK in an async context - that's why I tend to follow the "formal" rules here, and why I raised this issue. Now I'm a bit more enlightened - thanks for your patience and explanations.

So here is what I propose: I will open a new PR with BlockingSpawner (I had already seen that helper) and the fixed test - because that one was broken on my Linux box. I already have a branch for it. And this PR can stay parked for some future day. Agreed?

ikatson (Owner) commented Jul 25, 2024

Thanks @izderadicka for your patience! This sounds good.

As to async file IO being slower: if you look at what it's doing under the hood, it's plainly more code (instructions, allocations, inter-thread communication) than one syscall in blocking mode.

So taken in isolation it will be slower by definition, unless you use something like io_uring (which we don't).

Of course, if it blocks the executor and other unrelated tasks, that isn't great and may slow down the application as a whole. But in reality, if those reads are small, the disk is fast, and we work around it using block_in_place to mark those places, it should be OK.
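To make the comparison concrete, a sketch of the two read paths (assuming Unix, where the sync path is a single pread(2) via FileExt::read_at):

```rust
use std::os::unix::fs::FileExt;
use tokio::io::AsyncReadExt;

// Sync: one syscall straight into the caller's buffer, no allocation.
fn read_sync(f: &std::fs::File, off: u64, buf: &mut [u8]) -> std::io::Result<usize> {
    f.read_at(buf, off)
}

// Async via tokio::fs: under the hood the read is shipped to a blocking-pool
// thread with an owned intermediate buffer, and the result has to cross
// threads to wake the task - strictly more work than the syscall above.
async fn read_async(f: &mut tokio::fs::File, buf: &mut [u8]) -> std::io::Result<usize> {
    f.read(buf).await
}
```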

I'd say complexity is my primary concern, performance secondary, but still important.

All that said, if someone finds a real problem with it, e.g. rqbit performing horribly with slow disks, then it might warrant refactoring IO to be async everywhere. I tested simulated slow disks recently though, and it was fine (obviously, only with the multi-threaded executor).
Another real problem might be someone needing to use librqbit with a single-threaded executor for some reason, and it performing badly because of slow disks. But that is still hypothetical :)


Another thing with async: after the recent refactor we need the storage trait, and all pread_all / pwrite_all calls would need to return boxed futures (because we use Box<dyn Storage>, there's no other way).

That means allocating a future for every single read/write, plus allocating the buffer for tokio::fs to work! Whereas now there are no allocations and it's as simple as it can get.
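A sketch of what the boxed-future signatures would look like (hypothetical names, not librqbit's current trait):

```rust
use std::future::Future;
use std::pin::Pin;

// Every call returns a heap-allocated future - one Box per read/write.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

// Boxing the futures makes the trait dyn-compatible, which Box<dyn ...>
// requires - at the cost of an allocation on every call.
trait AsyncStorage: Send + Sync {
    fn pread_all<'a>(&'a self, offset: u64, buf: &'a mut [u8])
        -> BoxFuture<'a, std::io::Result<()>>;
    fn pwrite_all<'a>(&'a self, offset: u64, buf: &'a [u8])
        -> BoxFuture<'a, std::io::Result<()>>;
}
```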


All that said, if someone finds an elegant way to address all these and not make a mess out of already convoluted code, I'm not against async file IO in principle.

izderadicka (Contributor, Author)

I'm closing this one - see #157 instead. This one might still be useful at some point in the future, when storage becomes async.
