
Convert an existing Stream into deluge #10

Open
arlyon opened this issue Jan 11, 2023 · 5 comments

Comments

@arlyon

arlyon commented Jan 11, 2023

Hi!

I have a stream that I cannot control that I would like to operate over in parallel. What options do I have here?

Cheers

Alex

@mkawalec
Owner

Hey @arlyon! A Stream only gives out a single evaluated element at a time (see the poll_next method on Stream). This means that its construction itself prevents us from evaluating it in parallel. If you want to do further processing that is itself async, I would suggest collecting the stream into a vector, calling into_deluge on that vector, and performing async transformations on the resulting Deluge, similar to what the docs show.

Is this helpful?
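The "collect first, then parallelize" suggestion can be sketched without any async runtime. This is a minimal std-only illustration under stated assumptions: a plain Iterator stands in for the uncontrolled Stream, `process` is a hypothetical placeholder for the per-item work, and scoped OS threads replace deluge. It shows the shape of the approach, not deluge's own API.

```rust
use std::thread;

// Assumption: a plain Iterator stands in for the async Stream so the sketch
// runs with the standard library only. It yields one item at a time.
fn source() -> impl Iterator<Item = u64> {
    (1..=8).map(|x| x * 10)
}

// Hypothetical per-item work, standing in for an async transformation.
fn process(x: u64) -> u64 {
    x + 1
}

/// Materialize the whole source first, then process every element in parallel.
/// Mirrors "collect into a Vec, then into_deluge", but with scoped threads.
fn collect_then_process() -> Vec<u64> {
    // Step 1: drain the sequential source into memory; its length is now known.
    let items: Vec<u64> = source().collect();

    // Step 2: spawn one scoped thread per element and gather results in order.
    thread::scope(|s| {
        let handles: Vec<_> = items
            .iter()
            .map(|&x| s.spawn(move || process(x)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    println!("{:?}", collect_then_process());
}
```

The whole source sits in memory before any work starts, which is fine for modest inputs but is exactly the overhead to avoid for very large ones.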

@arlyon
Author

arlyon commented Jan 13, 2023

I was imagining something like rayon's par_bridge, which I assume eagerly collects n items and distributes them across n threads. It is less efficient than starting with a collection in memory, but it can handle 'very large' datasets much more easily. Streams in particular seem useful for this because, in my case, I am receiving and opening a list of AsyncRead streams of unknown length and want to process all of them while limiting the number of file descriptors I have open. I suppose I could collect all the file names by walking the filesystem first, but I would like to avoid that overhead for very large lists of files.

https://docs.rs/rayon/latest/rayon/iter/trait.ParallelBridge.html
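A par_bridge-style alternative can be sketched with the standard library alone: a fixed number of worker threads pull items one at a time from a shared, mutex-guarded sequential iterator, so at most `workers` items are being processed at once and the source is never collected up front. This is an illustrative assumption about how such a bridge could work, not rayon's implementation; `bridge_map` and its parameters are hypothetical names, and threads stand in for async tasks.

```rust
use std::sync::Mutex;
use std::thread;

/// Hypothetical bridge: `workers` threads pull items one at a time from a shared
/// sequential iterator, so at most `workers` items are in flight at once.
fn bridge_map<I, R, F>(source: I, workers: usize, f: F) -> Vec<R>
where
    I: Iterator + Send,
    R: Send,
    F: Fn(I::Item) -> R + Sync,
{
    assert!(workers > 0, "need at least one worker");
    // The iterator is only ever advanced under the lock; the index records order.
    let shared = Mutex::new(source.enumerate());
    let results = Mutex::new(Vec::new());

    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // The guard is a temporary, so the lock is released at the end
                // of this statement, before the (possibly slow) work runs.
                let next = shared.lock().unwrap().next();
                match next {
                    Some((idx, item)) => {
                        let out = f(item);
                        results.lock().unwrap().push((idx, out));
                    }
                    None => break,
                }
            });
        }
    });

    // Workers finish in arbitrary order; restore source order by index.
    let mut results = results.into_inner().unwrap();
    results.sort_by_key(|(idx, _)| *idx);
    results.into_iter().map(|(_, r)| r).collect()
}

fn main() {
    // Hypothetical stand-in for a lazily produced list of file names.
    let names = (0..10).map(|i| format!("file-{i}.txt"));
    println!("{:?}", bridge_map(names, 3, |name| name.len()));
}
```

Bounding the worker count is what would cap the number of simultaneously open file descriptors in the use case above. This sketch still buffers outputs to return them in source order; a streaming consumer could handle each result as it arrives instead.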

@mkawalec
Owner

Good idea, I'll add something similar to the todo list: a conversion from a Stream into a Deluge that does not materialize the whole intermediate dataset. Of course, you are encouraged to contribute it yourself if you have the capacity.

@mkawalec
Owner

Closing because I don't believe there is a straightforward way to convert a stream into a Deluge, given that a Deluge needs to know how many elements it will have when it is created. But I'll keep on thinking about it :)

@mkawalec
Owner

I just realized I can use StreamExt::chunks for this exact purpose, hiding it behind a conversion API :D
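The chunking idea can be illustrated, again std-only and under assumptions: the lazy source is split into batches of at most `chunk_size` items (analogous to what futures' StreamExt::chunks yields for a Stream), each batch has a known length, and that batch is processed in parallel before the next one is pulled. `process_in_chunks` is a hypothetical helper, not part of deluge, and scoped threads stand in for the Deluge that would be built per chunk.

```rust
use std::thread;

/// Hypothetical chunked conversion: pull at most `chunk_size` items from a lazy
/// source, process that known-length batch in parallel, then pull the next one.
fn process_in_chunks<I, R, F>(mut source: I, chunk_size: usize, f: F) -> Vec<R>
where
    I: Iterator,
    I::Item: Send,
    R: Send,
    F: Fn(I::Item) -> R + Sync,
{
    assert!(chunk_size > 0, "chunk_size must be at least 1");
    let mut out = Vec::new();
    loop {
        // Materialize only the next chunk, never the whole source.
        let chunk: Vec<I::Item> = source.by_ref().take(chunk_size).collect();
        if chunk.is_empty() {
            break;
        }
        // The chunk's length is known here, so it can be fanned out in parallel.
        let work = &f;
        let results: Vec<R> = thread::scope(|s| {
            let handles: Vec<_> = chunk
                .into_iter()
                .map(|item| s.spawn(move || work(item)))
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        });
        out.extend(results);
    }
    out
}

fn main() {
    // Stand-in for a long or unbounded source, processed four items at a time.
    println!("{:?}", process_in_chunks(1..=10u64, 4, |x| x * x));
}
```

One trade-off of chunking: each batch waits for its slowest item before the next batch is pulled from the source.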

@mkawalec mkawalec reopened this Mar 15, 2023
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants