Skip to content
This repository has been archived by the owner on May 4, 2018. It is now read-only.

streaming readdir() #931

Closed
olsonjeffery opened this issue Sep 17, 2013 · 10 comments
Closed

streaming readdir() #931

olsonjeffery opened this issue Sep 17, 2013 · 10 comments

Comments

@olsonjeffery
Copy link

The current uv_fs_readdir API returns all entries in a directory, at once, in a char* ptr field on uv_fs_t. This is problematic in the case of a directory with lots of files allocating a large amount of memory on every request as well as for perf.

We (for rust, but others as well I'm sure) would like some kind of streaming/paging approach to directory listing.

I have no idea what this API would look like. Thoughts?

@piscisaureus
Copy link

The current uv_fs_readdir API returns all entries in a directory, at once, in a char* ptr field on uv_fs_t. This is problematic in the case of a directory with lots of files allocating a large amount of memory on every request as well as for perf.

I agree.

We (for rust, but others as well I'm sure) would like some kind of streaming/paging approach to directory listing.

This is indeed a longstanding issue in node: nodejs/node-v0.x-archive#388

I have no idea what this API would look like. Thoughts?

Neither do I, you're free to propose one :)

It might be worthwhile to report a little bit more than just filenames. Windows can do this with NtQueryDirectoryFile, and I think on linux we could use getdents(2), but I'd like @bnoordhuis to weigh in on that.

@olsonjeffery
Copy link
Author

Correct me if I'm wrong, but the uv_fs_* APIs are interesting in that they only use a uv_fs_t request and each API call's stateless is contained in that single uv_fs_t..

I suppose that, with some kind of multi-call/streaming API, this would imply a uv_readdir_t (or uv_scandir_t? Heh, I saw the comment in src/unix/fs.c...) handle?

@bnoordhuis
Copy link
Contributor

It might be worthwhile to report a little bit more than just filenames. Windows can do this with NtQueryDirectoryFile, and I think on linux we could use getdents(2)

opendir() / readdir() / closedir() is more portable. Either way, the kernel doesn't report anything besides the filename and the inode number.

@saghul
Copy link
Contributor

saghul commented Sep 4, 2014

uv_fs_readdir_next (http://docs.libuv.org/en/latest/fs.html#c.uv_fs_readdir) tried to address this, but even if the output looks paged, internally all dirents are stored in a big fat array. I'm not aware of a mechanism to make this "streaming"-like all the way down.

@bnoordhuis
Copy link
Contributor

I'm not aware of a mechanism to make this "streaming"-like all the way down.

I can't speak for Windows but why don't you use readdir_r() on UNIX?

That will require some changes though, I think. uv_fs_readdir_next() should take a list of uv_dirent_t structures and go through the thread pool. It should also copy out the d_name from the native struct dirent because that won't be stable over time.

I would not recommend making uv_dirent_t an alias for struct dirent because the layout of that struct may change depending on compilation flags.

@saghul
Copy link
Contributor

saghul commented Sep 4, 2014

I can't speak for Windows but why don't you use readdir_r() on UNIX?

That will require some changes though, I think. uv_fs_readdir_next() should take a list of uv_dirent_t structures and go through the thread pool. It should also copy out the d_name from the native struct dirent because that won't be stable over time.

Aha, I (or maybe @indutny) will look into that.

I would not recommend making uv_dirent_t an alias for struct dirent because the layout of that struct may change depending on compilation flags.

Sure.

Thanks for the comments Ben.

@aseemk
Copy link

aseemk commented Oct 8, 2014

Ran into this problem this week (trying to process a directory with >1M files), and felt this pain, so would love to see this happen.

I'm very ignorant on this topic, but nodejs/node-v0.x-archive#388 (comment) seems encouraging. =)

@misterdjules
Copy link

@aseemk I started working on an implementation of a stream API in issue #1521. There's also more discussion about this in issue #1430. You might want to subscribe to these issues.

@misterdjules
Copy link

@saghul Should this be closed in favor of #1430 to avoid duplication?

@saghul
Copy link
Contributor

saghul commented Oct 10, 2014

Yeah, closing!

@saghul saghul closed this as completed Oct 10, 2014
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants