fs.readdir callback is an array which is inefficient to process for large directories #388
Comments
I agree.

Any thoughts about how we do this without breaking the existing API?
No one says you have to iterate over the array in a single event-loop tick. Whenever you anticipate a large array, use a closure function to call via process.nextTick:

var fs = require('fs');
var clog = console.log;
var dir2read = '.';
fs.readdir(dir2read, function(err, flist){
if (err) {
clog('Error reading directory ' + dir2read);
clog(err);
return;
}
var elemNum = 0;
var ProcessEntry = function(entry){
clog('Processing a directory entry: ' + entry);
}
var DirIterator = function(){
ProcessEntry(flist[elemNum]);
elemNum++;
if (elemNum < flist.length) process.nextTick(DirIterator);
}
if (elemNum < flist.length) process.nextTick(DirIterator);
clog('.readdir() callback is finished, event loop continues happily...');
});

That is not an …
An async array-processing module is as simple as the following function:

var AsyncArrayProcessor = function (inArray, inEntryProcessingFunction) {
var elemNum = 0;
var arrLen = inArray.length;
var ArrayIterator = function(){
inEntryProcessingFunction(inArray[elemNum]);
elemNum++;
if (elemNum < arrLen) process.nextTick(ArrayIterator);
}
if (elemNum < arrLen) process.nextTick(ArrayIterator);
}

With this function, the above example of iterating over an array of directory entries takes the following form:

var fs = require('fs');
var clog = console.log;
var dir2read = '.';
var AsyncArrayProcessor = function (inArray, inEntryProcessingFunction) {
var elemNum = 0;
var arrLen = inArray.length;
var ArrayIterator = function(){
inEntryProcessingFunction(inArray[elemNum]);
elemNum++;
if (elemNum < arrLen) process.nextTick(ArrayIterator);
}
if (elemNum < arrLen) process.nextTick(ArrayIterator);
}
fs.readdir(dir2read, function(err, flist){
if (err) {
clog('Error reading directory ' + dir2read);
clog(err);
return;
}
var ProcessDirectoryEntry = function(entry){
// This may be as complex as you can fit in a single event-loop tick
clog('Processing a directory entry: ' + entry);
}
AsyncArrayProcessor(flist, ProcessDirectoryEntry);
clog('.readdir() callback is finished, event loop continues...');
});

As you may see now, there is nothing wrong with fs.readdir itself. I suggest this issue be closed.
@sh1mmer I think an event-emitter (streaming) interface would be a better fit here.
@Mithgol, I think you may have misinterpreted the problem. It isn't that it's an array --- it's that it's returned all at once instead of streamed. Compare to the function this is named after, POSIX readdir(3), which returns a single directory entry per call.

This means memory contains only one directory entry at a time. With node's current fs.readdir, every entry is buffered in memory before the callback even runs.

@indutny, @sh1mmer - an event emitter makes sense, but an internal iterator would also work. Ignore the specific names, here's the pattern:
FWIW, I've seen this bug become an actual problem in two cases: traversing a large git object store and reading a maildir (in @clee's crankshaft mail client).
+1 to @cbiffle's reasoning.
Try defining the following fs.eachChildOf helper:

var fs = require('fs');
var AsyncArrayProcessor = function (inArray, inEntryProcessingFunction) {
var elemNum = 0;
var arrLen = inArray.length;
var ArrayIterator = function(){
inEntryProcessingFunction(null, inArray[elemNum]);
elemNum++;
if (elemNum < arrLen) setTimeout(ArrayIterator, 0);
}
if (elemNum < arrLen) setTimeout(ArrayIterator, 0);
}
fs.eachChildOf = function (dirPath, entryFunction) {
/*
entryFunction(err, directoryEntry)
called as entryFunction(err) if there's an error
otherwise called as entryFunction(null, directoryEntry)
once for each of the directory entries
asynchronously
*/
fs.readdir(dirPath, function(err, flist){
if (err) {
entryFunction(err);
return;
}
AsyncArrayProcessor(flist, entryFunction);
});
}

Should work as desired… unless fs.readdir itself is the bottleneck.
@Mithgol: That's exactly the problem. When you have a few thousand directory entries, the entire listing must be buffered in memory before your callback sees a single one.
See also joyent/libuv#1521
@indutny ... did this ever land anywhere?
I don't think so, but there is nodejs/node#583, which is probably the better route. (Could be fed into a generator for iteration.)
Agree... just not sure if this is something we should address in v0.12+ or push to the converged stream. Regardless, this looks like a feature request as opposed to a bug. Will leave it open to track.

Actually, on second thought, let's close this. It's not likely something that would land here, and the discussion going on in nodejs/node#583 is a better direction. Closing.
fs.readdir is very problematic for listing directories when you don't know in advance how large they are. If you use a for loop to iterate over the returned array to output a directory listing (e.g. a directory index) and the directory has 1000 entries, the whole process is blocked.

A DirectoryIterator should be provided instead that returns the directory list sorted and emits events, like when reading lines from a file. Right now it's easy to DoS a node HTTP server that implements directory listings simply by requesting the listing of a large directory over and over again.