Reduce the reliance on PATH_MAX
#27930
Conversation
(rust_highfive has picked a reviewer for you, use r? to override)
```rust
let new_cap = buf.capacity() * 2;
buf.reserve(new_cap);
```
This is actually the *extra* space to reserve, so this may be growing a little more rapidly than anticipated. Like with `read_to_end`, this can probably just call `buf.reserve(1)` after setting the length to the capacity, to let the internal doubling logic take over.
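For reference, `Vec::reserve(n)` guarantees capacity for at least `len() + n` elements — it is relative to the current length, not an absolute target, which is why passing `capacity() * 2` over-grows. A minimal illustration (only the lower bound on the resulting capacity is guaranteed by the documentation):

```rust
fn main() {
    // `reserve(n)` guarantees room for `len() + n` elements.
    let mut v = vec![0u8; 10];
    v.reserve(100);
    assert!(v.capacity() >= 110);

    // With len == 0, the same call effectively specifies the capacity.
    let mut w: Vec<u8> = Vec::new();
    w.reserve(100);
    assert!(w.capacity() >= 100);
}
```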
The length of `buf` remains 0 until the end, so I think `buf.reserve()` will work as expected in this case. The problem is, as you said, `Vec::reserve` has the internal doubling logic, so the capacity will grow as N, 4N, 16N, 64N, ....

The recently added getcwd has similar logic, but it uses an external variable to track the desired capacity, so the capacity does not quadruple on each iteration. However, every other `Vec::reserve` call will be a no-op, because the previously increased capacity will be sufficient to meet the new requirement. This also means every other system call is wasted.

Correct me if I'm wrong, but the behavior of `read_to_end_uninitialized` seems to be a bit misleading too. What it will do is N, 2(N+1), 2(2(N+1)+1), ..., because the internal doubling is applied after the capacity is increased by one. Is this expected behavior? I don't know much about the allocator, but I'd guess that keeping the capacity at a power of two would be ideal, and the current behavior does not follow this.

There seems to be a more appropriate function for this purpose: `Vec::reserve_exact`. It has almost identical behavior except that it does not double the capacity internally, so it would be useful if the caller is assured to provide an appropriate new capacity and also wants to prevent excessive memory allocation. What do you think of this?
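To make the distinction concrete — exact resulting capacities are an implementation detail of `Vec` and the allocator, so only lower bounds are asserted here:

```rust
fn main() {
    // `reserve` may over-allocate (amortized growth); `reserve_exact`
    // requests precisely the extra space, though the allocator may still
    // return more. Only `capacity() >= requested` is guaranteed for both.
    let mut a: Vec<u8> = Vec::new();
    a.reserve_exact(100);
    assert!(a.capacity() >= 100);

    let mut b: Vec<u8> = Vec::new();
    b.reserve(100);
    assert!(b.capacity() >= 100);

    println!("reserve_exact: {}, reserve: {}", a.capacity(), b.capacity());
}
```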
Ah, actually, because the length of the buffer isn't changing here (nor in the `getcwd` case), it's working as anticipated. The `reserve` function reserves space for `n` extra elements, but because the length of the vector is 0, it works out that you're just specifying the capacity.

In `read_to_end_uninitialized` the request is always rounded up to the next power of two, so it ends up just doubling the vector whenever it reallocates. So long as the `len` is updated to be the same as the capacity here (and possibly in `getcwd` too), then `reserve(1)` should be all you need.
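The strategy described above can be sketched as follows. (The real code uses `unsafe { set_len(...) }` because the syscall has initialized the bytes; this sketch uses `resize` to stay safe. Exact capacities are implementation-defined, so only growth is asserted.)

```rust
fn main() {
    // After each "full" read, make len == capacity, then call
    // `reserve(1)` so Vec's own amortized growth logic picks the
    // next capacity.
    let mut buf: Vec<u8> = Vec::with_capacity(32);
    for _ in 0..4 {
        let cap = buf.capacity();
        buf.resize(cap, 0);  // pretend the syscall filled the buffer
        buf.reserve(1);      // Vec grows capacity on its own schedule
        assert!(buf.capacity() > cap);
        println!("{}", buf.capacity());
    }
}
```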
> In `read_to_end_uninitialized` the request is always rounded up to the next power of two so it ends up just doubling the vector whenever it reallocates.

Actually, what I said above is that it does not work like that! For example, if I add `println!("{:?}", buf.capacity())` right after the `buf.reserve(1)` statement in the function, it outputs:

```
32
66
134
270
542
1086
2174
```

It doesn't produce powers of 2, because 1 is added before the capacity is doubled. And that's why I say it doesn't seem ideal. I believe modern allocators tend to work well when the requested space is a power of 2, so the new capacity should be adjusted to a power of 2 if possible, but `read_to_end_uninitialized` doesn't seem to follow that. Is my observation correct?

Therefore my point is: to truly get a new capacity that is a power of 2, one might have to use `reserve_exact` or something similar. If the behavior of `read_to_end_uninitialized` is unintended, it might also have to be modified to use it.
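The observed capacities match a simple "add one, then double" recurrence; a quick check of the arithmetic:

```rust
fn main() {
    // Reproduce the sequence observed above: starting from 32, each
    // step computes 2 * (n + 1), giving 32, 66, 134, 270, 542, 1086,
    // 2174 rather than powers of two.
    let mut n: usize = 32;
    let mut seq = vec![n];
    for _ in 0..6 {
        n = 2 * (n + 1);
        seq.push(n);
    }
    assert_eq!(seq, vec![32, 66, 134, 270, 542, 1086, 2174]);
}
```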
Ah yeah, I guess that makes sense. In any case, however, these should all follow the same strategy instead of reinventing the wheel. Let's have them all take the `reserve(1)` route, punting to `Vec` as much as possible for figuring out its reallocation strategy.
Uhm, sorry for asking this again, but there's still a remaining concern. As the length is not actually set to the capacity at each iteration currently, calling `buf.reserve(1)` will be a no-op in my code. Do you think the length should be updated every time before calling `reserve`? Something like:

```rust
loop {
    let buf_read = try!(readlink(...));
    unsafe { buf.set_len(buf_read); } // This will be added
    if buf_read != buf.capacity() {
        unsafe { buf.set_len(buf_read); } // This should probably be removed
        buf.shrink_to_fit();
        return Ok(...);
    }
    buf.reserve(1);
}
```

That feels a bit odd to me because, unlike `read_to_end_uninitialized`, we abandon the buffer immediately if `readlink` fails. But if you're OK with the code above, I'll follow that.

Plus, do you think getcwd might also have to be modified to use `reserve(1)`? This may have to be another PR, but if you like it, I can add another commit to address this.
Sorry for disturbing you, this shall be my last review request!
Yeah, that sort of loop is fine to do, and it's fine by me if you want to share an implementation between `getcwd` and `readlink`. Thanks!
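A hypothetical shape for such a shared helper (the name and signature are mine, not the actual std code): grow the buffer with `reserve(1)` and retry a syscall-like closure until the result no longer fills the whole buffer. As above, `resize` stands in for the `set_len` over syscall-initialized memory.

```rust
use std::io;

// Hypothetical helper: retry `fill` with a growing buffer until the
// result fits. The closure returns how many bytes it wrote.
fn fill_with_retry<F>(mut fill: F) -> io::Result<Vec<u8>>
    where F: FnMut(&mut [u8]) -> io::Result<usize>
{
    let mut buf: Vec<u8> = Vec::with_capacity(512);
    loop {
        let cap = buf.capacity();
        buf.resize(cap, 0);
        let n = fill(&mut buf)?;
        if n < cap {
            // The result fit with room to spare: trim and return.
            buf.truncate(n);
            buf.shrink_to_fit();
            return Ok(buf);
        }
        buf.reserve(1); // let Vec pick the next capacity
    }
}

fn main() -> io::Result<()> {
    // Fake "syscall" that produces 1000 bytes, truncating to the
    // buffer size, as readlink(2) does.
    let data = vec![7u8; 1000];
    let result = fill_with_retry(|buf| {
        let n = data.len().min(buf.len());
        buf[..n].copy_from_slice(&data[..n]);
        Ok(n)
    })?;
    assert_eq!(result.len(), 1000);
    Ok(())
}
```

Note that, like the loop in the comment above, this treats an exactly-full buffer as possibly truncated and retries once more with a larger one.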
Force-pushed dcc456b to 15d0322 (compare).
I addressed all of the comments and added another commit to modify
```diff
@@ -30,7 +30,7 @@ use sys::c;
 use sys::fd;
 use vec;

-const GETCWD_BUF_BYTES: usize = 2048;
+const GETCWD_BUF_BYTES: usize = 512;
```
It's OK to just go ahead and delete this constant and inline its value; I'm not sure it's necessarily buying us a whole lot up here.
Looks good to me, thanks @barosl! r=me after a few small nits and a squash.
`_PC_NAME_MAX` is necessary to use `libc::pathconf`. Its value is currently fixed to 3, but it actually varies with the platform.

* `_PC_NAME_MAX == 3`
  * Linux (glibc): https://sourceware.org/git/?p=glibc.git;a=blob;f=bits/confname.h;h=1c714dfbf9398b8a600f9b69426a7ad8c7e89ab4;hb=HEAD#l32
  * NaCl (newlib): https://chromium.googlesource.com/native_client/nacl-newlib/+/373135ec5241d09138aa56603742b94b3b64ea1d/newlib/libc/include/sys/unistd.h#430
* `_PC_NAME_MAX == 4`
  * Android (Bionic): https://github.com/android/platform_bionic/blob/7e919daeaad62515ebbbf7b06badc77625a14d90/libc/include/unistd.h#L59
  * FreeBSD: https://svnweb.freebsd.org/base/head/sys/sys/unistd.h?revision=239347&view=markup#l127
  * NetBSD: http://ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/sys/sys/unistd.h
  * OS X: http://opensource.apple.com/source/xnu/xnu-2782.10.72/bsd/sys/unistd.h

This commit fixes this, and also adds the `_PC_PATH_MAX` constant needed by further commits.
Done. What commits do you think should be squashed? I feel a sense that the changes to
Yeah, it's fine to leave libc separate, but feel free to squash the other four; so long as the commit message makes sense when taken in isolation, I'm fine with it. Thanks!
- Rewrite `std::sys::fs::readlink` not to rely on `PATH_MAX`

  It currently has the following problems:

  1. It uses `_PC_NAME_MAX` to query the maximum length of a file path in the underlying system. However, the meaning of the constant is the maximum length of *a path component*, not a full path. The correct constant should be `_PC_PATH_MAX`.
  2. `pathconf` *may* fail if the referred file does not exist. This can be problematic if the file which the symbolic link points to does not exist, but the link itself does. In this case, the current implementation resorts to the hard-coded value of `1024`, which is not ideal.
  3. There may exist a platform with no limit on file path lengths in general. That's the reason why GNU Hurd doesn't define `PATH_MAX` at all, in addition to having `pathconf` always return `-1`. On these platforms, the content of the symbolic link can be silently truncated if the length exceeds the hard-coded limit mentioned above.
  4. The value obtained by `pathconf` may be outdated by the time `readlink` is actually called. This is inherently racy.

  This commit introduces a loop that gradually increases the length of the buffer passed to `readlink`, eliminating the need for `pathconf`.

- Remove the arbitrary memory limit of `std::sys::fs::realpath`

  As per POSIX 2013, `realpath` will return a malloc'ed buffer if the second argument is a null pointer.[1]

  [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/realpath.html

- Comment on functions that are still using `PATH_MAX`

  There are some functions that only work in terms of `PATH_MAX`, such as `F_GETPATH` on OS X. Comment on them for posterity.
Make `std::sys::os::getcwd` call `Vec::reserve(1)` followed by `Vec::set_len` to double the buffer. This aligns it with other similar functions, such as:

- `std::sys_common::io::read_to_end_uninitialized`
- `std::sys::fs::readlink`

Also, reduce the initial buffer size from 2048 to 512. The previous size was introduced with 4bc26ce in 2013, but it seems a bit excessive, probably because buffer doubling was not implemented back then.
Done, squashed into three commits.
This PR rewrites the code that previously relied on `PATH_MAX`. In my tests, even though the user gives the buffer length explicitly, both Linux's glibc and OS X's libc seem to obey the hard limit of `PATH_MAX` internally. So, to truly remove the limitation of `PATH_MAX`, the related system calls would have to be rewritten from scratch in Rust, which this PR does not attempt. However, eliminating the need for `PATH_MAX` is still a good idea for various reasons: (1) the implementations might change in the future, and (2) some platforms, such as GNU Hurd, don't have a hard-coded `PATH_MAX`. More details are in the commit messages.

Fixes #27454.

r? @alexcrichton
For Bitrig, NetBSD and OpenBSD the constant was incorrectly in posix01, when it's actually posix08, so we move it. This is a [breaking-change], but we already had one in rust-lang#27930.

Fix NetBSD's F_DUPFD_CLOEXEC constant. For a similar feature detection, see this musl thread: http://comments.gmane.org/gmane.linux.lib.musl.general/2963

This assumes that an int literal has type `c_int` for variadic functions.