Expose raw std{in,out,err} #148
Comments
How should this work on Windows? We currently convert from utf8 to utf16 when you write to the console. |
Any reason why that's bad? I don't see raw as meaning raw bytes stream, but rather no locking or buffering. If you want a raw byte stream, then you can grab the FD and convert it to a File. As a sidenote, I considered proposing implementing AsFd on Stdout (instead of the current StdoutLock) so you could manually convert it to a File, but that API ends up being really annoying to use since you need to use unsafe and wrap the file in a ManuallyDrop. Hence proposing letting you access raw stdout, but not the raw byte stream. |
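For reference, the fd-to-`File` route mentioned above might look roughly like this on Unix. This is only a sketch: `write_via_fd` is a made-up helper name, and it assumes fd 1 stays open for the life of the process and that nothing is still sitting in std's own stdout buffer.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::mem::ManuallyDrop;
use std::os::fd::{AsRawFd, FromRawFd};

/// Hypothetical helper: write `bytes` to stdout through a borrowed `File`,
/// skipping std's line buffering (Unix only).
fn write_via_fd(bytes: &[u8]) -> io::Result<()> {
    // Hold the lock so other users of `io::stdout()` in this process wait.
    let lock = io::stdout().lock();
    // SAFETY: the descriptor is valid while `lock` is alive. `ManuallyDrop`
    // keeps the temporary `File` from closing stdout when it goes out of scope.
    let mut file = ManuallyDrop::new(unsafe { File::from_raw_fd(lock.as_raw_fd()) });
    file.write_all(bytes)
}
```

The `unsafe` block plus `ManuallyDrop` is the boilerplate being complained about: forgetting the wrapper would close stdout on drop.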
If we don't buffer at all, it becomes difficult to ensure the conversion succeeds, we'd have to panic if your write is a truncated UTF-8 sequence, even if you're later going to write the rest of the sequence. For example, But if you want to write |
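To make the truncated-sequence problem concrete, here is a small sketch using only `std::str::from_utf8`. The helper names are illustrative and this is not how std's Windows console code is written; it only shows that an unbuffered writer would have to either hold back the trailing bytes or reject the write.

```rust
use std::str;

/// Split `buf` into its longest valid UTF-8 prefix and the leftover bytes.
fn split_valid_prefix(buf: &[u8]) -> (&str, &[u8]) {
    match str::from_utf8(buf) {
        Ok(s) => (s, &buf[buf.len()..]),
        Err(e) => {
            let (valid, rest) = buf.split_at(e.valid_up_to());
            // `valid` was just checked by `from_utf8`, so this cannot fail.
            (str::from_utf8(valid).expect("prefix is valid UTF-8"), rest)
        }
    }
}

/// True if `buf` stops in the middle of a multi-byte sequence, meaning a
/// later write could still complete it.
fn ends_mid_sequence(buf: &[u8]) -> bool {
    matches!(str::from_utf8(buf), Err(e) if e.error_len().is_none())
}
```

A writer that wants to accept split writes needs somewhere to keep the leftover bytes between calls, which is a (small) buffer.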
Interesting, I poked around and it turns out Windows is double buffered: https://github.com/rust-lang/rust/blob/master/library/std/src/sys/windows/stdio.rs. I don't know how to word this properly then, but the proposal is to do the minimal amount of processing to still be considered a stdio stream as opposed to a byte stream. So Windows would continue to be buffered for UTF-8 writes to work. |
None of the linked issues make a good argument for removing the mutex. The first one says that they're overkill for single-threaded programs. But on single-threaded programs a mutex lock-unlock is 1 uncontended atomic CAS and 1 uncontended atomic write, that's very cheap compared to the IO syscalls. The rest should be covered by making buffering switchable between line, block and no buffering. rust-lang/rust#78515 (comment) outlines the libs team's preferred path forward on that. |
Sure, but I can easily see someone wanting to use the kernel's locking: just print stuff to stdout and as long as it's a single write under the pipe buffer size, you'll get atomic writes (on Linux at least IIRC). Also to be clear, I'm not suggesting removing the lock on Stdout, I'm saying that Stdout won't ever be able to satisfy everyone because it's a tailored use case and therefore we should offer as unfettered access as possible while acting as stdout. |
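A rough sketch of the "one write per record" idea, generic over any `Write` so it can be tried without touching stdout. The 4096 limit is an assumption (the usual `PIPE_BUF` on Linux), `write_record` is a hypothetical name, and the atomicity itself comes from the kernel when the sink is a pipe, not from this code.

```rust
use std::io::{self, Write};

/// Assumed pipe atomicity limit; 4096 is the usual `PIPE_BUF` on Linux.
const ASSUMED_PIPE_BUF: usize = 4096;

/// Hypothetical helper: emit `record` plus a newline with a single `write` call.
fn write_record<W: Write>(out: &mut W, record: &str) -> io::Result<()> {
    // Assemble the whole line first so it goes out in one call.
    let mut line = Vec::with_capacity(record.len() + 1);
    line.extend_from_slice(record.as_bytes());
    line.push(b'\n');
    if line.len() > ASSUMED_PIPE_BUF {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "record is larger than the assumed PIPE_BUF",
        ));
    }
    // Deliberately `write`, not `write_all`: a retry loop would split the record.
    let written = out.write(&line)?;
    if written != line.len() {
        return Err(io::Error::new(io::ErrorKind::WriteZero, "short write"));
    }
    Ok(())
}
```

This only helps if `out` is unbuffered; wrapping it in a `LineWriter` or `BufWriter` reintroduces the splitting that the single call tries to avoid.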
I don't think std will ever be able to satisfy everyone. Nor do I think it has to. And if it's independent of the normal std I/O then I wonder why it needs to be in std rather than a crate? |
Note that StdoutRaw on Windows isn't as 'unfettered' / 'raw' as one might expect. It still checks if the output is a console or not, and if it is, has a bunch of logic for using WriteConsoleW after converting the [u8] stream to UTF-16. |
Right, hence why I'd like to have this be in the stdlib. A crate would have to replicate all the craziness in https://github.com/rust-lang/rust/blob/8327047b23dc1eb150fdfb42177939388661eb6d/library/std/src/sys/windows/stdio.rs. I don't know much about Windows, but I presume the stdlib is doing that for a reason and a crate would have to mirror the stdlib implementation. |
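The conversion step itself is simple when the input is complete. A portable sketch (not std's actual Windows code, and `utf8_to_utf16` is an illustrative name) of turning a `[u8]` chunk into the UTF-16 units a wide-character console API would expect:

```rust
use std::str::{self, Utf8Error};

/// Illustrative only: validate `bytes` as UTF-8 and re-encode as UTF-16.
fn utf8_to_utf16(bytes: &[u8]) -> Result<Vec<u16>, Utf8Error> {
    // Fails if the chunk is invalid or ends mid-sequence.
    Ok(str::from_utf8(bytes)?.encode_utf16().collect())
}
```

The hard part is everything around this: detecting a console, chunking, and handling writes that stop mid-character.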
Can you describe the use cases and what the API should look like? How should it behave under concurrent access? Generally it is unclear what a reasonable "raw access" API would look like in a portable way or if non-portable solutions making more platform-specific guarantees would be preferred or ... |
As far as I understand this proposal, it's just like |
My proposal is super simple: make these APIs public.
Do nothing, let the OS and the user deal with it.
I'm not quite sure why the discussion is trying to prevent you from shooting yourself in the foot. Ignoring std{out,in,err} for a moment, I can very easily open two files to "foo.txt" and wreak havoc. That's a problem between me and the operating system. Getting back to std*, I could use OS APIs to dup those file descriptors and then mess with them outside the rust controlled environment, so exposing raw APIs doesn't change std* safety.
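The dup point can be shown in safe Rust today on Unix. A minimal sketch, with `dup_stdout` as a made-up name; it relies on `Stdout` implementing `AsFd` and on `BorrowedFd::try_clone_to_owned`:

```rust
use std::fs::File;
use std::io;
use std::os::fd::AsFd;

/// Hypothetical helper: duplicate the stdout descriptor into an owned `File`.
/// Writes through the result bypass std's stdout lock and buffer entirely.
fn dup_stdout() -> io::Result<File> {
    let owned = io::stdout().as_fd().try_clone_to_owned()?;
    Ok(File::from(owned))
}
```

Dropping the returned `File` closes only the duplicate, so the process's real stdout stays open.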
My use case is pretty simple: I have a single-threaded app that I want to be as small and efficient as possible. Currently, I have something like this that I call whenever I need to
#[cfg(unix)]
{
use std::{fs::File, mem::ManuallyDrop, os::unix::io::FromRawFd};
let mut stdin = ManuallyDrop::new(unsafe { File::from_raw_fd(0) });
let mut stdout = ManuallyDrop::new(unsafe { File::from_raw_fd(1) });
(stdin, stdout)
}
#[cfg(not(unix))]
{
use std::io;
let mut stdin = io::stdin().lock();
let mut stdout = io::stdout().lock();
(stdin, stdout)
} |
Because the existing stdout types either guarantee synchronized access ( And then there's the windows console issue which makes the raw type thread-unsafe. So to uphold the guarantees of the existing types we need to do something else. Or we need to make them unsafe.
Ok and why not do that on unix too? |
I'm confused, how can this possibly be a guarantee when I can dup the stdout descriptor?
As in at the OS level or in the type? If it's in the type, shouldn't we be able to make it not send?
If I'm calling this as a helper method in a loop, now I have to think about the locking cost and try to write my code so the streams get passed in from the top level to avoid the loop. So maybe this is a bit OCD, but with raw APIs I wouldn't have to think about this since there's no cost to creating them (on unix). |
That is a good question. It used to require CC @sunfishcode
The type. And sure, we can do that, but that's exactly the kind of question that should be considered when proposing an API!
As I said earlier, uncontended locks are extremely cheap. Much cheaper than the IO calls. And if you're fighting for every single instruction then hoisting it out of the innermost loop should be enough, that's what the |
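Hoisting the lock out of a loop could look like the sketch below. `write_lines` is an illustrative helper written against any `Write`, so the caller decides what it is handed.

```rust
use std::io::{self, Write};

/// Illustrative helper: the loop writes to whatever handle it is given and
/// never touches the stdout mutex itself.
fn write_lines<W: Write>(out: &mut W, n: usize) -> io::Result<()> {
    for i in 0..n {
        writeln!(out, "line {i}")?;
    }
    Ok(())
}
```

A caller would take the lock once, for example `let mut out = io::stdout().lock();`, and then call `write_lines(&mut out, 1000)`, paying for a single lock acquisition instead of one per line.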
I think the problem with proposals that are essentially just "make internal type public" is that the internal type was not designed to be public. I think in cases like these it's better to treat it as if it's a completely new API and write out exactly the API surface. I don't think pointing to the current unix implementation is helpful (except as prior art) as it leaves a lot for people to guess at or speculate about. |
@the8472 As I understand it, the justification for the mutex is to protect the buffer. It's plain memory safety. I guess the rationale for There's an implied rule that libraries shall never touch Perhaps a hypothetical new raw stdio API could also look to this rule, and just carefully document the hazards, which already exist in Rust. Alternatively, I don't know how big of a scope Rust would be willing to consider here, but if you're talking completely new APIs...
use std::io::{StreamReader, StreamWriter};
fn main(stdin: StreamReader, stdout: StreamWriter) {
    // Wrap `stdin`/`stdout` in a `BufReader`/`BufWriter` if desired. And/or a `Mutex` if desired.
}
No implied buffering or mutexes (except the minimum needed for UTF-8 I/O on Windows, following what's discussed above), and the implied rule becomes an explicit rule, only except that |
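`StreamReader` and `StreamWriter` do not exist in std, so the closest runnable approximation is a function that is generic over `Read` and `Write` and opts into buffering itself. A sketch under that assumption, with `run` as a stand-in for the hypothetical `main`:

```rust
use std::io::{self, BufRead, BufReader, BufWriter, Read, Write};

/// Stand-in for the hypothetical `main(stdin, stdout)`: the streams arrive
/// unbuffered and the callee chooses its own wrappers.
fn run<R: Read, W: Write>(stdin: R, stdout: W) -> io::Result<()> {
    let reader = BufReader::new(stdin);
    let mut writer = BufWriter::new(stdout);
    // Echo every input line back out.
    for line in reader.lines() {
        writeln!(writer, "{}", line?)?;
    }
    // Nothing is flushed implicitly per line, so flush before returning.
    writer.flush()
}
```

Whoever owns the streams hands them out explicitly, so a library can only write to stdout if it is passed the writer.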
That might be the original motivation. But the documentation promises locking. People may rely on that for things.
It can still prevent interleaving of outputs! E.g. if you do json-lines based logging it's fine to have some non-json junk on some lines. But each line, no matter how long, shouldn't be torn. I'll open a rust issue for this. Edit: rust-lang/rust#114140 |
To clarify, I just meant that memory safety is the motivation for having a lock in the first place. Then, yes, given that it has a lock, the standard library gives users a little more control over it, because it's really useful to do so in practice.
What I meant there was that unless it makes special arrangements, a library has no idea if the |
Acquiring a mutex does protect the line from being torn. It doesn't protect what happens before/after that... but if you care about that you could hold the mutex for the whole program duration, then no other thread should be able to print. And AsFd bypasses that guarantee. |
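A sketch of the "a held lock keeps a line from being torn" claim, using an in-memory sink behind a `Mutex` instead of real stdout so the result can be inspected. The function name and the JSON-ish line format are invented for illustration.

```rust
use std::io::Write;
use std::sync::{Arc, Mutex};
use std::thread;

/// Each thread emits its line in two separate writes while holding the lock,
/// so pieces from different threads cannot interleave within a line.
fn log_from_threads(threads: usize, lines_per_thread: usize) -> Vec<u8> {
    let sink = Arc::new(Mutex::new(Vec::<u8>::new()));
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let sink = Arc::clone(&sink);
            thread::spawn(move || {
                for i in 0..lines_per_thread {
                    // The guard is held across both writes and released at
                    // the end of each loop iteration.
                    let mut out = sink.lock().unwrap();
                    write!(out, "{{\"thread\":{t},").unwrap();
                    writeln!(out, "\"seq\":{i}}}").unwrap();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // All worker clones are gone after the joins, so unwrapping succeeds.
    Arc::try_unwrap(sink).unwrap().into_inner().unwrap()
}
```

The order of lines across threads is still arbitrary; the lock only guarantees that each line is contiguous.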
I'm specifically talking about the relationship between libraries and the A library has no idea if the stdout output is even line-oriented. It could be a jpeg, in which case any bytes written by a library could corrupt it, and the mutex isn't a solution to that problem. |
But it is? Main can lock it and then |
Also, "I can acquire |
I think this is a bug. But we should continue the discussion on the new issue. |
I think Internally it would do a dup/dup2 dance to swap out the descriptors and the returned one would have a new value. It doesn't guarantee exclusive access if someone used A use could then be For cross-platform convenience we could also add a |
Proposal
Problem statement/motivation
It is well known that people want more control over stdout and friends (1 2 3 4 5 6 etc.). Some want to avoid locking, others want to disable line buffering, and others still want to change the buffer sizes.
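As a sketch of the kind of control being asked for, these are the wrappers a user could pick if handed an unbuffered stream. The helpers are illustrative and generic over `Write`; wrapping today's `io::stdout()` this way would stack on top of std's internal line buffering rather than replace it.

```rust
use std::io::{BufWriter, LineWriter, Write};

/// Block buffering with a caller-chosen capacity.
fn block_buffered<W: Write>(sink: W, capacity: usize) -> BufWriter<W> {
    BufWriter::with_capacity(capacity, sink)
}

/// Line buffering: data is passed on to the sink when a newline is written.
fn line_buffered<W: Write>(sink: W) -> LineWriter<W> {
    LineWriter::new(sink)
}
```

With a raw stream as the sink, the choice between no buffering, line buffering and block buffering would belong to the caller.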
Solution sketches
The current std* fns are fundamentally incompatible with the flexibility desired above. For example, you cannot both use buffering and not require locks. Thus, I believe the most flexible and simple solution is to simply expose the raw std* Read/Write impls. Users can then:
In the future if we allow overriding the println and co streams, this idea will be complementary since you could build up the desired wrappers around stdout and replace the default stream with the one you just built.
Non-goals: this proposal does not suggest changing the current std* impls. rust-lang/rust#60673 is therefore not addressed by this proposal (but of course you can build your own version and use it in writeln).
Downsides: interaction between raw std* and normal std* may be a little weird. I view this as a non-issue in the same sense that files take an "it's your problem, figure it out" attitude. For example, I can open the same file path twice and it'll be my problem if my writes smash each other. Similarly, it's on the user to decide how they want to handle interleavings (and they might not even care b/c they know their program is single-threaded).
API
Expose the existing Std{in,out,err}Raw (and constructor functions) in std::io.
Links and related work
See links above.