Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std: Add a new env module #21787

Merged
merged 1 commit into from
Feb 3, 2015
Merged

std: Add a new env module #21787

merged 1 commit into from
Feb 3, 2015

Conversation

alexcrichton
Copy link
Member

This is an implementation of RFC 578 which adds a new std::env module
to replace most of the functionality in the current std::os module. More
details can be found in the RFC itself, but as a summary the following methods
have all been deprecated:

  • os::args_as_bytes => env::args
  • os::args => env::args
  • os::consts => env::consts
  • os::dll_filename => no replacement, use env::consts directly
  • os::page_size => env::page_size
  • os::make_absolute => use env::current_dir + join instead
  • os::getcwd => env::current_dir
  • os::change_dir => env::set_current_dir
  • os::homedir => env::home_dir
  • os::tmpdir => env::temp_dir
  • os::join_paths => env::join_paths
  • os::split_paths => env::split_paths
  • os::self_exe_name => env::current_exe
  • os::self_exe_path => use env::current_exe + pop
  • os::set_exit_status => env::set_exit_status
  • os::get_exit_status => env::get_exit_status
  • os::env => env::vars
  • os::env_as_bytes => env::vars
  • os::getenv => env::var or env::var_string
  • os::getenv_as_bytes => env::var
  • os::setenv => env::set_var
  • os::unsetenv => env::remove_var

Many function signatures have also been tweaked for various purposes, but the
main changes were:

  • Vec-returning APIs now all return iterators instead
  • All APIs are now centered around OsString instead of Vec<u8> or String.
    There is currently on convenience API, env::var_string, which can be used to
    get the value of an environment variable as a unicode String.

All old APIs are #[deprecated] in-place and will remain for some time to allow
for migrations. The semantics of the APIs have been tweaked slightly with regard
to dealing with invalid unicode (panic instead of replacement).

The new std::env module is all contained within the env feature, so crates
must add the following to access the new APIs:

#![feature(env)]

[breaking-change]

@rust-highfive
Copy link
Collaborator

r? @pcwalton

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton
Copy link
Member Author

r? @aturon

note that rust-lang/rfcs#578 has not yet been merged and this should hold off on approval until that time.

@rust-highfive rust-highfive assigned aturon and unassigned pcwalton Jan 30, 2015
@alexcrichton alexcrichton force-pushed the std-env branch 25 times, most recently from 9908425 to a9452d6 Compare January 30, 2015 21:13
@bors
Copy link
Contributor

bors commented Feb 1, 2015

⌛ Testing commit b05dbc6 with merge 48a923d...

@bors
Copy link
Contributor

bors commented Feb 1, 2015

💔 Test failed - auto-mac-64-nopt-t

@alexcrichton
Copy link
Member Author

@bors: r=aturon 630df23

@bors
Copy link
Contributor

bors commented Feb 1, 2015

⌛ Testing commit 630df23 with merge a097829...

@bors
Copy link
Contributor

bors commented Feb 1, 2015

💔 Test failed - auto-win-64-nopt-t

This is an implementation of [RFC 578][rfc] which adds a new `std::env` module
to replace most of the functionality in the current `std::os` module. More
details can be found in the RFC itself, but as a summary the following methods
have all been deprecated:

[rfc]: rust-lang/rfcs#578

* `os::args_as_bytes`   => `env::args`
* `os::args`            => `env::args`
* `os::consts`          => `env::consts`
* `os::dll_filename`    => no replacement, use `env::consts` directly
* `os::page_size`       => `env::page_size`
* `os::make_absolute`   => use `env::current_dir` + `join` instead
* `os::getcwd`          => `env::current_dir`
* `os::change_dir`      => `env::set_current_dir`
* `os::homedir`         => `env::home_dir`
* `os::tmpdir`          => `env::temp_dir`
* `os::join_paths`      => `env::join_paths`
* `os::split_paths`     => `env::split_paths`
* `os::self_exe_name`   => `env::current_exe`
* `os::self_exe_path`   => use `env::current_exe` + `pop`
* `os::set_exit_status` => `env::set_exit_status`
* `os::get_exit_status` => `env::get_exit_status`
* `os::env`             => `env::vars`
* `os::env_as_bytes`    => `env::vars`
* `os::getenv`          => `env::var` or `env::var_string`
* `os::getenv_as_bytes` => `env::var`
* `os::setenv`          => `env::set_var`
* `os::unsetenv`        => `env::remove_var`

Many function signatures have also been tweaked for various purposes, but the
main changes were:

* `Vec`-returning APIs now all return iterators instead
* All APIs are now centered around `OsString` instead of `Vec<u8>` or `String`.
  There is currently on convenience API, `env::var_string`, which can be used to
  get the value of an environment variable as a unicode `String`.

All old APIs are `#[deprecated]` in-place and will remain for some time to allow
for migrations. The semantics of the APIs have been tweaked slightly with regard
to dealing with invalid unicode (panic instead of replacement).

The new `std::env` module is all contained within the `env` feature, so crates
must add the following to access the new APIs:

    #![feature(env)]

[breaking-change]
@alexcrichton
Copy link
Member Author

@bors: r+ 70ed3a4

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Feb 2, 2015
Conflicts:
	src/libstd/sys/unix/backtrace.rs
	src/libstd/sys/unix/os.rs
@bors bors merged commit 70ed3a4 into rust-lang:master Feb 3, 2015
@alexcrichton alexcrichton deleted the std-env branch February 3, 2015 06:44
semarie added a commit to semarie/rust that referenced this pull request Feb 5, 2015
- add `_SC_GETPW_R_SIZE_MAX` constant
- declare `struct passwd`
- convert `load_self` to `current_exe`

Note: OpenBSD don't provide system function to return a valuable Path
for `env::current_exe`. The implementation is currently based on the
value of `argv[0]`, which couldn't be used when executable is called via
PATH.
@John-Nagle
Copy link

This is great! With this, to get the command line arguments as a vector of strings, you simply write

let args: Vec<String> = env::args().map(|s| s.into_string().unwrap()).collect();

Now, new programmers learning Rust will have to understand closures, iterators, functional programming, and UTF8 vs OS strings just to get past "Hello World". That will keep out the Code Academy riff-raff!

Put in a convenience function in env::args for this.

For a sense of the pain caused by this, see this Stack Overflow question:

http://stackoverflow.com/questions/15619320/how-to-access-command-line-parameters-in-rust/28356205#28356205

@nagisa
Copy link
Member

nagisa commented Feb 6, 2015

@John-Nagle your example panics if any of the arguments contain non-unicode data which is not that uncommon. Let’s not allow sloppy programming practices proliferate.

@John-Nagle
Copy link

I know. "pub fn into_string_lossy(Self) -> String;" for OsString doesn't seem to have been implemented yet.

If you have non-Unicode data from OsString conversion of command line args, something is badly broken.

@nagisa
Copy link
Member

nagisa commented Feb 6, 2015

If you have non-Unicode data from OsString conversion of command line args, something is badly broken.

Are you implying any system which uses non utf-8 locale is broken?

@aturon
Copy link
Member

aturon commented Feb 6, 2015

@John-Nagle

I'm sympathetic to your original comment; the prominence of OsString was debated as part of this API, and what landed in this PR was a minimalistic, initial cut.

An alternative API that pushes stronger for Unicode-by-default might be:

pub fn args() -> Args // Iterator<Item = String>
pub fn args_os() -> ArgsOs // Iterator<Item = OsString>

pub fn var<K: ?Sized>(key: &K) -> Result<String, VarError> where K: AsOsStr
pub fn var_os<K: ?Sized>(key: &K) -> Option<OsString> where K: AsOsStr

pub fn vars() -> Vars // Iterator<Item = (String, String)>
pub fn vars_os() -> VarsOs // Iterator<Item = (OsString, OsString)>

where the non-OS versions will panic on non-Unicode data, just as your example does. We're likely to grow such variants regardless; the main question is what the un-decorated names like args should map to.

While I'm also sympathetic to worries about how much newcomers have to learn at once, I don't particularly buy the point about Vec versus iterators. For one, iterators are an extremely common concept found in many (these days, most) languages. It's a concept you need to know to understand for in Rust. For another, if you're thinking about very simple scripts that take e.g. a single argument, then args().nth(1) will do the trick (and give you an Option value that you can use to display an error for example).

I know. "pub fn into_string_lossy(Self) -> String;" for OsString doesn't seem to have been implemented yet.

The to_string_lossy method is probably what you want (http://static.rust-lang.org/doc/master/std/ffi/struct.OsStr.html#method.to_string_lossy), but discoverability is a bit poor right now because rustdoc doesn't yet hint about Deref.

@John-Nagle
Copy link

Are you implying any system which uses non utf-8 locale is broken?

No, that's handled. OsString is an opaque type. See
https://github.com/rust-lang/rfcs/blob/master/text/0517-io-os-reform.md#the-design-os_str
especially where it says "but CRUCIALLY NOT as_bytes". If OsString values are locale sensitive, the OS support package for that platform must deal with it invisibly to the Rust program.

This whole area seems way overdesigned. The old approach was much simpler, and the use case for all this complexity seems to be corrupted UCS-2 on Windows. Anything Windows can represent as UCS-2, even with surrogate pairs, can be represented as a UTF-8 Rust string. Some conversions in the other direction won't work, and they should raise exceptions. Yes, this may mean that you can't put emoji into a Windows file name.

@aturon
Copy link
Member

aturon commented Feb 6, 2015

This whole area seems way overdesigned. The old approach was much simpler, and the use case for all this complexity seems to be corrupted UCS-2 on Windows.

The other side of OsString, on Unix, is dealing with byte sequences that are not UTF-8. Since paths, arguments, environment variables, etc are just C strings (and in many systems are not required to follow any particular encoding), we need a way to represent the full space of values.

Note that "the old approach" already had things like env_as_bytes to tackle the above. This has just been packaged into a more robust, cross-platform OsString.

That said, as I wrote above, I think we probably want to consider making the "default" variants work with String and have a separate set of more "advanced" forms that robust, cross-platform code would use.

@nagisa
Copy link
Member

nagisa commented Feb 6, 2015

@John-Nagle yes, this is handled by OsString, but conversion of the data inside OsString into String will fail with panic in many cases.

Which is exactly why I argue against any methods converting to String implicitly.

@aturon
Copy link
Member

aturon commented Feb 6, 2015

@John-Nagle

Anything Windows can represent as UCS-2, even with surrogate pairs, can be represented as a UTF-8 Rust string. Some conversions in the other direction won't work, and they should raise exceptions.

This appears to be backwards. UCS-2 is a superset of UTF-16. That means that e.g. Rust strings can be converted to UCS-2 without loss, but not vice versa.

Regardless, the motivation for OsString is to provide some means of faithfully representing the platform's "native" string type (usually UCS-2 on Windows, byte sequences on Unixen) without loss and in a way that supports cross-platform programming.

@John-Nagle
Copy link

That said, as I wrote above, I think we probably want to consider making the "default" variants work with String and have a separate set of more "advanced" forms that robust, cross-platform code would use.

That seems to be the most practical approach. New Rust programmers should not have to comprehend all the topics covered in this discussion just to access the command line arguments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants