Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complete Path rewrite #9655

Merged
merged 30 commits into from
Oct 16, 2013
Merged

Complete Path rewrite #9655

merged 30 commits into from
Oct 16, 2013

Conversation

lilyball
Copy link
Contributor

@lilyball lilyball commented Oct 1, 2013

Rewrite the entire std::path module from scratch.

PosixPath is now based on ~[u8], which fixes #7225.
Unnecessary allocation has been eliminated.

There are a lot of clients of Path that still assume utf-8 paths.
This is covered in #9639.



impl ToStr for WindowsPath {
fn to_str(&self) -> ~str {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that these were removed, but then it also seems like you're calling as_str().unwrap().to_str() in a very large number of locations. Is there a reason that you didn't implement to_str() on paths?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexcrichton Nearly all of those locations have a // FIXME to make it so we don't need to do that. Almost all of those will hopefully become .as_vec() as other parts of the system become aware of non-utf8 paths. I intentionally left it awkward to convert to a ~str because I want to discourage unnecessary allocation.

Besides, I can't impl ToStr. At best I could have a .to_str() -> Option<~str> but that's not great.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this related somehow to

fn run_test(file: ~str) {
let path = os::make_absolute(&Path::new(file));
// FIXME (#1094): not the right way to transform a path

(contenttest.rs)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let infile = ~"file://" + path.display().to_str();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@owaspjocur I'm not sure what you mean. Where did that code come from? I don't see it in the rust source tree.

Taking a string/byte vector that represents a path and calling os::make_absolute(&Path::new(file)) is a perfectly legitimate way of getting an absolute path back.

As for your second line of code, that's absolutely incorrect. .display() is purely for displaying in some sort of UI. It should never be used to construct some other value in code that's used for anything other than showing to the user. If you want to construct a URL out of it, you'll need some way to add the correct percent-escapes for all invalid byte sequences in the path.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kballard My excuses, this code is from Servo project. Even so... does that makes sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@owaspjocur The first code snippet seems like a perfectly fine way of making an absolute path.

The second code snippet is completely wrong.

@huonw
Copy link
Member

huonw commented Oct 1, 2013

Ok, I did a brief scan, but it was not a full or thorough review; in particular, I basically skipped the entirety of rustpkg, and didn't look at path/windows.rs at all since Github isn't showing it inline. (And the rest of it wasn't particularly detailed either, definitely needs more eyes.)

@klutzy
Copy link
Contributor

klutzy commented Oct 1, 2013

I can't leave inline comment on diff :( here are rough questions on just skimming path/windows.rs:

  • On Win32 it currently assumes any ~[u8] inputs are utf-8 rather than arbitrary bytes, right? I'm totally ok with it, but I just want to be sure that there is more stuff to do for edge cases. :)
  • Seems that is_sep and is_sep2 have different use cases. Is it right that its main concern is "some/relative/path" is ok but "//?/prefixed/absolute/path" is not?
  • Similarly, fn prefix_allows_slash seems to indicate whether we should use is_sep or is_sep2, but prefix_is_verbatim also does same role at push_str_unchecked { fn push_str_unchecked }. Is it intended?
  • On "though does Windows use UTF-16 or UCS-2?" comment: I thought windows uses utf16 (or ucs2 if < xp) if it is without "?" but it just uses raw ~[u16] if it is with "?" since it bypasses some string parsing and sends it directly to fs driver

@alexcrichton
Copy link
Member

I believe that this will also close #5389, would you mind putting the issue in one of the commit messages?

@lilyball
Copy link
Contributor Author

lilyball commented Oct 1, 2013

@huonw: Thanks for the brief scan. I replied to your comments, and I'm fixing all of the ones that have a clear resolution (i.e. I'm not trying to fix verbatim paths in glob because there's no obvious solution besides ignoring them).

@alexcrichton: Will do

@klutzy:

  • Yes, ~[u8] inputs must be utf-8. It raises str::not_utf8 if they're not (as it passes them through str::from_utf8 and friends).
  • Yes, is_sep tests specifically for \, and is_sep2 allows for both \ and /. The former is used for verbatim paths (\\?\foo\bar) and the latter for mostly everything else.
  • No, that's a good point. prefix_allows_slash is equivalent to !prefix_is_verbatim. I'll fix that.
  • \\?\ bypasses string parsing, yes, but the windows APIs are still going to be taking WCHARs, which are 16-bit values, so they still can't take arbitrary byte sequences.

#[cfg(not(target_os = "win32"))]
impl Path {
/// Calls stat() on the represented file and returns the resulting libc::stat
pub fn stat(&self) -> Option<libc::stat> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new runtime in place and the ability to stat paths, this function should probably delegate to rt::io::file instead of using the blocking implementation of libc::stat

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexcrichton I'm inclined to say that stat doesn't belong in the path module at all. Instead of calling methods on paths, paths should be arguments to methods. I didn't want to deal with that during this rewrite though, which is why I just ported over the stat methods as-is.

@lilyball
Copy link
Contributor Author

lilyball commented Oct 3, 2013

I believe I've addressed all the concrete suggestions at this point (and replied to the ones I didn't address).

} else {
path::posix::is_sep(c)
c <= '\x7F' && path::posix::is_sep(&(c as u8))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this appears twice would it be better & safer to make path::posix::is_sep take char with an additional is_sep_byte (or keep is_sep as is and have a is_sep_char)?

Once that happens, you could use use is_sep = path::windows::is_sep2 and use is_sep = path::posix::is_sep_char with the appropriate cfg on each, rather than defining wrappers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really very happy that is_sep is public to begin with to be honest. It's really just a helper method, but since PosixPath is based on ~[u8] posix::is_sep needs to take &u8, and since WindowsPath is based on Str windows::is_sep/windows::is_sep2 needs to take char. They're mostly just meant for the ability to pass to a split iterator of the appropriate variant.

I could have a posix::is_sep_char, but it feels weird making that just so libextra::glob can avoid doing extra work. Doesn't really feel like a win.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @huonw here, whether a character is a separator is something that should be implemented in the path module and nowhere else. It may be an implementation detail that it just shells out to a lower is_sep(foo: u8) on unix, but I think that the path module should still have an is_sep function and this one should be entirely removed.

@emberian
Copy link
Member

emberian commented Oct 3, 2013

Huge +1 from me

// XXX: This is no longer going to ever emit "up" components, because Path
// already normalizes the string.
fn clean_srcpath(src: &[u8], f: &fn(&str)) {
let p = Path::from_vec(src);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexcrichton clean_srcpath no longer does anything particularly useful with the new Path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it still does some useful things, even if the Path is normalized, you still don't know the root so it could contain .. still, right? I would imagine that . as a component probably doesn't show up though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's true, a path ../../foo would still yield "up" components. You would get . if that's all the path contains (there's no such thing as an empty path; they turn into .). I'll go ahead and just modify this to ignore ..

These functions are for working with a string representation of the path
even if it's not UTF-8 encoded. They replace invalid UTF-8 sequences
with the replacement char.
Turns out you can't call static methods on typedefs. Use `pub use`
instead to work around this issue.
These methods return an object that can be formatted using {} to print
display strings.

Path itself does not implement fmt::Default to avoid accidental usage of
display strings in incorrect places (e.g. process arguments).
Remove the old path.
Rename path2 to path.
Update all clients for the new path.

Also make some miscellaneous changes to the Path APIs to help the
adoption process.
There are no clients of this API, so just remove it.

Update the module docstring to mention normalization.
Add a new trait BytesContainer that is implemented for both byte vectors
and strings.

Convert Path::from_vec and ::from_str to one function, Path::new().

Remove all the _str-suffixed mutation methods (push, join, with_*,
set_*) and modify the non-suffixed versions to use BytesContainer.
Rewrite these methods as methods on Display and FilenameDisplay. This
turns

  do path.with_display_str |s| { ... }

into

  do path.display().with_str |s| { ... }
@alexcrichton
Copy link
Member

This has an incredibly high chance of bitrotting, and everyone's in agreement that Path needs to move in this direction. It's not 100% certain that this is the "final API" which will be exposed from this module, but I believe that this is to the point where it's more beneficial to land this then to block it. If there are more opportunities for improvement, there can always be follow-up pull requests. I imagine that any follow-ups will have a much easier time landing than this one.

Standardize the is_sep() functions to be the same in both posix and
windows, and re-export from path. Update extra::glob to use this.

Remove the usage of either, as it's going away.

Move the WindowsPath-specific methods out of WindowsPath and make them
top-level functions of path::windows instead. This way you cannot
accidentally write code that will fail to compile on non-windows
architectures without typing ::windows anywhere.

Remove GenericPath::from_c_str() and just impl BytesContainer for
CString instead.

Remove .join_path() and .push_path() and just implement BytesContainer
for Path instead.

Remove FilenameDisplay and add a boolean flag to Display instead.

Remove .each_parent(). It only had one caller, so just inline its
definition there.
Delete the following API functions:
- set_dirname()
- with_dirname()
- set_filestem()
- with_filestem()
- add_extension()
- file_path()

Also change pop() to return a boolean instead of an owned copy of the
old filename.
Also fix some issues that crept into earlier commits during the conflict
resoution for the rebase.
Remove redundant `contains_nul` definition.

Make `parse_prefix` private.
bors added a commit that referenced this pull request Oct 16, 2013
Rewrite the entire `std::path` module from scratch.

`PosixPath` is now based on `~[u8]`, which fixes #7225.
Unnecessary allocation has been eliminated.

There are a lot of clients of `Path` that still assume utf-8 paths.
This is covered in #9639.
@bors bors closed this Oct 16, 2013
@bors bors merged commit d108a22 into rust-lang:master Oct 16, 2013
@lilyball lilyball deleted the path-rewrite branch October 16, 2013 19:31
flip1995 pushed a commit to flip1995/rust that referenced this pull request Oct 20, 2022
fix `box-default` linting `no_std` non-boxes

This fixes rust-lang#9653 by doing the check against the `Box` type correctly even if `Box` isn't there, as in `no_std` code. Thanks to `@lukas-code` for opening the issue and supplying a reproducer!

---

changelog: none
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Paths and filenames are not necessarily UTF-8
7 participants