Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 1 changed file with 70 additions and 0 deletions.
@@ -0,0 +1,70 @@
- Feature Name: line_endings
- Start Date: 2015-07-10
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary

Change all functions that deal with reading "lines" to treat both '\n' and
'\r\n' as valid line-endings.

# Motivation

The current behavior of these functions is to treat only '\n' as a line-ending.
This is surprising for programmers experienced in other languages. Many
languages open files in "text mode" by default, which means that code iterating
over the lines does not have to worry about the two kinds of line-endings.
Such programmers will be surprised to learn that they have to take care of such
details themselves in Rust. Some may not even have heard of the distinction
between the two styles of line-endings.

The current design also violates the "do what I mean" principle. Both '\r\n' and
'\n' are widely used as line-separators. By talking about the concept of
"lines", the API makes it clear that the current file (or buffer, really) is
considered to be in text format. It is thus very reasonable to expect "lines" to
apply to both ways of encoding a line-ending in binary form.

In particular, if a crate is developed on Linux or Mac, the programmer will
probably have most of their input encoded with only '\n' for the line-endings.
They may use the functions talking about "lines", and these will work fine. It
is only when someone runs this crate on input that contains '\r\n' that the bug
will be uncovered. The editor has personally run into this issue when reading
line-by-line from stdin, with the program suddenly failing on Windows.

# Detailed design

The following functions will have to be changed: `BufRead::lines` and
`str::lines`. Both should treat '\r\n' as marking the end of a line. This
can be implemented, for example, by first splitting at '\n' as is done now and
then removing a trailing '\r' right before returning data to the caller.

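The split-then-trim approach described above can be sketched as a free function over `&str`. This is only an illustration, not the proposed standard-library change itself: `lines_crlf_aware` is a hypothetical name, and the sketch assumes that a final '\n' should not produce an empty trailing line, which is why it uses `str::split_terminator`. Calling it on `"first\r\nsecond\nthird\r\n"` should give the three lines without any '\r', while a lone '\r' in the middle of a line is left untouched.

```rust
/// Hypothetical helper (not a std API): splits `text` at '\n' and then
/// removes at most one trailing '\r' from each piece, so that both "\n"
/// and "\r\n" are treated as the end of a line.
fn lines_crlf_aware(text: &str) -> Vec<&str> {
    text.split_terminator('\n')
        // A piece that came from "...\r\n" still ends in '\r'; drop it.
        // Pieces without a trailing '\r' are returned unchanged.
        .map(|line| line.strip_suffix('\r').unwrap_or(line))
        .collect()
}
```
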
Furthermore, `str::lines_any` (the only function currently dealing with both
kinds of line-endings) is deprecated, as it then becomes functionally equivalent
to `str::lines`.

# Drawbacks

This is a semantics-breaking change: it changes the behavior of a released,
stable API. However, as argued above, the new behavior is much less surprising
than the old one, so one could consider this a fix for a bug in the original
implementation. Alternatives are available for the case where one really
wants to split at '\n' only, namely `BufRead::split` and `str::split`. However,
`BufRead::split` does not iterate over `String`, but rather over `Vec<u8>`, so
users have to insert an additional explicit call to `String::from_utf8`.

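For callers who really do want to split at '\n' only, the workaround mentioned above might look roughly like the following. This is a minimal sketch under stated assumptions: `lines_lf_only` is a hypothetical helper name, and mapping a UTF-8 failure to `io::ErrorKind::InvalidData` is one possible choice rather than anything the RFC prescribes. Any '\r' preceding the '\n' stays in the returned strings.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper (not a std API): splits the input strictly at b'\n'
/// using `BufRead::split`, then converts each `Vec<u8>` chunk to a `String`
/// with an explicit `String::from_utf8` call. A '\r' before the '\n' is kept.
fn lines_lf_only<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    reader
        .split(b'\n')
        .map(|chunk| -> io::Result<String> {
            // `chunk` is an `io::Result<Vec<u8>>`; propagate read errors.
            let bytes = chunk?;
            // Assumption: report invalid UTF-8 as an `InvalidData` I/O error.
            String::from_utf8(bytes)
                .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
        })
        .collect()
}
```
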
# Alternatives

There's the obvious alternative of not doing anything. This leaves a gap in the
features Rust provides to deal with text files, making it hard to treat both
kinds of line-endings uniformly.

The second alternative is to add `BufRead::lines_any`, which works similarly to
`str::lines_any` in that it deals with both '\n' and '\r\n'. This provides all
the necessary functionality, but it still leaves people with the need to choose
one of the two functions, and potentially to choose the wrong one. In
particular, the functions with the shorter, nicer names (the existing ones) will
almost always *not* be the right choice.

# Unresolved questions

None I can think of.