Added str::char_offset_iter() and str::rev_char_offset_iter() #8082

Kimundi · 2013-07-27T21:43:47Z

Also renamed bytes_iter to byte_iter to match other iterators

alexcrichton · 2013-07-27T23:52:37Z

src/libstd/str.rs

+    priv string: &'self str,
+}
+
+impl<'self> Iterator<(char, uint)> for StrCharOffsetIterator<'self> {


You don't need to re-implement the StrCharIterator, methods within the same package can read the private fields of that struct (index)

alexcrichton · 2013-07-27T23:53:32Z

While we're renaming things, could you also drop the Str prefix on all the iterators? It looks a little silly to import any of them as str::StrXXXIterator.

If you'd rather not do that here though, I'll r+ with the changes.

bluss · 2013-07-27T23:54:14Z

This iterator should be bidirectional. So far none of the str iterators are.

alexcrichton · 2013-07-27T23:55:47Z

src/libstd/str.rs

@@ -1136,8 +1136,10 @@ pub trait StrSlice<'self> {
    fn contains_char(&self, needle: char) -> bool;
    fn iter(&self) -> StrCharIterator<'self>;
    fn rev_iter(&self) -> StrCharRevIterator<'self>;
-    fn bytes_iter(&self) -> StrBytesIterator<'self>;
-    fn bytes_rev_iter(&self) -> StrBytesRevIterator<'self>;
+    fn byte_iter(&self) -> StrByteIterator<'self>;


Along the same lines as these should be bi-directional, why can't this just be a VecIterator with some various bounds?

It's a vec iterator composed with a dereference from &u8 to u8

Oh good point... Although then it seems like there should be:

pub type ByteIterator = TransformIterator<..., ..., VecIterator<..., ...>>;

And then it just returns one of those.

Kimundi · 2013-07-28T10:56:52Z

Whoa, lots of notes

The realization that they could be bidirectional came after I already finished this patch. I could change that, but I won't have time for that in the next 2 days, so maybe just r+ for now.
Likewise for the Str prefix, I can remove it, but won't have time for it right now.
Duplicating the StrCharIterator was so that the method inference doesn't need to be told which of the two possible implementations it has to choose.

If I where to reuse the same struct, this:

s.iter()
s.char_offset_iter()

would turn into this:

s.iter::<char>()
s.iter::<(char, uint)>()

huonw · 2013-07-28T11:51:08Z

@Kimundi fwiw, when I was thinking about implementing this, I was going to write the (uint, char) iterator, and convert (what is now called) StrCharIterator to be:

type StrCharIterator<'self> = MapIterator<'self, (uint, char), char, StrCharOffsetIterator<'self>>;

fn iter() -> StrCharIterator<'self> {
    self.char_offset_iter().transform(|(_, c)| c)
}

(Similar to how AnyLineIterator and WordIterator work.)

bluss · 2013-07-28T12:23:56Z

Writing the iterators with MapIterator sounds fine, the adaptor will preserve the underlying properties like bidirectionality.

alexcrichton · 2013-07-28T18:08:06Z

@Kimundi, I'll open an issue for the Str prefix, and I'd be perfectly happy with @huonw's suggestion

Kimundi · 2013-07-28T18:21:40Z

@alexcrichton: Sounds good, it will have to wait a day more though :)

Kimundi · 2013-07-29T12:37:54Z

I'm running the testsuite right now to catch remaining bugs, but here is what I've done so far:

Reduced 6 structs with 6 Iterator implementations (3 distinct Iterators in 2 directions) down to 1 struct with an Iterator and DoubleEndedIterator implemenetation. The remaining 5 are now expressed as typedefs for chained Iterator Adapters.
Moved clunky each_split_within out of str. This function is only used in extra::getops, and even there in a wrong way (byte offset are being equaled to character widths)
Did a bunch of whitespace and style fixups in str

huonw · 2013-07-29T12:48:40Z

src/libextra/getopts.rs

+     *  Fails during iteration if the string contains a non-whitespace
+     *  sequence longer than the limit.
+     */
+    priv fn each_split_within<'a>(ss: &'a str,


Could you add a note about why this was moved here, and note the fact that it's incorrect wrt multibyte codepoints.

I'm not sure if both of these things need to be noted here. It's there because the function above it needs it, and
it's working correctly, the function above just misuses it, which already has an Issue number.

Oh, ok, I misinterpreted it; this seems fine then.

That said, I could add a line like "Was moved here from std::str because this is the only place that uses it, and because it was to specific for a general sting function."

bluss · 2013-07-29T17:26:13Z

it's beautiful (my comments were addressed, too)

Kimundi · 2013-07-29T19:41:29Z

Someone on irc raised an interesting point: Just having a CharOffsetIterator seems kinda arbitrary, seeing how there are other information calculated by the Iterator that might be just as useful, like character as utf8 slice, character byte length etc.

How about replacing the CharOffsetIterator by a CharContextIterator that instead of a (uint, char) yields a struct:

struct CharContext {
    chr: char,
    chr_slice: &str
    byte_offset: uint,
    byte_len: uint,
}

Anyway, that would be an future patch.

bluss · 2013-07-29T20:15:35Z

An iterator producing only byte_offset and chr_slice: &str would be the fastest, since it doesn't even have to decode the char, only look at the first byte for width. The byte_len info is already included as the length of the chr_slice.

huonw · 2013-07-29T23:13:53Z

Everything other than byte_offset is easily computable from other information (e.g. s.slice_from(byte_offset), s.char_offset_iter().enumerate(), len_utf8_bytes(c)), whereas the offset requires having a running sum across iterations. So I'd prefer keeping it as small as possible.

bluss · 2013-07-30T00:36:13Z

I'm sorry for the conflicts with pull #8090. Should be mostly MapIterator → Map

Renamed bytes_iter to byte_iter to match other iterators Refactored str Iterators to use DoubleEnded Iterators and typedefs instead of wrapper structs Reordered the Iterator section Whitespace fixup Moved clunky `each_split_within` function to the one place in the tree where it's actually needed Replaced all block doccomments in str with line doccomments

Also renamed bytes_iter to byte_iter to match other iterators

alexcrichton reviewed Jul 27, 2013
View reviewed changes

huonw reviewed Jul 29, 2013
View reviewed changes

bors added a commit that referenced this pull request Jul 30, 2013

auto merge of #8082 : Kimundi/rust/master, r=huonw

7fc8c14

Also renamed bytes_iter to byte_iter to match other iterators

bors closed this Jul 30, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added str::char_offset_iter() and str::rev_char_offset_iter() #8082

Added str::char_offset_iter() and str::rev_char_offset_iter() #8082

Kimundi commented Jul 27, 2013

alexcrichton Jul 27, 2013

alexcrichton commented Jul 27, 2013

bluss commented Jul 27, 2013

alexcrichton Jul 27, 2013

bluss Jul 27, 2013

alexcrichton Jul 27, 2013

Kimundi commented Jul 28, 2013

huonw commented Jul 28, 2013

bluss commented Jul 28, 2013

alexcrichton commented Jul 28, 2013

Kimundi commented Jul 28, 2013

Kimundi commented Jul 29, 2013

huonw Jul 29, 2013

Kimundi Jul 29, 2013

huonw Jul 29, 2013

Kimundi Jul 29, 2013

bluss commented Jul 29, 2013

Kimundi commented Jul 29, 2013

bluss commented Jul 29, 2013

huonw commented Jul 29, 2013

bluss commented Jul 30, 2013

Added str::char_offset_iter() and str::rev_char_offset_iter() #8082

Added str::char_offset_iter() and str::rev_char_offset_iter() #8082

Conversation

Kimundi commented Jul 27, 2013

Choose a reason for hiding this comment

alexcrichton commented Jul 27, 2013

bluss commented Jul 27, 2013

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Kimundi commented Jul 28, 2013

huonw commented Jul 28, 2013

bluss commented Jul 28, 2013

alexcrichton commented Jul 28, 2013

Kimundi commented Jul 28, 2013

Kimundi commented Jul 29, 2013

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bluss commented Jul 29, 2013

Kimundi commented Jul 29, 2013

bluss commented Jul 29, 2013

huonw commented Jul 29, 2013

bluss commented Jul 30, 2013