Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added str::char_offset_iter() and str::rev_char_offset_iter() #8082

Closed
wants to merge 1 commit into from

Conversation

Kimundi
Copy link
Member

@Kimundi Kimundi commented Jul 27, 2013

Also renamed bytes_iter to byte_iter to match other iterators

priv string: &'self str,
}

impl<'self> Iterator<(char, uint)> for StrCharOffsetIterator<'self> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to re-implement the StrCharIterator, methods within the same package can read the private fields of that struct (index)

@alexcrichton
Copy link
Member

While we're renaming things, could you also drop the Str prefix on all the iterators? It looks a little silly to import any of them as str::StrXXXIterator.

If you'd rather not do that here though, I'll r+ with the changes.

@bluss
Copy link
Member

bluss commented Jul 27, 2013

This iterator should be bidirectional. So far none of the str iterators are.

@@ -1136,8 +1136,10 @@ pub trait StrSlice<'self> {
fn contains_char(&self, needle: char) -> bool;
fn iter(&self) -> StrCharIterator<'self>;
fn rev_iter(&self) -> StrCharRevIterator<'self>;
fn bytes_iter(&self) -> StrBytesIterator<'self>;
fn bytes_rev_iter(&self) -> StrBytesRevIterator<'self>;
fn byte_iter(&self) -> StrByteIterator<'self>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Along the same lines as these should be bi-directional, why can't this just be a VecIterator with some various bounds?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a vec iterator composed with a dereference from &u8 to u8

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good point... Although then it seems like there should be:

pub type ByteIterator = TransformIterator<..., ..., VecIterator<..., ...>>;

And then it just returns one of those.

@Kimundi
Copy link
Member Author

Kimundi commented Jul 28, 2013

Whoa, lots of notes

  • The realization that they could be bidirectional came after I already finished this patch. I could change that, but I won't have time for that in the next 2 days, so maybe just r+ for now.
  • Likewise for the Str prefix, I can remove it, but won't have time for it right now.
  • Duplicating the StrCharIterator was so that the method inference doesn't need to be told which of the two possible implementations it has to choose.

If I where to reuse the same struct, this:

s.iter()
s.char_offset_iter()

would turn into this:

s.iter::<char>()
s.iter::<(char, uint)>()

@huonw
Copy link
Member

huonw commented Jul 28, 2013

@Kimundi fwiw, when I was thinking about implementing this, I was going to write the (uint, char) iterator, and convert (what is now called) StrCharIterator to be:

type StrCharIterator<'self> = MapIterator<'self, (uint, char), char, StrCharOffsetIterator<'self>>;

fn iter() -> StrCharIterator<'self> {
    self.char_offset_iter().transform(|(_, c)| c)
}

(Similar to how AnyLineIterator and WordIterator work.)

@bluss
Copy link
Member

bluss commented Jul 28, 2013

Writing the iterators with MapIterator sounds fine, the adaptor will preserve the underlying properties like bidirectionality.

@alexcrichton
Copy link
Member

@Kimundi, I'll open an issue for the Str prefix, and I'd be perfectly happy with @huonw's suggestion

@Kimundi
Copy link
Member Author

Kimundi commented Jul 28, 2013

@alexcrichton: Sounds good, it will have to wait a day more though :)

@Kimundi
Copy link
Member Author

Kimundi commented Jul 29, 2013

I'm running the testsuite right now to catch remaining bugs, but here is what I've done so far:

  • Reduced 6 structs with 6 Iterator implementations (3 distinct Iterators in 2 directions) down to 1 struct with an Iterator and DoubleEndedIterator implemenetation. The remaining 5 are now expressed as typedefs for chained Iterator Adapters.
  • Moved clunky each_split_within out of str. This function is only used in extra::getops, and even there in a wrong way (byte offset are being equaled to character widths)
  • Did a bunch of whitespace and style fixups in str

* Fails during iteration if the string contains a non-whitespace
* sequence longer than the limit.
*/
priv fn each_split_within<'a>(ss: &'a str,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a note about why this was moved here, and note the fact that it's incorrect wrt multibyte codepoints.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if both of these things need to be noted here. It's there because the function above it needs it, and
it's working correctly, the function above just misuses it, which already has an Issue number.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, ok, I misinterpreted it; this seems fine then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, I could add a line like "Was moved here from std::str because this is the only place that uses it, and because it was to specific for a general sting function."

@bluss
Copy link
Member

bluss commented Jul 29, 2013

it's beautiful (my comments were addressed, too)

@Kimundi
Copy link
Member Author

Kimundi commented Jul 29, 2013

Someone on irc raised an interesting point: Just having a CharOffsetIterator seems kinda arbitrary, seeing how there are other information calculated by the Iterator that might be just as useful, like character as utf8 slice, character byte length etc.

How about replacing the CharOffsetIterator by a CharContextIterator that instead of a (uint, char) yields a struct:

struct CharContext {
    chr: char,
    chr_slice: &str
    byte_offset: uint,
    byte_len: uint,
}

Anyway, that would be an future patch.

@bluss
Copy link
Member

bluss commented Jul 29, 2013

An iterator producing only byte_offset and chr_slice: &str would be the fastest, since it doesn't even have to decode the char, only look at the first byte for width. The byte_len info is already included as the length of the chr_slice.

@huonw
Copy link
Member

huonw commented Jul 29, 2013

Everything other than byte_offset is easily computable from other information (e.g. s.slice_from(byte_offset), s.char_offset_iter().enumerate(), len_utf8_bytes(c)), whereas the offset requires having a running sum across iterations. So I'd prefer keeping it as small as possible.

@bluss
Copy link
Member

bluss commented Jul 30, 2013

I'm sorry for the conflicts with pull #8090. Should be mostly MapIterator → Map

Renamed bytes_iter to byte_iter to match other iterators
Refactored str Iterators to use DoubleEnded Iterators and typedefs instead of wrapper structs
Reordered the Iterator section
Whitespace fixup
Moved clunky `each_split_within` function to the one place in the tree where it's actually needed
Replaced all block doccomments in str with line doccomments
bors added a commit that referenced this pull request Jul 30, 2013
Also renamed bytes_iter to byte_iter to match other iterators
@bors bors closed this Jul 30, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants