Simplify CharIndices.next #61070

jridgewell · 2019-05-23T04:06:43Z

char::len_utf8 has been stable since #49698, making this a little easier to follow.

rust-highfive · 2019-05-23T04:06:47Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @rkruppe (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

Centril · 2019-05-23T07:36:13Z

@bors rollup

Centril · 2019-05-23T07:44:27Z

cc @SimonSapin

hanna-kruppe · 2019-05-23T13:30:55Z

Thanks, LGTM!

@bors r+

bors · 2019-05-23T13:30:57Z

📌 Commit 7af83dc has been approved by rkruppe

SimonSapin · 2019-05-23T17:15:49Z

char::len_utf8 is implemented with three branches, which seems potentially more costly than the subtraction that it replaces. Have you checked that this optimizes well? It’s possible that the optimizer realizes that this computation is redundant with next_code_point, but it’s not obvious.
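To make the two approaches in this question concrete, here is a minimal sketch. The helper names `len_utf8_by_ranges` and `len_utf8_by_subtraction` are hypothetical, and the range checks only illustrate the branching shape being described; they are not copied from the standard library source.

```rust
// Hypothetical helper: byte length of a char's UTF-8 encoding, found with
// up to three comparisons on the code point (the "three branches" shape).
pub fn len_utf8_by_ranges(c: char) -> usize {
    let code = c as u32;
    if code < 0x80 {
        1
    } else if code < 0x800 {
        2
    } else if code < 0x1_0000 {
        3
    } else {
        4
    }
}

// Hypothetical helper: the same length for the first char of `s`, recovered
// from how many bytes the `Chars` iterator consumed (the subtraction approach).
pub fn len_utf8_by_subtraction(s: &str) -> Option<usize> {
    let mut iter = s.chars();
    let pre_len = iter.as_str().len();
    iter.next()?;
    Some(pre_len - iter.as_str().len())
}
```

Both helpers should agree with `char::len_utf8` for any valid `char`. Whether the optimizer can fold the branches into the decoding work already done by the iterator is exactly the open question here, and this sketch does not answer it.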

hanna-kruppe · 2019-05-23T17:20:05Z

@bors r- (let's wait with merging until this performance question is cleared up)

SimonSapin · 2019-05-23T17:27:29Z

I don’t read assembly fluently enough to conclude anything from this:

Before: https://rust.godbolt.org/z/_3EOcq
After: https://rust.godbolt.org/z/ILP_Tc

jridgewell · 2019-05-24T02:55:32Z

I created a sample benchmark to compare the two implementations:

benchmark

#![feature(test)]
extern crate test;
use std::str::Chars;

pub struct CharIndices<'a> {
    front_offset: usize,
    iter: Chars<'a>,
}

pub fn before(self_: &mut CharIndices) -> Option<(usize, char)> {
    let pre_len = self_.iter.as_str().len();
    match self_.iter.next() {
        None => None,
        Some(ch) => {
            let index = self_.front_offset;
            let len = self_.iter.as_str().len();
            self_.front_offset += pre_len - len;
            Some((index, ch))
        }
    }
}

pub fn after(self_: &mut CharIndices) -> Option<(usize, char)> {
    let ch = self_.iter.next()?;
    let index = self_.front_offset;
    self_.front_offset += ch.len_utf8();
    Some((index, ch))
}

#[cfg(test)]
mod tests {
    use super::*;
    use test::Bencher;

    #[bench]
    fn before(b: &mut Bencher) {
        let s = "ศไทย中华Việt Nam; Mary had a little lamb, Little lamb";
        let len = s.chars().count();

        b.iter(|| {
            let mut chars = CharIndices { front_offset: 0, iter: s.chars() };
            let mut i = 0;

            while let Some(_) = super::before(&mut chars) {
                i += 1;
            }
            assert_eq!(i, len);
        });
    }

    #[bench]
    fn after(b: &mut Bencher) {
        let s = "ศไทย中华Việt Nam; Mary had a little lamb, Little lamb";
        let len = s.chars().count();

        b.iter(|| {
            let mut chars = CharIndices { front_offset: 0, iter: s.chars() };
            let mut i = 0;

            while let Some(_) = super::after(&mut chars) {
                i += 1;
            }
            assert_eq!(i, len);
        });
    }
}

    Finished release [optimized] target(s) in 0.00s
     Running target/release/deps/tmp-188dab753e97d367

running 2 tests
test tests::after  ... bench:          44 ns/iter (+/- 3)
test tests::before ... bench:          44 ns/iter (+/- 3)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out

It appears they're roughly the same speed? It might be that I'm just doing something wrong; the assembly definitely looks more complex in the After output below.

Before: https://rust.godbolt.org/z/sIJTXu
After: https://rust.godbolt.org/z/7bNVDu

jridgewell · 2019-05-24T04:37:27Z

Never mind: once I updated the benchmark to actually use the index, the after version is slower:

benchmark using index

#![feature(test)]
extern crate test;
use std::str::Chars;

pub struct CharIndices<'a> {
    front_offset: usize,
    iter: Chars<'a>,
}

pub fn before(self_: &mut CharIndices) -> Option<(usize, char)> {
    let pre_len = self_.iter.as_str().len();
    match self_.iter.next() {
        None => None,
        Some(ch) => {
            let index = self_.front_offset;
            let len = self_.iter.as_str().len();
            self_.front_offset += pre_len - len;
            Some((index, ch))
        }
    }
}

pub fn after(self_: &mut CharIndices) -> Option<(usize, char)> {
    let ch = self_.iter.next()?;
    let index = self_.front_offset;
    self_.front_offset += ch.len_utf8();
    Some((index, ch))
}

#[cfg(test)]
mod tests {
    use super::*;
    use test::Bencher;

    #[bench]
    fn before(b: &mut Bencher) {
        let s = "ศไทย中华Việt Nam; Mary had a little lamb, Little lamb";
        let len = s.len();

        b.iter(|| {
            let mut chars = CharIndices { front_offset: 0, iter: s.chars() };
            let mut i = 0;

            while let Some((index, _)) = super::before(&mut chars) {
                i = index;
            }
            assert_eq!(i + 1, len);
        });
    }

    #[bench]
    fn after(b: &mut Bencher) {
        let s = "ศไทย中华Việt Nam; Mary had a little lamb, Little lamb";
        let len = s.len();

        b.iter(|| {
            let mut chars = CharIndices { front_offset: 0, iter: s.chars() };
            let mut i = 0;

            while let Some((index, _)) = super::after(&mut chars) {
                i = index;
            }
            assert_eq!(i + 1, len);
        });
    }
}

   Compiling tmp v0.1.0 (/Users/jridgewell/tmp)
    Finished release [optimized] target(s) in 0.45s
     Running target/release/deps/tmp-188dab753e97d367

running 2 tests
test tests::after  ... bench:          67 ns/iter (+/- 8)
test tests::before ... bench:          55 ns/iter (+/- 2)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out
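The difference between the two benchmark runs suggests that the offset bookkeeping only shows up in the timings when the returned index is actually consumed. As a minimal sketch on stable Rust, the index can be passed through `std::hint::black_box` so the optimizer has to compute it. The `Indices` type, its method names, and `last_index` are hypothetical stand-ins for the benchmark code above, not part of the standard library.

```rust
use std::hint::black_box;
use std::str::Chars;

// Hypothetical stand-in for the iterator state used in the benchmarks.
pub struct Indices<'a> {
    front_offset: usize,
    iter: Chars<'a>,
}

impl<'a> Indices<'a> {
    pub fn new(s: &'a str) -> Self {
        Indices { front_offset: 0, iter: s.chars() }
    }

    // Advance the offset by the change in the remaining byte length.
    pub fn next_by_subtraction(&mut self) -> Option<(usize, char)> {
        let pre_len = self.iter.as_str().len();
        let ch = self.iter.next()?;
        let index = self.front_offset;
        self.front_offset += pre_len - self.iter.as_str().len();
        Some((index, ch))
    }

    // Advance the offset by the encoded length of the char just read.
    pub fn next_by_len_utf8(&mut self) -> Option<(usize, char)> {
        let ch = self.iter.next()?;
        let index = self.front_offset;
        self.front_offset += ch.len_utf8();
        Some((index, ch))
    }
}

// Walk the whole string and return the byte index of the last char.
// Every index goes through black_box so it cannot be optimized away.
pub fn last_index(s: &str, use_len_utf8: bool) -> Option<usize> {
    let mut it = Indices::new(s);
    let mut last = None;
    loop {
        let item = if use_len_utf8 {
            it.next_by_len_utf8()
        } else {
            it.next_by_subtraction()
        };
        match item {
            Some((index, _)) => last = Some(black_box(index)),
            None => return last,
        }
    }
}
```

Calling `last_index(s, false)` and `last_index(s, true)` on the same input should give the same result as `s.char_indices().map(|(i, _)| i).last()`, which is a cheap correctness check to run before timing either variant.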

Simplify CharIndices.next

7af83dc

Char.len_utf8 is stable since rust-lang#49698, making this a little easier to follow.

rust-highfive assigned hanna-kruppe May 23, 2019

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 23, 2019

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 23, 2019

bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels May 23, 2019

jridgewell closed this May 27, 2019

jridgewell deleted the charindices-next branch March 3, 2020 06:03
