Auto merge of #50398 - llogiq:memchr-nano-opt, r=nagisa
nano-optimization for memchr::repeat_byte

This replaces the multiple shifts and bitwise ORs with a single multiplication.

In my benchmarks this performs equally well or better, especially on 64-bit systems (it shaves a stable nanosecond on my Skylake). This may go against conventional wisdom, but the shifts and bitwise ORs cannot be pipelined because of hard data dependencies.

While it may or may not be worthwhile from an optimization standpoint, it also reduces code size, so there's basically no downside.
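As a minimal sketch of why the multiplication matches the shift-and-OR sequence: `usize::MAX / 255` is the word with every byte set to 0x01, so multiplying it by a byte value copies that value into every byte, and no byte can carry into the next. The helper names `repeat_byte_shift` and `repeat_byte_mul` are made up for this comparison and do not appear in libcore; the sketch also assumes a recent toolchain where `usize::MAX` and `usize::BITS` are associated constants.

```rust
/// Shift-and-OR variant, written generically over the pointer width.
/// Hypothetical helper, for comparison only.
fn repeat_byte_shift(b: u8) -> usize {
    let mut rep = b as usize;
    let mut shift = 8;
    // Each step depends on the previous value of `rep`,
    // which is the data dependency the commit message mentions.
    while shift < usize::BITS {
        rep |= rep << shift;
        shift *= 2;
    }
    rep
}

/// Multiplication variant, same expression as in the commit.
fn repeat_byte_mul(b: u8) -> usize {
    (b as usize) * (usize::MAX / 255)
}

fn main() {
    // Exhaustively check that both variants agree for every byte value.
    for b in 0..=u8::MAX {
        assert_eq!(repeat_byte_shift(b), repeat_byte_mul(b));
    }
    println!("{:#x}", repeat_byte_mul(0xAB));
}
```

The largest product is `255 * (usize::MAX / 255)`, which equals `usize::MAX`, so the multiplication cannot overflow.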
bors committed May 4, 2018
2 parents 841e0cc + 1cefb5c commit e78c51a
Showing 1 changed file with 2 additions and 13 deletions.
15 changes: 2 additions & 13 deletions src/libcore/slice/memchr.rs
@@ -39,21 +39,10 @@ fn repeat_byte(b: u8) -> usize {
     (b as usize) << 8 | b as usize
 }

-#[cfg(target_pointer_width = "32")]
+#[cfg(not(target_pointer_width = "16"))]
 #[inline]
 fn repeat_byte(b: u8) -> usize {
-    let mut rep = (b as usize) << 8 | b as usize;
-    rep = rep << 16 | rep;
-    rep
-}
-
-#[cfg(target_pointer_width = "64")]
-#[inline]
-fn repeat_byte(b: u8) -> usize {
-    let mut rep = (b as usize) << 8 | b as usize;
-    rep = rep << 16 | rep;
-    rep = rep << 32 | rep;
-    rep
+    (b as usize) * (::usize::MAX / 255)

therealprof May 5, 2018

Contributor

I'm really curious whether the compiler is smart enough to never execute this division at run time, as this would really suck for e.g. MCUs without a hardware divider.

}

/// Return the first index matching the byte `x` in `text`.
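
On the question raised in the review comment above: both operands of the division are constants, so an optimizing build would normally fold it. One way to make that explicit instead of relying on the optimizer is to hoist the quotient into a `const` item, which Rust evaluates at compile time. The following is only a sketch under that assumption; the constant name `LO_BYTES` is hypothetical and is not part of the commit.

```rust
/// Word with every byte set to 0x01. Hypothetical name, not from the commit.
/// A `const` initializer is evaluated at compile time, so the division
/// never runs on the target.
const LO_BYTES: usize = usize::MAX / 255;

#[inline]
fn repeat_byte(b: u8) -> usize {
    // Only a multiplication remains at run time.
    (b as usize) * LO_BYTES
}

fn main() {
    assert_eq!(repeat_byte(0), 0);
    assert_eq!(repeat_byte(1), LO_BYTES);
    assert_eq!(repeat_byte(0xFF), usize::MAX);
}
```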