Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise floating point is_finite (2x) and is_infinite (1.6x). #57353

Merged
merged 1 commit into from
Jan 13, 2019

Conversation

huonw
Copy link
Member

@huonw huonw commented Jan 5, 2019

These can both rely on IEEE754 semantics to be made faster, by folding
away the sign with an abs (left private for now), and then comparing
to infinity, letting the NaN semantics of a direct float comparison
handle NaN input properly.

The abs bit-fiddling is simple (a single and), and so these new
forms compile down to a few instructions, without branches, e.g. for
f32:

is_infinite:
        andps   xmm0, xmmword ptr [rip + .LCPI2_0] ; 0x7FFF_FFFF
        ucomiss xmm0, dword ptr [rip + .LCPI2_1]   ; 0x7F80_0000
        setae   al
        ret

is_finite:
        andps   xmm0, xmmword ptr [rip + .LCPI1_0] ; 0x7FFF_FFFF
        movss   xmm1, dword ptr [rip + .LCPI1_1]   ; 0x7F80_0000
        ucomiss xmm1, xmm0
        seta    al
        ret

When used in loops/repeatedly, they get even better: the memory
operations (loading the mask 0x7FFFFFFF for abs, and infinity
0x7F80_0000) are likely to be hoisted out of the individual calls, to
be shared, and the seta/setae are likely to be collapsed into
conditional jumps or moves (or similar).

The old is_infinite did two comparisons, and the old is_finite did
three (with a branch), and both of them had to check the flags after
every one of those comparison. These functions have had that old
implementation since they were added in
6284190
7 years ago.

Benchmark (abs is the new form, std is the old):

test f32_is_finite_abs            ... bench:          55 ns/iter (+/- 10)
test f32_is_finite_std            ... bench:         118 ns/iter (+/- 5)

test f32_is_infinite_abs          ... bench:          53 ns/iter (+/- 1)
test f32_is_infinite_std          ... bench:          84 ns/iter (+/- 6)

test f64_is_finite_abs            ... bench:          52 ns/iter (+/- 12)
test f64_is_finite_std            ... bench:         128 ns/iter (+/- 25)

test f64_is_infinite_abs          ... bench:          54 ns/iter (+/- 5)
test f64_is_infinite_std          ... bench:          93 ns/iter (+/- 23)
 #![feature(test)]
extern crate test;

use std::{f32, f64};
use test::Bencher;

const VALUES_F32: &[f32] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597];

 #[bench]
fn f32_is_infinite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.is_infinite()));
}
 #[bench]
fn f32_is_infinite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.abs()== f32::INFINITY));
}
 #[bench]
fn f32_is_finite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.is_finite()));
}
 #[bench]
fn f32_is_finite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.abs() < f32::INFINITY));
}

const VALUES_F64: &[f64] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597];

 #[bench]
fn f64_is_infinite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.is_infinite()));
}
 #[bench]
fn f64_is_infinite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.abs() == f64::INFINITY));
}
 #[bench]
fn f64_is_finite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.is_finite()));
}
 #[bench]
fn f64_is_finite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.abs() < f64::INFINITY));
}

@rust-highfive
Copy link
Collaborator

r? @KodrAus

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 5, 2019
@@ -161,6 +161,11 @@ impl f32 {
self != self
}

#[inline]
fn abs_(self) -> f32 {
f32::from_bits(self.to_bits() & 0x7fff_ffff)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary, rather than using the existing abs method (which uses the LLVM fabsf32 intrinsic directly)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's only available in std. #50145

Copy link
Member

@varkor varkor Jan 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm bikeshedding here, but given typical naming conventions, I think it'd make sense to simply call this abs (which shouldn't cause any conflicts) and add a comment explaining why this method is private for discoverability, e.g.

// FIXME(#50145): `abs` is publicly unavailable in libcore due to concerns
// about portability, so this implementation is for private use internally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree about adding a comment, but I don't think that naming it abs is appropriate. I'd call it something like abs_hack so it's clear that it's not using the proper abs method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about abs_private with a comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added that.

src/libcore/num/f64.rs Outdated Show resolved Hide resolved
src/libcore/num/f32.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@KodrAus KodrAus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @huonw!

I've just left the same nits as @lzutao as suggestions, but r=me anytime.

These can both rely on IEEE754 semantics to be made faster, by folding
away the sign with an abs (left private for now), and then comparing
to infinity, letting the NaN semantics of a direct float comparison
handle NaN input properly.

The `abs` bit-fiddling is simple (a single and), and so these new
forms compile down to a few instructions, without branches, e.g. for
f32:

```asm
is_infinite:
        andps   xmm0, xmmword ptr [rip + .LCPI2_0] ; 0x7FFF_FFFF
        ucomiss xmm0, dword ptr [rip + .LCPI2_1]   ; 0x7F80_0000
        setae   al
        ret

is_finite:
        andps   xmm0, xmmword ptr [rip + .LCPI1_0] ; 0x7FFF_FFFF
        movss   xmm1, dword ptr [rip + .LCPI1_1]   ; 0x7F80_0000
        ucomiss xmm1, xmm0
        seta    al
        ret
```

When used in loops/repeatedly, they get even better: the memory
operations (loading the mask 0x7FFFFFFF for abs, and infinity
0x7F80_0000) are likely to be hoisted out of the individual calls, to
be shared, and the `seta`/`setae` are likely to be collapsed into
conditional jumps or moves (or similar).

The old `is_infinite` did two comparisons, and the old `is_finite` did
three (with a branch), and both of them had to check the flags after
every one of those comparison. These functions have had that old
implementation since they were added in
rust-lang@6284190
7 years ago.

Benchmark (`abs` is the new form, `std` is the old):

```
test f32_is_finite_abs            ... bench:          55 ns/iter (+/- 10)
test f32_is_finite_std            ... bench:         118 ns/iter (+/- 5)

test f32_is_infinite_abs          ... bench:          53 ns/iter (+/- 1)
test f32_is_infinite_std          ... bench:          84 ns/iter (+/- 6)

test f64_is_finite_abs            ... bench:          52 ns/iter (+/- 12)
test f64_is_finite_std            ... bench:         128 ns/iter (+/- 25)

test f64_is_infinite_abs          ... bench:          54 ns/iter (+/- 5)
test f64_is_infinite_std          ... bench:          93 ns/iter (+/- 23)
```

```rust
 #![feature(test)]
extern crate test;

use std::{f32, f64};
use test::Bencher;

const VALUES_F32: &[f32] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597];

 #[bench]
fn f32_is_infinite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.is_infinite()));
}
 #[bench]
fn f32_is_infinite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.abs()== f32::INFINITY));
}
 #[bench]
fn f32_is_finite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.is_finite()));
}
 #[bench]
fn f32_is_finite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.abs() < f32::INFINITY));
}

const VALUES_F64: &[f64] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597];

 #[bench]
fn f64_is_infinite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.is_infinite()));
}
 #[bench]
fn f64_is_infinite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.abs() == f64::INFINITY));
}
 #[bench]
fn f64_is_finite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.is_finite()));
}
 #[bench]
fn f64_is_finite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.abs() < f64::INFINITY));
}
```
@huonw huonw force-pushed the faster-finiteness-checks branch from 6a4473a to 6e742db Compare January 7, 2019 11:11
@huonw
Copy link
Member Author

huonw commented Jan 7, 2019

Nice catch with the nit. I've updated to fix it.

@KodrAus
Copy link
Contributor

KodrAus commented Jan 7, 2019

@bors r+

@bors
Copy link
Contributor

bors commented Jan 7, 2019

📌 Commit 6e742db has been approved by KodrAus

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 7, 2019
@bors
Copy link
Contributor

bors commented Jan 10, 2019

⌛ Testing commit 6e742db with merge ab00b4b23bee84a75a5ee5ceec8d72340c4795f8...

@bors
Copy link
Contributor

bors commented Jan 10, 2019

💔 Test failed - status-appveyor

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jan 10, 2019
@pietroalbini
Copy link
Member

@bors retry
AppVeyor... what's wrong with you today?

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 10, 2019
pietroalbini added a commit to pietroalbini/rust that referenced this pull request Jan 12, 2019
…odrAus

Optimise floating point `is_finite` (2x) and `is_infinite` (1.6x).

These can both rely on IEEE754 semantics to be made faster, by folding
away the sign with an abs (left private for now), and then comparing
to infinity, letting the NaN semantics of a direct float comparison
handle NaN input properly.

The `abs` bit-fiddling is simple (a single and), and so these new
forms compile down to a few instructions, without branches, e.g. for
f32:

```asm
is_infinite:
        andps   xmm0, xmmword ptr [rip + .LCPI2_0] ; 0x7FFF_FFFF
        ucomiss xmm0, dword ptr [rip + .LCPI2_1]   ; 0x7F80_0000
        setae   al
        ret

is_finite:
        andps   xmm0, xmmword ptr [rip + .LCPI1_0] ; 0x7FFF_FFFF
        movss   xmm1, dword ptr [rip + .LCPI1_1]   ; 0x7F80_0000
        ucomiss xmm1, xmm0
        seta    al
        ret
```

When used in loops/repeatedly, they get even better: the memory
operations (loading the mask 0x7FFFFFFF for abs, and infinity
0x7F80_0000) are likely to be hoisted out of the individual calls, to
be shared, and the `seta`/`setae` are likely to be collapsed into
conditional jumps or moves (or similar).

The old `is_infinite` did two comparisons, and the old `is_finite` did
three (with a branch), and both of them had to check the flags after
every one of those comparison. These functions have had that old
implementation since they were added in
rust-lang@6284190
7 years ago.

Benchmark (`abs` is the new form, `std` is the old):

```
test f32_is_finite_abs            ... bench:          55 ns/iter (+/- 10)
test f32_is_finite_std            ... bench:         118 ns/iter (+/- 5)

test f32_is_infinite_abs          ... bench:          53 ns/iter (+/- 1)
test f32_is_infinite_std          ... bench:          84 ns/iter (+/- 6)

test f64_is_finite_abs            ... bench:          52 ns/iter (+/- 12)
test f64_is_finite_std            ... bench:         128 ns/iter (+/- 25)

test f64_is_infinite_abs          ... bench:          54 ns/iter (+/- 5)
test f64_is_infinite_std          ... bench:          93 ns/iter (+/- 23)
```

```rust
 #![feature(test)]
extern crate test;

use std::{f32, f64};
use test::Bencher;

const VALUES_F32: &[f32] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597];

 #[bench]
fn f32_is_infinite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.is_infinite()));
}
 #[bench]
fn f32_is_infinite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.abs()== f32::INFINITY));
}
 #[bench]
fn f32_is_finite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.is_finite()));
}
 #[bench]
fn f32_is_finite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.abs() < f32::INFINITY));
}

const VALUES_F64: &[f64] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597];

 #[bench]
fn f64_is_infinite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.is_infinite()));
}
 #[bench]
fn f64_is_infinite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.abs() == f64::INFINITY));
}
 #[bench]
fn f64_is_finite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.is_finite()));
}
 #[bench]
fn f64_is_finite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.abs() < f64::INFINITY));
}
```
Centril added a commit to Centril/rust that referenced this pull request Jan 13, 2019
…odrAus

Optimise floating point `is_finite` (2x) and `is_infinite` (1.6x).

These can both rely on IEEE754 semantics to be made faster, by folding
away the sign with an abs (left private for now), and then comparing
to infinity, letting the NaN semantics of a direct float comparison
handle NaN input properly.

The `abs` bit-fiddling is simple (a single and), and so these new
forms compile down to a few instructions, without branches, e.g. for
f32:

```asm
is_infinite:
        andps   xmm0, xmmword ptr [rip + .LCPI2_0] ; 0x7FFF_FFFF
        ucomiss xmm0, dword ptr [rip + .LCPI2_1]   ; 0x7F80_0000
        setae   al
        ret

is_finite:
        andps   xmm0, xmmword ptr [rip + .LCPI1_0] ; 0x7FFF_FFFF
        movss   xmm1, dword ptr [rip + .LCPI1_1]   ; 0x7F80_0000
        ucomiss xmm1, xmm0
        seta    al
        ret
```

When used in loops/repeatedly, they get even better: the memory
operations (loading the mask 0x7FFFFFFF for abs, and infinity
0x7F80_0000) are likely to be hoisted out of the individual calls, to
be shared, and the `seta`/`setae` are likely to be collapsed into
conditional jumps or moves (or similar).

The old `is_infinite` did two comparisons, and the old `is_finite` did
three (with a branch), and both of them had to check the flags after
every one of those comparison. These functions have had that old
implementation since they were added in
rust-lang@6284190
7 years ago.

Benchmark (`abs` is the new form, `std` is the old):

```
test f32_is_finite_abs            ... bench:          55 ns/iter (+/- 10)
test f32_is_finite_std            ... bench:         118 ns/iter (+/- 5)

test f32_is_infinite_abs          ... bench:          53 ns/iter (+/- 1)
test f32_is_infinite_std          ... bench:          84 ns/iter (+/- 6)

test f64_is_finite_abs            ... bench:          52 ns/iter (+/- 12)
test f64_is_finite_std            ... bench:         128 ns/iter (+/- 25)

test f64_is_infinite_abs          ... bench:          54 ns/iter (+/- 5)
test f64_is_infinite_std          ... bench:          93 ns/iter (+/- 23)
```

```rust
 #![feature(test)]
extern crate test;

use std::{f32, f64};
use test::Bencher;

const VALUES_F32: &[f32] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597];

 #[bench]
fn f32_is_infinite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.is_infinite()));
}
 #[bench]
fn f32_is_infinite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.abs()== f32::INFINITY));
}
 #[bench]
fn f32_is_finite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.is_finite()));
}
 #[bench]
fn f32_is_finite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.abs() < f32::INFINITY));
}

const VALUES_F64: &[f64] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597];

 #[bench]
fn f64_is_infinite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.is_infinite()));
}
 #[bench]
fn f64_is_infinite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.abs() == f64::INFINITY));
}
 #[bench]
fn f64_is_finite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.is_finite()));
}
 #[bench]
fn f64_is_finite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.abs() < f64::INFINITY));
}
```
bors added a commit that referenced this pull request Jan 13, 2019
Rollup of 16 pull requests

Successful merges:

 - #57351 (Don't actually create a full MIR stack frame when not needed)
 - #57353 (Optimise floating point `is_finite` (2x) and `is_infinite` (1.6x).)
 - #57412 (Improve the wording)
 - #57436 (save-analysis: use a fallback when access levels couldn't be computed)
 - #57453 (lldb_batchmode.py: try `import _thread` for Python 3)
 - #57454 (Some cleanups for core::fmt)
 - #57461 (Change `String` to `&'static str` in `ParseResult::Failure`.)
 - #57473 (std: Render large exit codes as hex on Windows)
 - #57474 (save-analysis: Get path def from parent in case there's no def for the path itself.)
 - #57494 (Speed up item_bodies for large match statements involving regions)
 - #57496 (re-do docs for core::cmp)
 - #57508 (rustdoc: Allow inlining of reexported crates and crate items)
 - #57547 (Use `ptr::eq` where applicable)
 - #57557 (resolve: Mark extern crate items as used in more cases)
 - #57560 (hygiene: Do not treat `Self` ctor as a local variable)
 - #57564 (Update the const fn tracking issue to the new metabug)

Failed merges:

r? @ghost
@bors bors merged commit 6e742db into rust-lang:master Jan 13, 2019
@huonw huonw deleted the faster-finiteness-checks branch January 15, 2019 00:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants