-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rollup of 15 pull requests #57554
Rollup of 15 pull requests #57554
Conversation
Previously calculating glob map was *opt-in*, however it did record node id -> ident use for every use directive. This aims to see if we can unconditionally calculate the glob map and not regress performance.
manual impl was a workaround for rust-lang#28229.
These can both rely on IEEE754 semantics to be made faster, by folding away the sign with an abs (left private for now), and then comparing to infinity, letting the NaN semantics of a direct float comparison handle NaN input properly. The `abs` bit-fiddling is simple (a single and), and so these new forms compile down to a few instructions, without branches, e.g. for f32: ```asm is_infinite: andps xmm0, xmmword ptr [rip + .LCPI2_0] ; 0x7FFF_FFFF ucomiss xmm0, dword ptr [rip + .LCPI2_1] ; 0x7F80_0000 setae al ret is_finite: andps xmm0, xmmword ptr [rip + .LCPI1_0] ; 0x7FFF_FFFF movss xmm1, dword ptr [rip + .LCPI1_1] ; 0x7F80_0000 ucomiss xmm1, xmm0 seta al ret ``` When used in loops/repeatedly, they get even better: the memory operations (loading the mask 0x7FFFFFFF for abs, and infinity 0x7F80_0000) are likely to be hoisted out of the individual calls, to be shared, and the `seta`/`setae` are likely to be collapsed into conditional jumps or moves (or similar). The old `is_infinite` did two comparisons, and the old `is_finite` did three (with a branch), and both of them had to check the flags after every one of those comparison. These functions have had that old implementation since they were added in rust-lang@6284190 7 years ago. Benchmark (`abs` is the new form, `std` is the old): ``` test f32_is_finite_abs ... bench: 55 ns/iter (+/- 10) test f32_is_finite_std ... bench: 118 ns/iter (+/- 5) test f32_is_infinite_abs ... bench: 53 ns/iter (+/- 1) test f32_is_infinite_std ... bench: 84 ns/iter (+/- 6) test f64_is_finite_abs ... bench: 52 ns/iter (+/- 12) test f64_is_finite_std ... bench: 128 ns/iter (+/- 25) test f64_is_infinite_abs ... bench: 54 ns/iter (+/- 5) test f64_is_infinite_std ... bench: 93 ns/iter (+/- 23) ``` ```rust #![feature(test)] extern crate test; use std::{f32, f64}; use test::Bencher; const VALUES_F32: &[f32] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597]; #[bench] fn f32_is_infinite_std(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.is_infinite())); } #[bench] fn f32_is_infinite_abs(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.abs()== f32::INFINITY)); } #[bench] fn f32_is_finite_std(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.is_finite())); } #[bench] fn f32_is_finite_abs(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.abs() < f32::INFINITY)); } const VALUES_F64: &[f64] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597]; #[bench] fn f64_is_infinite_std(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.is_infinite())); } #[bench] fn f64_is_infinite_abs(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.abs() == f64::INFINITY)); } #[bench] fn f64_is_finite_std(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.is_finite())); } #[bench] fn f64_is_finite_abs(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.abs() < f64::INFINITY)); } ```
This avoids 770,000 allocations when compiling the `html5ever` benchmark, reducing instruction counts by up to 2%.
…e path itself. This fixes rust-lang#57462. The relevant part from the hir type collector is: ``` DEBUG 2019-01-09T15:42:58Z: rustc::hir::map::collector: hir_map: NodeId(32) => Entry { parent: NodeId(33), dep_node: 4294967040, node: Expr(expr(32: <Foo>::new)) } DEBUG 2019-01-09T15:42:58Z: rustc::hir::map::collector: hir_map: NodeId(48) => Entry { parent: NodeId(32), dep_node: 4294967040, node: Ty(type(Foo)) } DEBUG 2019-01-09T15:42:58Z: rustc::hir::map::collector: hir_map: NodeId(30) => Entry { parent: NodeId(48), dep_node: 4294967040, node: PathSegment(PathSegment { ident: Foo#0, id: Some(NodeId(30)), def: Some(Err), args: None, infer_types: true }) } DEBUG 2019-01-09T15:42:58Z: rustc::hir::map::collector: hir_map: NodeId(31) => Entry { parent: NodeId(32), dep_node: 4294967040, node: PathSegment(PathSegment { ident: new#0, id: Some(NodeId(31)), def: Some(Err), args: None, infer_types: true }) } ``` We have the right ID when looking for NodeId(31) and try with NodeId(32) (which is the right thing to look for) from get_path_data, but not for the segments that we write from `write_sub_paths_truncated`. Basically `process_path` takes an id which is always the parent, and that we fall back to in `get_path_data()`, so we get the right result for the last path segment, but not for the other segments that get written to from `write_sub_paths_truncated`. I think we can stop passing the explicit id around to `get_path_data` now, will consider sending that as a followup.
On Windows process exit codes are never signals but rather always 32-bit integers. Most faults like segfaults and such end up having large integers used to represent them, like STATUS_ACCESS_VIOLATION being 0xC0000005. Currently, however, when an `ExitStatus` is printed this ends up getting rendered as 3221225477 which is somewhat more difficult to debug. This commit adds a branch in `Display for ExitStatus` on Windows which handles exit statuses where the high bit is set and prints those exit statuses as hex instead of with decimals. This will hopefully preserve the current display for small exit statuses (like `exit code: 22`), but assist in quickly debugging segfaults/access violations/etc. I've found at least that the hex codes are easier to search for than decimal. I wasn't able to find any official documentation saying that all system exit codes have the high bit set, but I figure it's a good enough heuristic for now.
Once a region has been expanded to cover a fixed region, a corresponding RegSubVar constraint won't have any effect on the expansion anymore, the same is true for constraints where the variable on the RHS has already reached static scope. By removing those constraints from the set that we're iterating over, we remove a lot of needless overhead in case of slow convergences (i.e. lots of iterations). For the unicode_normalization crate, this about cuts the time required for item_bodies checking in half.
In functions with lots of region constraint, if the fixed point iteration converges only slowly, a lot of the var/var constraints will have equal regions most of the time. Yet, we still perform the LUB calculation and try to intern the result. Especially the latter incurs quite some overhead. This reduces the take taken by the item bodies checking pass for the unicode_normalization crate by about 75%.
Don't actually create a full MIR stack frame when not needed r? @dotdash This should significantly reduce overhead during const propagation and reduce overhead *after* copy propagation (cc rust-lang#36673)
…odrAus Optimise floating point `is_finite` (2x) and `is_infinite` (1.6x). These can both rely on IEEE754 semantics to be made faster, by folding away the sign with an abs (left private for now), and then comparing to infinity, letting the NaN semantics of a direct float comparison handle NaN input properly. The `abs` bit-fiddling is simple (a single and), and so these new forms compile down to a few instructions, without branches, e.g. for f32: ```asm is_infinite: andps xmm0, xmmword ptr [rip + .LCPI2_0] ; 0x7FFF_FFFF ucomiss xmm0, dword ptr [rip + .LCPI2_1] ; 0x7F80_0000 setae al ret is_finite: andps xmm0, xmmword ptr [rip + .LCPI1_0] ; 0x7FFF_FFFF movss xmm1, dword ptr [rip + .LCPI1_1] ; 0x7F80_0000 ucomiss xmm1, xmm0 seta al ret ``` When used in loops/repeatedly, they get even better: the memory operations (loading the mask 0x7FFFFFFF for abs, and infinity 0x7F80_0000) are likely to be hoisted out of the individual calls, to be shared, and the `seta`/`setae` are likely to be collapsed into conditional jumps or moves (or similar). The old `is_infinite` did two comparisons, and the old `is_finite` did three (with a branch), and both of them had to check the flags after every one of those comparison. These functions have had that old implementation since they were added in rust-lang@6284190 7 years ago. Benchmark (`abs` is the new form, `std` is the old): ``` test f32_is_finite_abs ... bench: 55 ns/iter (+/- 10) test f32_is_finite_std ... bench: 118 ns/iter (+/- 5) test f32_is_infinite_abs ... bench: 53 ns/iter (+/- 1) test f32_is_infinite_std ... bench: 84 ns/iter (+/- 6) test f64_is_finite_abs ... bench: 52 ns/iter (+/- 12) test f64_is_finite_std ... bench: 128 ns/iter (+/- 25) test f64_is_infinite_abs ... bench: 54 ns/iter (+/- 5) test f64_is_infinite_std ... bench: 93 ns/iter (+/- 23) ``` ```rust #![feature(test)] extern crate test; use std::{f32, f64}; use test::Bencher; const VALUES_F32: &[f32] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597]; #[bench] fn f32_is_infinite_std(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.is_infinite())); } #[bench] fn f32_is_infinite_abs(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.abs()== f32::INFINITY)); } #[bench] fn f32_is_finite_std(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.is_finite())); } #[bench] fn f32_is_finite_abs(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.abs() < f32::INFINITY)); } const VALUES_F64: &[f64] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597]; #[bench] fn f64_is_infinite_std(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.is_infinite())); } #[bench] fn f64_is_infinite_abs(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.abs() == f64::INFINITY)); } #[bench] fn f64_is_finite_std(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.is_finite())); } #[bench] fn f64_is_finite_abs(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.abs() < f64::INFINITY)); } ```
…rochenkov Always calculate glob map but only for glob uses Previously calculating glob map was *opt-in*, however it did record node id -> ident use for every use directive. This aims to see if we can unconditionally calculate the glob map and not regress performance. Main motivation is to get rid of some of the moving pieces and simplify the compilation interface - this would allow us to entirely remove `CrateAnalysis`. Later, we could easily expose a relevant query, similar to the likes of `maybe_unused_trait_import` (so using precomputed data from the resolver, but which could be rewritten to be on-demand). r? @nikomatsakis Local perf run showed mostly noise (except `ctfe-stress-*`) but I'd appreciate if we could do a perf run run here and double-check that this won't regress performance.
…varkor Improve the wording I'm sorry but re-opened the PR because I failed to squash commits(rust-lang#57397). Fixes rust-lang#55752. r? @varkor
…x, r=nikomatsakis save-analysis: use a fallback when access levels couldn't be computed Fixing an RLS regression I introduced in rust-lang#57343 😢 I missed a case where we get [called back with analysis when type checking fails](https://github.com/rust-lang/rust/blob/9d54812829e9d92dac35a4a0f358cdc5a2475371/src/librustc_driver/driver.rs#L1264). Since privacy checking normally is done afterwards, when we execute the `privacy_access_levels` query inside the save_analysis callback we'll calculate it for the first time and since typeck info isn't complete, we'll crash there. Double-checked locally and it seems to have fixed the problem. r? @nikomatsakis
Simplify `ConstValue::ScalarPair` While looking at rust-lang#57432 I realized that some of our types for representing constants are very big. This reduces `LazyConst` to 3/4th of its original size and simplifies some code around slices at the same time. r? @RalfJung
lldb_batchmode.py: try `import _thread` for Python 3 None
Some cleanups for core::fmt
…tic-str, r=simulacrum Change `String` to `&'static str` in `ParseResult::Failure`. This avoids 770,000 allocations when compiling the `html5ever` benchmark, reducing instruction counts by up to 2%.
…, r=Kimundi std: Render large exit codes as hex on Windows On Windows process exit codes are never signals but rather always 32-bit integers. Most faults like segfaults and such end up having large integers used to represent them, like STATUS_ACCESS_VIOLATION being 0xC0000005. Currently, however, when an `ExitStatus` is printed this ends up getting rendered as 3221225477 which is somewhat more difficult to debug. This commit adds a branch in `Display for ExitStatus` on Windows which handles exit statuses where the high bit is set and prints those exit statuses as hex instead of with decimals. This will hopefully preserve the current display for small exit statuses (like `exit code: 22`), but assist in quickly debugging segfaults/access violations/etc. I've found at least that the hex codes are easier to search for than decimal. I wasn't able to find any official documentation saying that all system exit codes have the high bit set, but I figure it's a good enough heuristic for now.
save-analysis: Get path def from parent in case there's no def for the path itself. This fixes rust-lang#57462. The relevant part from the hir type collector is: ``` DEBUG 2019-01-09T15:42:58Z: rustc::hir::map::collector: hir_map: NodeId(32) => Entry { parent: NodeId(33), dep_node: 4294967040, node: Expr(expr(32: <Foo>::new)) } DEBUG 2019-01-09T15:42:58Z: rustc::hir::map::collector: hir_map: NodeId(48) => Entry { parent: NodeId(32), dep_node: 4294967040, node: Ty(type(Foo)) } DEBUG 2019-01-09T15:42:58Z: rustc::hir::map::collector: hir_map: NodeId(30) => Entry { parent: NodeId(48), dep_node: 4294967040, node: PathSegment(PathSegment { ident: Foo#0, id: Some(NodeId(30)), def: Some(Err), args: None, infer_types: true }) } DEBUG 2019-01-09T15:42:58Z: rustc::hir::map::collector: hir_map: NodeId(31) => Entry { parent: NodeId(32), dep_node: 4294967040, node: PathSegment(PathSegment { ident: new#0, id: Some(NodeId(31)), def: Some(Err), args: None, infer_types: true }) } ``` We have the right ID when looking for NodeId(31) and try with NodeId(32) (which is the right thing to look for) from get_path_data. But not when we look from `write_sub_paths_truncated` Basically process_path takes an id which is always the parent, and that we fall back to in get_path_data(), so we get the right result for the last path segment, but not for the other segments that get written to from write_sub_paths_truncated. I think we can stop passing the explicit `id` around to get_path_data as a followup.
Speed up item_bodies for large match statements involving regions These changes don't change anything about the complexity of the algorithms, but use some easy shortcuts or modifications to cut down some overhead. The first change, which incrementally removes the constraints from the set we're iterating over probably introduces some overhead for small to medium sized constraint sets, but it's not big enough for me to observe it in any project I tested against (not that many though). Though most other crates probably won't improve much at all, because huge matches aren't that common, the changes seemed simple enough for me to make them. Ref rust-lang#55528 cc unicode-rs/unicode-normalization#29 r? @nikomatsakis
re-do docs for core::cmp Fixes rust-lang#32934
…umeGomez rustdoc: Allow inlining of reexported crates and crate items Fixes rust-lang#46296 This PR checks for when a `pub extern crate` statement has a `#[doc(inline)]` attribute & inlines its contents. Code is based off of the inlining statements for `pub use` statements.
Use `ptr::eq` where applicable Stumbled upon a few of `A as *const _ as usize == B as *const as usize`, so I decided to follow the programming boy scout rule (:smile:) and replaced the pattern with more widely used `ptr::eq`.
@bors r+ p=15 |
📌 Commit 156c35f has been approved by |
⌛ Testing commit 156c35f with merge ef700e27bf436164edc67384faecee9bf4dac147... |
💔 Test failed - checks-travis |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
Successful merges:
is_finite
(2x) andis_infinite
(1.6x). #57353 (Optimise floating pointis_finite
(2x) andis_infinite
(1.6x).)ConstValue::ScalarPair
#57442 (SimplifyConstValue::ScalarPair
)import _thread
for Python 3 #57453 (lldb_batchmode.py: tryimport _thread
for Python 3)String
to&'static str
inParseResult::Failure
. #57461 (ChangeString
to&'static str
inParseResult::Failure
.)ptr::eq
where applicable #57547 (Useptr::eq
where applicable)Failed merges:
r? @ghost