-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change twice used large const table to static #82676
Conversation
This table is used twice in core::num::dec2flt::algorithm::power_of_ten. According to the semantics of const, a separate huge definition of the table is inlined at both places. fn power_of_ten(e: i16) -> Fp { assert!(e >= table::MIN_E); let i = e - table::MIN_E; let sig = table::POWERS.0[i as usize]; let exp = table::POWERS.1[i as usize]; Fp { f: sig, e: exp } } Theoretically this gets cleaned up by optimization passes, but in practice I am experiencing a miscompile from LTO on this code. Making the table a static, which would only be defined a single time and not require attention from LTO, eliminates the miscompile and seems semantically more appropriate anyway. A separate bug report on the LTO bug is forthcoming.
r? @m-ou-se (rust-highfive has picked a reviewer for you, use r? to override) |
I wouldn't expect any significant difference unless there's some kind of const propagation being done by LLVM today, but have you checked disassembly or run some benchmarks? We could run rustc-perf but it's not really a float parsing workload, but can't hurt either - @bors try @rust-timer queue Otherwise this seems fine to me. An alternative could be splitting the constant in two so that there's just one use per "part", but doesn't feel obviously better. |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit bd51dea with merge 015314cb764d47a719b5c1c60c0f12101e8c0cc5... |
@Mark-Simulacrum yeah here's the disassembly: Beforecore::num::dec2flt::algorithm::power_of_ten::hf417b425dc5ff8a8:
0xd839f00 <+0>: pushq %rax
0xd839f01 <+1>: movl %edi, %eax
0xd839f03 <+3>: movswl %ax, %ecx
0xd839f06 <+6>: cmpl $0xfffffecf, %ecx
0xd839f0c <+12>: jl 0xd839f36 ; <+54> at algorithm.rs:17:5
0xd839f0e <+14>: addl $0x131, %eax
0xd839f13 <+19>: movswq %ax, %rdi
0xd839f17 <+23>: movzwl %ax, %eax
0xd839f1a <+26>: cmpl $0x263, %eax
0xd839f1f <+31>: jae 0xd839f51 ; <+81> at algorithm.rs:19:15
0xd839f21 <+33>: leaq 0x83bc830(%rip), %rcx
0xd839f28 <+40>: movq (%rcx,%rdi,8), %rax
0xd839f2c <+44>: movzwl 0x1318(%rcx,%rdi,2), %edx
0xd839f34 <+52>: popq %rcx
0xd839f35 <+53>: retq
0xd839f36 <+54>: leaq 0x83bc7cc(%rip), %rdi
0xd839f3d <+61>: leaq 0x65ab04c(%rip), %rdx
0xd839f44 <+68>: movl $0x23, %esi
0xd839f49 <+73>: callq *0x65ae839(%rip)
0xd839f4f <+79>: ud2
0xd839f51 <+81>: leaq 0x65ab050(%rip), %rdx
0xd839f58 <+88>: movl $0x263, %esi
0xd839f5d <+93>: callq *0x65ae81d(%rip)
0xd839f63 <+99>: ud2 Expected table address in this assembly is $ (lldb) x/4xg 0x83bc830+0xd839f28
0x15bf6758: 0xf7c45042f7c45042 0xf7c45042f7c45042
0x15bf6768: 0xf7c45042f7c45042 0xf7c45042f7c45042
$ (lldb) memory find -c2 -e 0xe0b62e2929aba83c -- 0x00400000 0x20000000
data found at location: 0x12b6de68
0x12b6de68: 3c a8 ab 29 29 2e b6 e0 26 49 0b ba d9 dc 71 8c
0x12b6de78: 6f 1b 8e 28 10 54 8e af 4b a2 b1 32 14 e9 71 db
no more matches within the range. Aftercore::num::dec2flt::algorithm::power_of_ten::hf417b425dc5ff8a8:
0xd839ec0 <+0>: pushq %rax
0xd839ec1 <+1>: movl %edi, %eax
0xd839ec3 <+3>: movswl %ax, %ecx
0xd839ec6 <+6>: cmpl $0xfffffecf, %ecx
0xd839ecc <+12>: jl 0xd839ef6 ; <+54> at algorithm.rs:17:5
0xd839ece <+14>: addl $0x131, %eax
0xd839ed3 <+19>: movswq %ax, %rdi
0xd839ed7 <+23>: movzwl %ax, %eax
0xd839eda <+26>: cmpl $0x263, %eax
0xd839edf <+31>: jae 0xd839f11 ; <+81> at algorithm.rs:19:15
0xd839ee1 <+33>: leaq 0x53409c0(%rip), %rcx ; core::num::dec2flt::table::POWERS::hf668a1d7d9ba717f
0xd839ee8 <+40>: movq (%rcx,%rdi,8), %rax
0xd839eec <+44>: movzwl 0x1318(%rcx,%rdi,2), %edx
0xd839ef4 <+52>: popq %rcx
0xd839ef5 <+53>: retq
0xd839ef6 <+54>: leaq 0x83bc80c(%rip), %rdi
0xd839efd <+61>: leaq 0x65ab08c(%rip), %rdx
0xd839f04 <+68>: movl $0x23, %esi
0xd839f09 <+73>: callq *0x65ae879(%rip)
0xd839f0f <+79>: ud2
0xd839f11 <+81>: leaq 0x65ab090(%rip), %rdx
0xd839f18 <+88>: movl $0x263, %esi
0xd839f1d <+93>: callq *0x65ae85d(%rip)
0xd839f23 <+99>: ud2 Expected table address: $ (lldb) x/4xg 0x53409c0+0xd839ee8
0x12b7a8a8: 0xe0b62e2929aba83c 0x8c71dcd9ba0b4926
0x12b7a8b8: 0xaf8e5410288e1b6f 0xdb71e91432b1a24b rust/library/core/src/num/dec2flt/table.rs Lines 8 to 13 in 5233edc
|
Is it worth adding a warning in rustc for such situations? |
Looks good to me. Presuming perf doesn't show anything surprising, r=me |
I would be shocked if there is a perf difference, since |
Yeah, I would also be shocked. If it's urgent feel free to approve before then, but I figured a few hours wouldn't really matter. |
☀️ Try build successful - checks-actions |
Queued 015314cb764d47a719b5c1c60c0f12101e8c0cc5 with parent 5233edc, future comparison URL. |
Finished benchmarking try commit (015314cb764d47a719b5c1c60c0f12101e8c0cc5): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
@bors r=Mark-Simulacrum rollup |
📌 Commit bd51dea has been approved by |
…laumeGomez Rollup of 7 pull requests Successful merges: - rust-lang#80734 (check that first arg to `panic!()` in const is `&str`) - rust-lang#81932 (Always compile rustdoc with debug logging enabled when `download-rustc` is set) - rust-lang#82018 (Remove the dummy cache in `DocContext`; delete RenderInfo) - rust-lang#82598 (Check stability and feature attributes in rustdoc) - rust-lang#82655 (Highlight identifier span instead of whole pattern span in `unused` lint) - rust-lang#82662 (Warn about unknown doc attributes) - rust-lang#82676 (Change twice used large const table to static) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup
This table is used twice in core::num::dec2flt::algorithm::power_of_ten. According to the semantics of const, a separate huge definition of the table is inlined at both places.
rust/library/core/src/num/dec2flt/algorithm.rs
Lines 16 to 22 in 5233edc
Theoretically this gets cleaned up by optimization passes, but in practice I am experiencing a miscompile from LTO on this code. Making the table a static, which would only be defined a single time and not require attention from LTO, eliminates the miscompile and seems semantically more appropriate anyway. A separate bug report on the LTO bug is forthcoming.
Original addition of
const
is from #27307.