-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LLVM codegen: Don't emit zero-sized padding for fields #87254
LLVM codegen: Don't emit zero-sized padding for fields #87254
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @estebank (or someone else) soon. Please see the contribution instructions for more information. |
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 535bce34741396022cb8e5e0740aaaacb0378f4e with merge 67ef29bb93ef18aa16af35c42098c1987b3b03d2... |
☀️ Try build successful - checks-actions |
Queued 67ef29bb93ef18aa16af35c42098c1987b3b03d2 with parent 1807305, future comparison URL. |
Finished benchmarking try commit (67ef29bb93ef18aa16af35c42098c1987b3b03d2): comparison url. Summary: This change led to significant mixed results 🤷 in compiler performance.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. Next Steps: If you can justify the regressions found in this perf run, please indicate this with @bors rollup=never |
I will add a testcase to make sure that this actually makes it possible to fix rust-lang/stdarch#1143 and implement caching of the projections. Once done I will request another perf run. |
@bors try @rust-timer queue per the request on zulip |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 8afaf9e3f8f64e9fc9a6e713bca7032df45fb64f with merge 6a60d9e36a40b362f083b88fa742e5a1377ce10c... |
☀️ Try build successful - checks-actions |
Queued 6a60d9e36a40b362f083b88fa742e5a1377ce10c with parent 0ecff8c, future comparison URL. |
Thanks a lot for submitting this! |
Finished benchmarking try commit (6a60d9e36a40b362f083b88fa742e5a1377ce10c): comparison url. Summary: This change led to significant mixed results 🤷 in compiler performance.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. Next Steps: If you can justify the regressions found in this perf run, please indicate this with @bors rollup=never |
Number of fields is usually small, so try using a smallvec. Also if a type has no padding at all, could you not put it in the field projection cache and use memory_index as a fallback? I assume many/most types wouldn't have padding, so it could shrink the cache hashmap considerably. |
|
@hkratz You searched bad, even github will find usages of smallvec https://github.com/rust-lang/rust/search?q=smallvec . |
No idea how I missed that... Not sure if it will improve performance though either. Will dig some more. |
The additional time for the deep-vector-opt is not directly related to the changed code. Instead the thin-local LTO apparently takes longer for the emitted code without zero-sized padding: With thin-local LTO (default): 4% more time taken
With thin-local LTO disabled:
I have no idea why that is the case but I don't think that it should block this PR. The generated LL IR code looks as expected. |
@bors retry spurious network fatal: unable to access 'https://github.com/servo/servo/': GnuTLS recv error (-54): Error in the pull function. |
⌛ Testing commit 02295f4 with merge 14795284b7e3779b7a45eedd6cfb4a7122cf1770... |
The job Click to see the possible cause of the failure (guessed by this bot)
|
💔 Test failed - checks-actions |
@bors retry Failed while pushing toolstate. https://www.githubstatus.com/incidents/rmfrw9dfbtbp
|
⌛ Testing commit 02295f4 with merge 97d8fddfa14b660a24ffeed2ab0784aaa379a9ff... |
💥 Test timed out |
@bors retry Tests finished successfully hours ago, but bors didn't get triggered on the success. Presumably due to GitHub incident. |
☀️ Test successful - checks-actions |
Visiting for perf triage. Seems like the regression to deep-vector-opt is anticipated, and acceptable cost to get this in. Other than that, I don't see anything significant in the perf comparison for when this landed. Marking as perf-regression-triaged. |
Currently padding is emitted before fields of a struct and at the end of the struct regardless of the ABI. Even if no padding is required zero-sized padding fields are emitted. This is not useful and - more importantly - it make it impossible to generate the exact vector types that LLVM expects for certain ARM SIMD intrinsics. This change should unblock the implementation of many ARM intrinsics using the
unadjusted
ABI, see rust-lang/stdarch#1143 (comment).This is a proof of concept only because the field lookup now takes O(number of fields) time compared to O(1) before since it recalculates the mapping at every lookup. I would like to find out how big the performance impact actually is before implementing caching or restricting this behavior to the
unadjusted
ABI.cc @SparrowLii @bjorn3
(Discussion on internals)