perf: Avoid re-interning types in outlives checking #67899
r? @varkor (rust_highfive has picked a reviewer for you, use r? to override)
@bors try @rust-timer queue
Awaiting bors try build completion
perf: Avoid re-interning types in outlives checking

In profiling, `intern_ty` is a very hot function (9% in the test I used). While there does not seem to be a way to reduce the cost of calling it, we can avoid the call in some cases.

In outlives checking, `ParamTy` and `ProjectionTy` are extracted from the `Ty` value that contains them, only to be passed as an argument to `intern_ty` again later. This seems to happen a lot in my test: `intern_ty` called from outlives checking accounts for ~6%.

Since every `ParamTy` and `ProjectionTy` is already stored in a `Ty`, I had the idea to pass around a `View` type, which provides direct access to the specific inner type without losing the original `Ty` pointer. The current implementation uses some unsafe code so that the branch can be elided in `Deref`, but it could be done entirely in safe code as well, either by accepting the (predictable) branch in `Deref` or by storing the inner type in `View` alongside the `Ty`. Since the unsafe code is trivial to prove correct and the call sites seem quite hot, I opted to show the unsafe approach first.

Based on #67840 (since it touches the same file/lines).

Commits without #67840: https://github.com/rust-lang/rust/pull/67899/files/77ddc3540e52be4b5bd75cf082c621392acaf81b..b55bab206096c27533120921f6b0c273f115e34a
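To make the round trip concrete, here is a minimal sketch of a hash-consing interner. `TyKind`, `ParamTy`, `Interner`, `param_of` and the `calls` counter are hypothetical, heavily simplified stand-ins rather than rustc's actual types, and `Box::leak` stands in for the compiler's arena. What it illustrates: re-interning an extracted `ParamTy` hands back the very pointer we started with, yet still pays for a hash and a table lookup.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for rustc's types.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ParamTy {
    index: u32,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum TyKind {
    Bool,
    Param(ParamTy),
}

// An interned type is a shared reference; equal kinds share one allocation.
type Ty = &'static TyKind;

// Toy hash-consing interner; `Box::leak` stands in for an arena.
#[derive(Default)]
struct Interner {
    map: HashMap<TyKind, Ty>,
    calls: usize, // number of `intern_ty` calls (each one hashes and probes the map)
}

impl Interner {
    fn intern_ty(&mut self, kind: TyKind) -> Ty {
        self.calls += 1;
        if let Some(&ty) = self.map.get(&kind) {
            return ty;
        }
        let ty: Ty = Box::leak(Box::new(kind));
        self.map.insert(kind, ty);
        ty
    }
}

// Hypothetical helper: pull the `ParamTy` out of a `Ty`, dropping the pointer.
fn param_of(ty: Ty) -> Option<ParamTy> {
    match *ty {
        TyKind::Param(p) => Some(p),
        _ => None,
    }
}

fn main() {
    let mut interner = Interner::default();
    let original = interner.intern_ty(TyKind::Param(ParamTy { index: 0 }));
    let bool_ty = interner.intern_ty(TyKind::Bool);

    // Outlives-style round trip: extract the `ParamTy` and pass it around...
    let param = param_of(original).unwrap();
    // ...then later turn it back into a `Ty` with another interner call.
    let rebuilt = interner.intern_ty(TyKind::Param(param));

    assert!(std::ptr::eq(original, rebuilt)); // same allocation as before
    assert!(!std::ptr::eq(original, bool_ty));
    assert_eq!(interner.calls, 3); // the third call only recovered a pointer we already had
}
```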
Did you try just passing along the
No, my profiling round trips take >1.5 h on the computers I have access to at the moment, so trying variations isn't very feasible. I would be surprised if it had a measurable effect, as I don't believe the values are passed around enough for the extra memcpy/register copies to matter. It is definitely slower, though, no matter how minuscule (which is why I submitted it as is for feedback; I also only thought of that variant while writing the PR message).
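For comparison, the two safe layouts mentioned in the description can be sketched as follows. `ParamView` keeps only the `Ty` and re-checks the variant in `Deref`; `ParamPair` stores the inner reference next to the `Ty`, so reading it needs no branch, but the value is twice as large to copy around. All names are hypothetical and the types are simplified stand-ins for rustc's, so this only illustrates the size/branch trade-off, not its measured cost.

```rust
use std::mem::size_of;
use std::ops::Deref;

// Hypothetical, simplified stand-ins for rustc's types.
struct ParamTy {
    index: u32,
}

enum TyKind {
    Bool,
    Param(ParamTy),
}

type Ty<'tcx> = &'tcx TyKind;

// Safe variant 1 (hypothetical): one pointer; the variant is re-checked on every deref.
#[derive(Clone, Copy)]
struct ParamView<'tcx> {
    ty: Ty<'tcx>,
}

impl<'tcx> ParamView<'tcx> {
    fn new(ty: Ty<'tcx>) -> Option<Self> {
        match ty {
            TyKind::Param(_) => Some(ParamView { ty }),
            _ => None,
        }
    }
}

impl<'tcx> Deref for ParamView<'tcx> {
    type Target = ParamTy;
    fn deref(&self) -> &ParamTy {
        match self.ty {
            TyKind::Param(param) => param,
            // Never taken: `new` only accepts `Param` types, so the branch is predictable.
            _ => unreachable!("ParamView always wraps a Param type"),
        }
    }
}

// Safe variant 2 (hypothetical): two pointers; no branch when reading the `ParamTy`.
#[derive(Clone, Copy)]
struct ParamPair<'tcx> {
    ty: Ty<'tcx>,
    param: &'tcx ParamTy,
}

impl<'tcx> ParamPair<'tcx> {
    fn new(ty: Ty<'tcx>) -> Option<Self> {
        match ty {
            TyKind::Param(param) => Some(ParamPair { ty, param }),
            _ => None,
        }
    }
}

fn main() {
    let kind = TyKind::Param(ParamTy { index: 3 });
    let view = ParamView::new(&kind).unwrap();
    let pair = ParamPair::new(&kind).unwrap();

    assert_eq!(view.index, 3); // goes through `Deref`, with its branch
    assert_eq!(pair.param.index, 3); // direct field read
    assert!(std::ptr::eq(view.ty, pair.ty)); // both still hold the original `Ty`
    assert!(ParamView::new(&TyKind::Bool).is_none());

    // The price of variant 2: it is twice as large.
    assert_eq!(size_of::<ParamView>(), size_of::<usize>());
    assert_eq!(size_of::<ParamPair>(), 2 * size_of::<usize>());
}
```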
☀️ Try build successful - checks-azure
Queued 4218803 with parent 7785834, future comparison URL.
Perf shows some regressions and no improvements. If you've made some changes, I can trigger another perf run.
☔ The latest upstream changes (presumably #67886) made this pull request unmergeable. Please resolve the merge conflicts.
The perf run should be done without #67840. Diffing the diffs, it looks like an improvement =P
Separated this out from #67840 since that one looks problematic.
@bors try @rust-timer queue
⌛ Trying commit 4e77efb41bec21ddef3223e484eaa21f531e6942 with merge bd311dcdd7b31daa87c22040bf0895a276ceffda...
☔ The latest upstream changes (presumably #67970) made this pull request unmergeable. Please resolve the merge conflicts.
☀️ Try build successful - checks-azure
@rust-timer build bd311dcdd7b31daa87c22040bf0895a276ceffda
Queued bd311dcdd7b31daa87c22040bf0895a276ceffda with parent 8597644, future comparison URL.
The regression still persists, so I guess one of the refactorings caused it rather than the main change from #67840. I will try to isolate it. It might just be that regions get added in a slightly different way, which happens to further regress the slow loop in `lexical_region_resolve`. #68001 should fix that to some extent at least.
☔ The latest upstream changes (presumably #68078) made this pull request unmergeable. Please resolve the merge conflicts.
Rebased and removed the commits which cause the regression.
☀️ Try build successful - checks-azure
Queued 51adcde with parent 2d8d559, future comparison URL.
Finished benchmarking try commit 51adcde, comparison URL.
`ty::View<'tcx, T>` acts like a `T` but internally stores a `Ty<'tcx>` pointer. Thanks to this, it is possible to retrieve the original `Ty` value without asking the interner for it. Looking up the already-created type (`T`) takes a good chunk of the time in `infer/outlives/verify.rs`, so this should be a good speedup. It may be applicable in other places as well, but those show up far lower when profiling.
But it instead provides "view" types when possible
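A generic version of the idea might look like the sketch below, assuming a hypothetical `ExtractFrom` trait that knows how to find a `T` inside a simplified `TyKind`. It uses the safe "predictable branch" variant in `Deref` rather than the unsafe elision the PR describes, and it is not the actual `ty::View` code. `as_ty` hands back the stored pointer, which is the step that replaces a second `intern_ty` call.

```rust
use std::marker::PhantomData;
use std::ops::Deref;

// Hypothetical, simplified stand-ins for rustc's types.
struct ParamTy {
    index: u32,
}

struct ProjectionTy {
    item_def_id: u32,
}

enum TyKind {
    Bool,
    Param(ParamTy),
    Projection(ProjectionTy),
}

type Ty<'tcx> = &'tcx TyKind;

// Hypothetical helper trait: how to find a `Self` inside a `TyKind`.
trait ExtractFrom {
    fn extract(kind: &TyKind) -> Option<&Self>;
}

impl ExtractFrom for ParamTy {
    fn extract(kind: &TyKind) -> Option<&Self> {
        match kind {
            TyKind::Param(p) => Some(p),
            _ => None,
        }
    }
}

impl ExtractFrom for ProjectionTy {
    fn extract(kind: &TyKind) -> Option<&Self> {
        match kind {
            TyKind::Projection(p) => Some(p),
            _ => None,
        }
    }
}

// Acts like a `T`, but only stores the interned `Ty` it came from.
struct View<'tcx, T> {
    ty: Ty<'tcx>,
    _marker: PhantomData<T>,
}

// Manual impls so `View` is `Copy` even when `T` is not.
impl<'tcx, T> Clone for View<'tcx, T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<'tcx, T> Copy for View<'tcx, T> {}

impl<'tcx, T: ExtractFrom> View<'tcx, T> {
    fn new(ty: Ty<'tcx>) -> Option<Self> {
        T::extract(ty).map(|_| View { ty, _marker: PhantomData })
    }

    // Returns the original interned pointer; no interner lookup required.
    fn as_ty(self) -> Ty<'tcx> {
        self.ty
    }
}

impl<'tcx, T: ExtractFrom> Deref for View<'tcx, T> {
    type Target = T;
    fn deref(&self) -> &T {
        // Safe variant: re-check the variant; `new` guarantees this succeeds.
        T::extract(self.ty).expect("View is only built for matching types")
    }
}

fn main() {
    let param = TyKind::Param(ParamTy { index: 1 });
    let proj = TyKind::Projection(ProjectionTy { item_def_id: 7 });

    let pv: View<ParamTy> = View::new(&param).unwrap();
    assert_eq!(pv.index, 1); // read through `Deref`
    assert!(std::ptr::eq(pv.as_ty(), &param)); // same pointer, nothing re-interned

    let jv = View::<ProjectionTy>::new(&proj).unwrap();
    assert_eq!(jv.item_def_id, 7);

    assert!(View::<ParamTy>::new(&proj).is_none());
    assert!(View::<ParamTy>::new(&TyKind::Bool).is_none());
}
```

Keeping `View` a single pointer wide is what distinguishes it from simply passing the `Ty` and the inner value side by side.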
Added some inline annotations and specialized `TypeFoldable` and `Lift`. Can I get a perf run to see if it fixes the regression?
@bors try @rust-timer queue
Awaiting bors try build completion
⌛ Trying commit 1214625 with merge 170c7fc350184210f80c8bba1590e650f02fd5c5...
☀️ Try build successful - checks-azure
Queued 170c7fc350184210f80c8bba1590e650f02fd5c5 with parent 1b117d7, future comparison URL.
Finished benchmarking try commit 170c7fc350184210f80c8bba1590e650f02fd5c5, comparison URL.
The regression is gone, but performance is the same or slightly improved on trait-heavy crates (https://github.com/Marwes/combine is the crate mentioned in the PR; it consists almost entirely of trait implementations). I can perhaps get some numbers for that. I have seen some other spots that could potentially benefit from this optimization (both the part that avoids interning and the space-reduction part), but I wanted to see that this works out first.
The numbers from the perf run look like noise. If there are some crates that are optimised by this change, we should get numbers on them and also add them to perf so we don't overlook these sorts of changes.
Ping from Triage: any updates @Marwes?
Ping from triage: @Marwes