Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify/Optimize FileEncoder #115542

Merged
merged 2 commits into from
Sep 20, 2023
Merged

Conversation

saethlin
Copy link
Member

@saethlin saethlin commented Sep 4, 2023

FileEncoder is basically a BufWriter except that it exposes access to the not-written-to-yet region of the buffer so that some users can write directly to the buffer. This strategy is awesome because it lets us avoid calling memcpy for small copies, but the previous strategy was based on the writer accessing a &mut [MaybeUninit<u8>; N] and returning a &[u8] which is an API which currently mandates the use of unsafe code, making that interface in general not that appealing.

So this PR cleans up the FileEncoder implementation and builds on that general idea of direct buffer access in order to prevent memcpy calls in a few key places when encoding the dep graph and rmeta tables. The interface used here is now 100% safe, but with the caveat that internally we need to avoid trusting the number of bytes that the provided function claims to have written.

The original primary objective of this PR was to clean up the FileEncoder implementation so that the fix for the following issues would be easy to implement. The fix for these issues is to correctly update self.buffered even when writes fail, which I think it's easy to verify manually is now done, because all the FileEncoder methods are small.

Fixes #115298
Fixes #114671
Fixes #114045
Fixes #108100
Fixes #106787

@saethlin saethlin added the S-experimental Status: Ongoing experiment that does not require reviewing and won't be merged in its current state. label Sep 4, 2023
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 4, 2023
@saethlin saethlin removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 4, 2023
@saethlin
Copy link
Member Author

saethlin commented Sep 4, 2023

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 4, 2023
@bors
Copy link
Contributor

bors commented Sep 4, 2023

⌛ Trying commit a3671fd3993f906b9a9879b1465a918b9d80cac3 with merge 9bd418951e5c1d9a80f4d5ff524fde43a43149bf...

@bors
Copy link
Contributor

bors commented Sep 4, 2023

☀️ Try build successful - checks-actions
Build commit: 9bd418951e5c1d9a80f4d5ff524fde43a43149bf (9bd418951e5c1d9a80f4d5ff524fde43a43149bf)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (9bd418951e5c1d9a80f4d5ff524fde43a43149bf): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.1% [0.3%, 4.1%] 138
Regressions ❌
(secondary)
1.8% [0.5%, 4.0%] 74
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 2.1% [0.3%, 4.1%] 138

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.5% [0.5%, 0.5%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.6% [-2.6%, -2.6%] 1
All ❌✅ (primary) 0.5% [0.5%, 0.5%] 1

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.8% [0.7%, 10.2%] 118
Regressions ❌
(secondary)
4.3% [1.5%, 8.0%] 47
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 3.8% [0.7%, 10.2%] 118

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 627.22s -> 627.249s (0.00%)
Artifact size: 316.03 MiB -> 316.50 MiB (0.15%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Sep 4, 2023
@rust-log-analyzer

This comment has been minimized.

@saethlin
Copy link
Member Author

saethlin commented Sep 5, 2023

That's pretty slow but the regressions are almost exclusively confined to I have to wonder if one of my other PRs will obviate this. Ah well, let's see what happens if we optimize the leb128 path a bunch...
@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Sep 5, 2023

⌛ Trying commit f16b987d8162c60ac2f91c185451b1182cff1acf with merge a0f5ca19b76ebc68ddad77a862195f68e03eec62...

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Sep 5, 2023

💔 Test failed - checks-actions

@bors bors added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Sep 5, 2023
@saethlin
Copy link
Member Author

saethlin commented Sep 5, 2023

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 5, 2023
@bors
Copy link
Contributor

bors commented Sep 5, 2023

⌛ Trying commit f16b987d8162c60ac2f91c185451b1182cff1acf with merge 230afd4b68f6d1786901abded18b5dc9ea431540...

@saethlin saethlin changed the title Use BufWriter to implement FileEncoder Simplify FileEncoder? Sep 5, 2023
@bors
Copy link
Contributor

bors commented Sep 5, 2023

☀️ Try build successful - checks-actions
Build commit: 230afd4b68f6d1786901abded18b5dc9ea431540 (230afd4b68f6d1786901abded18b5dc9ea431540)

@rust-timer

This comment has been minimized.

@saethlin saethlin removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Sep 20, 2023
@saethlin
Copy link
Member Author

@bors r=WaffleLapkin rollup=never

@bors
Copy link
Contributor

bors commented Sep 20, 2023

📌 Commit 6cee6b0 has been approved by WaffleLapkin

It is now in the queue for this repository.

@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Sep 20, 2023
@bors
Copy link
Contributor

bors commented Sep 20, 2023

⌛ Testing commit 6cee6b0 with merge 3223b0b...

@bors
Copy link
Contributor

bors commented Sep 20, 2023

☀️ Test successful - checks-actions
Approved by: WaffleLapkin
Pushing 3223b0b to master...

1 similar comment
@bors
Copy link
Contributor

bors commented Sep 20, 2023

☀️ Test successful - checks-actions
Approved by: WaffleLapkin
Pushing 3223b0b to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 20, 2023
@bors bors merged commit 3223b0b into rust-lang:master Sep 20, 2023
11 checks passed
@rustbot rustbot added this to the 1.74.0 milestone Sep 20, 2023
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (3223b0b): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.0% [0.7%, 1.3%] 2
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.6% [-1.2%, -0.2%] 95
Improvements ✅
(secondary)
-0.6% [-1.2%, -0.3%] 28
All ❌✅ (primary) -0.6% [-1.2%, 1.3%] 97

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.4% [1.1%, 3.7%] 2
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.0% [1.0%, 1.0%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 1.0% [1.0%, 1.0%] 1

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.1% [0.0%, 0.1%] 6
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.2% [-0.5%, -0.1%] 10
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.1% [-0.5%, 0.1%] 16

Bootstrap: 632.887s -> 633.664s (0.12%)
Artifact size: 317.80 MiB -> 318.05 MiB (0.08%)

@rustbot rustbot added the perf-regression Performance regression. label Sep 21, 2023
@rylev
Copy link
Member

rylev commented Sep 26, 2023

@saethlin @WaffleLapkin looks like a small regression snuck back in. The regressions seem real, but perhaps it's not worth it given the much larger amount of improvements. Thoughts?

@WaffleLapkin
Copy link
Member

@rylev the two regressions are image-0.24.1 | opt | full | 1.27% and image-0.24.1 | opt | incr-full | 0.74%. If I'm reading it right, both of them spiked and then returned to normal, so this looks like a fluke.

@rustbot label: +perf-regression-triaged

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Sep 26, 2023
@lqd
Copy link
Member

lqd commented Sep 26, 2023

You may be looking at the relative graph instead of the absolute one?

image

@WaffleLapkin
Copy link
Member

WaffleLapkin commented Sep 26, 2023

@lqd yes I am 🤦🏻
@rustbot label: -perf-regression-triaged
(I still think it's justifiable, but since it's not a fluke I'm not so sure)

@rustbot rustbot removed the perf-regression-triaged The performance regression has been triaged. label Sep 26, 2023
@lqd
Copy link
Member

lqd commented Sep 26, 2023

(yeah, agreed.)

bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 27, 2023
Clamp instead of asserting in FileEncoder::write_with

r? `@WaffleLapkin`

If this isn't the regression mentioned in rust-lang#115542 (comment) I'd have to actually look into it.
@saethlin
Copy link
Member Author

If #116188 doesn't recoup the regression, I'll mark this as triaged on the basis that the regression is very small and in a change that's overwhelmingly positive. As you suggest, it's not worth our time to look into a tiny regression in an overwhelmingly green report unless there's an easy fix.

@saethlin
Copy link
Member Author

The cachegrind diff for the regressed benchmark indicates that the regression is somehow in LLVM. I'm not really sure how that's possible, but between this cachegrind diff and the linked PR I think I've done the amount of investigation that this merits and also come up with nothing that points towards fixing the lone regression.

Ir    
--------------------------------------------------------------------------------
2,016,129,215  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir           file:function
--------------------------------------------------------------------------------
183,084,207  ???:llvm::ConstraintSystem::eliminateUsingFM()
112,154,338  ???:(anonymous namespace)::LazyValueInfoImpl::solve() [clone 
102,793,767  ???:llvm::BasicAAResult::alias(llvm::MemoryLocation const&, llvm::MemoryLocation const&, llvm::AAQueryInfo&, llvm::Instruction const*)
 90,761,916  ???:llvm::AAResults::getModRefInfo(llvm::Instruction const*, std::optional<llvm::MemoryLocation> const&, llvm::AAQueryInfo&)
 64,597,572  ???:llvm::ObjectSizeOffsetVisitor::compute(llvm::Value*)
-62,036,937  ???:llvm::APInt::operator*(llvm::APInt const&) const
 61,801,373  <all-jemalloc-files>:<all-jemalloc-functions>
 60,275,661  ???:llvm::ConstantRange::multiply(llvm::ConstantRange const&) const
 54,151,328  ???:rustc_data_structures::graph::dominators::dominators::<rustc_middle::mir::basic_blocks::BasicBlocks>
-53,446,123  ???:<core::cell::once::OnceCell<rustc_data_structures::graph::dominators::Dominators<rustc_middle::mir::BasicBlock>>>::get_or_try_init::<<core::cell::once::OnceCell<rustc_data_structures::graph::dominators::Dominators<rustc_middle::mir::BasicBlock>>>::get_or_init<<rustc_middle::mir::basic_blocks::BasicBlocks>::dominators::{closure#0}>::{closure#0}, !>
 50,772,992  ???:llvm::DataLayout::getTypeSizeInBits(llvm::Type*) const
 42,317,006  ???:llvm::IDFCalculatorBase<llvm::BasicBlock, false>::calculate(llvm::SmallVectorImpl<llvm::BasicBlock*>&)
 41,230,820  ???:computeKnownBitsFromOperator(llvm::Operator const*, llvm::APInt const&, llvm::KnownBits&, unsigned int, llvm::SimplifyQuery const&) [clone 
 37,819,326  ???:combineInstructionsOverFunction(llvm::Function&, llvm::InstructionWorklist&, llvm::AAResults*, llvm::AssumptionCache&, llvm::TargetLibraryInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::OptimizationRemarkEmitter&, llvm::BlockFrequencyInfo*, llvm::ProfileSummaryInfo*, unsigned int, llvm::LoopInfo*) [clone  
 35,505,970  ???:llvm::BasicAAResult::aliasGEP(llvm::GEPOperator const*, llvm::LocationSize, llvm::Value const*, llvm::LocationSize, llvm::Value const*, llvm::Value const*, llvm::AAQueryInfo&)
-35,083,171  ???:getDefaultInlineAdvice(llvm::CallBase&, llvm::AnalysisManager<llvm::Function>&, llvm::InlineParams const&) [clone 
 34,493,533  ???:<rustc_span::span_encoding::Span as rustc_serialize::serialize::Decodable<rustc_metadata::rmeta::decoder::DecodeContext>>::decode
 32,163,222  ???:(anonymous namespace)::PromoteMem2Reg::run()
 32,137,790  ???:llvm::InstCombinerImpl::run()
 31,327,292  ???:llvm::Value::stripAndAccumulateConstantOffsets(llvm::DataLayout const&, llvm::APInt&, bool, bool, llvm::function_ref<bool (llvm::Value&, llvm::APInt&)>) const
 30,795,545  /usr/src/debug/glibc/glibc/string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:__memset_avx2_unaligned_erms
 30,224,758  ???:<rustc_middle::ty::Ty as rustc_type_ir::fold::TypeSuperFoldable<rustc_middle::ty::context::TyCtxt>>::try_super_fold_with::<rustc_middle::ty::generic_args::ArgFolder>
 30,057,337  ???:llvm::shouldInline(llvm::CallBase&, llvm::function_ref<llvm::InlineCost (llvm::CallBase&)>, llvm::OptimizationRemarkEmitter&, bool)
-29,731,698  ???:<rustc_middle::ty::generic_args::GenericArg as rustc_type_ir::fold::TypeFoldable<rustc_middle::ty::context::TyCtxt>>::try_fold_with::<rustc_middle::ty::generic_args::ArgFolder>

@saethlin saethlin added perf-regression-triaged The performance regression has been triaged. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Sep 29, 2023
@saethlin saethlin deleted the fileencoder-is-bufwriter branch September 29, 2023 23:06
@lqd lqd mentioned this pull request Oct 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
8 participants