
Introduce minimalistic ttl eviction for Recycler/PinnedVec #15139

Closed
ryoqun wants to merge 2 commits into master from ttl-recycler-pinnedvec

Conversation

ryoqun (Member) commented Feb 5, 2021

Problem

PinnedVec tends to leak indefinitely due to some past burst.

#14366 (comment)

@sakridge could this also be a legitimate leak, aside from the Vec HashMap capacity leak? I don't think we need 5GB / 3GB of heap data at any given moment to run a validator...

The recyclers just allocate to cover the transitional load and don't size down generally. Potentially we can size down with some heuristic if we can detect there is too much being held onto.

Summary of Changes

Amortized, very lightweight eviction (1 vDSO call for writes; 1 vDSO call + 2 O(1) VecDeque ops + 1 T::drop() for reads) at each read from the Recycler, and only when there has been excess capacity for the last hour.
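A minimal sketch of the described policy (a hypothetical simplification, not the PR's actual code: the constant names, the Default-based fallback allocation, and the exact API are all illustrative):

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

// Illustrative values mirroring the policy described above.
const EXPIRATION_TTL: Duration = Duration::from_secs(3600); // one hour
const MIN_INVENTORY: usize = 10; // never expire below this many entries

struct Recycler<T> {
    // Oldest entries sit at the front; recycled entries are pushed to the
    // back with a fresh timestamp, so the front is the least recently used.
    gc: VecDeque<(Instant, T)>,
}

impl<T: Default> Recycler<T> {
    fn allocate(&mut self) -> T {
        // Amortized eviction: at most one entry is dropped per allocate,
        // and only when the oversized inventory has sat idle for a full TTL.
        if self.gc.len() > MIN_INVENTORY {
            if let Some((oldest_time, _)) = self.gc.front() {
                if oldest_time.elapsed() >= EXPIRATION_TTL {
                    drop(self.gc.pop_front()); // expire the stale entry
                }
            }
        }
        // Reuse the most recently returned entry (LIFO), or create a new one.
        self.gc.pop_back().map(|(_, item)| item).unwrap_or_default()
    }

    fn recycle(&mut self, item: T) {
        self.gc.push_back((Instant::now(), item));
    }
}

fn main() {
    let mut r: Recycler<Vec<u8>> = Recycler { gc: VecDeque::new() };
    r.recycle(vec![1, 2, 3]);
    // The most recently recycled buffer is handed back first (LIFO).
    assert_eq!(r.allocate(), vec![1, 2, 3]);
    // With an empty inventory, allocate falls back to Default.
    assert!(r.allocate().is_empty());
    println!("ok");
}
```

Because the front of the VecDeque is always the least recently recycled entry, one front() peek plus an occasional pop_front() keeps the eviction check O(1) per call.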

@ryoqun ryoqun requested a review from sakridge February 5, 2021 13:29
@ryoqun ryoqun added v1.4 and removed v1.4 labels Feb 5, 2021
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, Weak};
use std::time::Instant;
use std::{collections::VecDeque, sync::atomic::AtomicBool};
ryoqun (Member Author):

hehe, rustfmt became lazy. ;)


codecov bot commented Feb 5, 2021

Codecov Report

Merging #15139 (6425e4a) into master (774416a) will decrease coverage by 0.0%.
The diff coverage is 96.1%.

@@            Coverage Diff            @@
##           master   #15139     +/-   ##
=========================================
- Coverage    79.5%    79.5%   -0.1%     
=========================================
  Files         402      402             
  Lines      102297   102342     +45     
=========================================
+ Hits        81416    81449     +33     
- Misses      20881    20893     +12     

sakridge (Member) commented Feb 5, 2021

Nice, this seems pretty good, any perf change with the vec-deque and extra checks?

.gc
.lock()
.expect("recycler lock in pb fn allocate");
if let Some((oldest_time, _old_item)) = gc.front() {
Member:

Thought about maybe checking the length of gc here. Maybe the recycler is just not used in a while and we are popping an extra one where we only have a handful of entries (1-5).

ryoqun (Member Author):

yeah, nice addition! Also, I noticed we shouldn't pop and drop the last one; we can just recycle it. ;) I chose 10 for no particularly strong reason. I just felt expiring with 5 items sounded too aggressive... just that.

Member:

yea, sounds good.

@ryoqun ryoqun force-pushed the ttl-recycler-pinnedvec branch from e44daf5 to 6425e4a Compare February 7, 2021 14:51
}

bencher.iter(move || {
recycler.recycle_for_test(recycler.allocate("me"));
ryoqun (Member Author) commented Feb 7, 2021:

> Nice, this seems pretty good, any perf change with the vec-deque and extra checks?

In summary, sadly this roughly doubled the fast path of a recycle->allocate iteration, from around 70ns to 140ns. But I think this shouldn't affect the overall cluster perf. (I'm running the gcp system-performance-test to be sure.)

In detail, simply changing to VecDeque adds around 10ns, and calling Instant::now() twice adds around 30ns * 2 (once in recycle and once in allocate).

base:
test bench_recycler ... bench:          73 ns/iter (+/- 22)

only changed Vec => VecDeque collection type:
test bench_recycler ... bench:          85 ns/iter (+/- 7)

this pr:
test bench_recycler ... bench:         144 ns/iter (+/- 19)

I think these numbers are reasonable. VecDeque is basically a Vec that only incurs some small additional offset calculation, so the overhead should be on the order of 10ns-20ns.

As for Instant::now, I think this is a decent value as well, considering it's basically a vDSO call, recalling my vague perf numbers for it. xD

Also, I think VecDeque is faster than a BTreeMap or BinaryHeap implementation would be, although I haven't tested that.

This was tested on my laptop and I know there is jitter. If you're curious, I can run a more rigorous experiment in a cleaner environment. :)
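For a quick standalone sanity check of the ~30ns-per-call figure for Instant::now() (numbers vary by machine and clocksource; this is a rough micro-measurement, not a proper benchmark harness):

```rust
use std::hint::black_box;
use std::time::Instant;

fn main() {
    // Measure the amortized cost of Instant::now() itself. On Linux this
    // resolves to clock_gettime through the vDSO, so no kernel transition
    // is involved and the cost is typically tens of nanoseconds.
    const ITERS: u128 = 1_000_000;
    let start = Instant::now();
    for _ in 0..ITERS {
        // black_box keeps the optimizer from eliding the call.
        black_box(Instant::now());
    }
    let total = start.elapsed();
    println!("Instant::now(): ~{} ns/call", total.as_nanos() / ITERS);
    assert!(total.as_nanos() > 0);
}
```

Run with `--release`, since a debug build will inflate the per-call number considerably.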

Member:

ok not bad, thanks.

@ryoqun ryoqun marked this pull request as ready for review February 7, 2021 15:36
@ryoqun ryoqun requested a review from sakridge February 7, 2021 15:37
ryoqun commented Feb 7, 2021

As for actually observing that the leaks are fixed: I'm still running a test validator. This takes a while.

let (_, mut expired) = gc.pop_front().unwrap();
// unref-ing the recycler here is crucial to prevent
// the expired item from being recycled again via Drop,
// leading to deadlock!
ryoqun (Member Author) commented Feb 7, 2021:

my test validator was bitten by this. T_T

rerunning it....

ryoqun commented Feb 8, 2021

well, the test validator survived running against mainnet-beta for a day without deadlocks, so I think I fixed the bug.

Now I'm running it on testnet under heaptrack as a last check, to see whether the leaks are really solved.

ryoqun commented Feb 8, 2021

well, it turned out this fix doesn't fix the leak... the gossip-receiver code doesn't recycle PinnedVecs at all, actually. I'm trying to fix that as well.

carllin (Contributor) commented Feb 14, 2021

@ryoqun the reason this may not be recycling is because https://github.com/solana-labs/solana/pull/15139/files#diff-798271cb43dca47253c06d8ab6888d8bf4fb385fc8e74cc2cd70ac1b3b664ebfR93 checks the oldest element, but if I have N elements, then if:

  1. I only need 1 element at a time
  2. N-1 elements are always sitting in the queue

As long as I have N total allocations before the max eviction time passes, then every packet will be refreshed by the recycle logic https://github.com/solana-labs/solana/pull/15139/files#diff-798271cb43dca47253c06d8ab6888d8bf4fb385fc8e74cc2cd70ac1b3b664ebfR136 and nothing will be recycled

ryoqun commented Feb 14, 2021

> @ryoqun the reason this may not be recycling is because https://github.com/solana-labs/solana/pull/15139/files#diff-798271cb43dca47253c06d8ab6888d8bf4fb385fc8e74cc2cd70ac1b3b664ebfR93 checks the oldest element, but if I have N elements, then if:
>
> 1. I only need `1` element at a time
> 2. `N-1` elements are always sitting in the queue
>
> As long as I have N total allocations before the max eviction time passes, then every packet will be refreshed by the recycle logic https://github.com/solana-labs/solana/pull/15139/files#diff-798271cb43dca47253c06d8ab6888d8bf4fb385fc8e74cc2cd70ac1b3b664ebfR136 and nothing will be recycled

@carllin Your understanding is mostly correct. Note that such N total allocations must happen at peak, all at once, to defeat the intended eviction of unused inventory. That's because the recycler is basically LIFO, so the recycler's inventory must be lent out elsewhere all at the same time for the deepest VecDeque items to be refreshed. Originally, I thought that was enough to prevent mem leaks due to spikes, but it looks like my guess was off.
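The refresh-defeats-eviction scenario above can be made concrete with a deterministic toy model (a made-up logical tick clock instead of Instant; all names and numbers are illustrative, not the PR's code): if bursts periodically drain and refill the whole pool, every entry's timestamp is refreshed on recycle, so the front entry never ages past the TTL and nothing is ever evicted.

```rust
use std::collections::VecDeque;

const TTL_TICKS: u64 = 100; // illustrative TTL in logical ticks

struct ToyRecycler {
    gc: VecDeque<(u64, u32)>, // (last-recycled tick, item id)
}

impl ToyRecycler {
    fn allocate(&mut self, now: u64) -> u32 {
        // Evict the front entry only if it has sat unused for a full TTL.
        if let Some(&(t, _)) = self.gc.front() {
            if now - t >= TTL_TICKS {
                self.gc.pop_front();
            }
        }
        // LIFO reuse; 0 stands in for a freshly created item.
        self.gc.pop_back().map(|(_, id)| id).unwrap_or(0)
    }

    fn recycle(&mut self, id: u32, now: u64) {
        self.gc.push_back((now, id)); // the timestamp is refreshed here
    }
}

fn main() {
    let mut r = ToyRecycler { gc: VecDeque::new() };
    // Seed N = 5 entries at tick 0.
    for id in 1..=5 {
        r.recycle(id, 0);
    }
    // A burst that drains and refills the whole pool every 50 ticks
    // refreshes every timestamp, so no entry ever reaches the 100-tick TTL.
    for step in 1..=10 {
        let now = step * 50;
        let batch: Vec<u32> = (0..5).map(|_| r.allocate(now)).collect();
        for id in batch {
            r.recycle(id, now);
        }
    }
    assert_eq!(r.gc.len(), 5); // nothing was evicted despite the TTL
    println!("inventory after bursts: {}", r.gc.len());
}
```

If the bursts instead touched only one item at a time (the LIFO back), the untouched front entries would keep their old timestamps and eventually expire, which is the distinction ryoqun draws above.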

My current theory is that the solana-receiver threads create an unbounded number of Packets out of the Recycler and put them into the mpsc::channel, and solana-gossip couldn't keep up with it?

@ryoqun
Copy link
Member Author

ryoqun commented Feb 14, 2021

Also, this bug looks like classic bufferbloat. ;)

use test::Bencher;

#[bench]
fn bench_recycler(bencher: &mut Bencher) {
ryoqun (Member Author):

this

// only expire when holding more than MAX_INVENTORY_COUNT_WITHOUT_EXPIRATION (10) items.
if gc.len() > MAX_INVENTORY_COUNT_WITHOUT_EXPIRATION {
if let Some((oldest_time, _old_item)) = gc.front() {
if oldest_time.elapsed().as_secs() >= EXPIRATION_TTL_SECONDS {
ryoqun (Member Author):

like this

@ryoqun
Copy link
Member Author

ryoqun commented Feb 24, 2021

closing in favor of #15320. :)
