PEX - sharing peers with others #261
base: main
Conversation
Actually I see the same problems also on main. Script used to run the test repeatedly:

```bash
#!/usr/bin/env bash
# Repeatedly run the e2e download test and collect failures.
# Stop early on SIGINT (Ctrl-C).
STOP=""
trap "STOP=1" INT

NUM_RUNS=${1:-100}
TOTAL=$NUM_RUNS
FAILURES=0
export NO_COLOR=1
rm -f /tmp/librqbit_e2e_failures.log

while [ "$NUM_RUNS" -gt 0 ]; do
    echo "Running test $((TOTAL - NUM_RUNS + 1)) of $TOTAL"
    NUM_RUNS=$((NUM_RUNS - 1))
    cargo test --package librqbit tests::e2e::test_e2e_download > /tmp/librqbit_e2e_run.log
    res=$?
    if [ $res -ne 0 ]; then
        FAILURES=$((FAILURES + 1))
        echo "FAILURE with code $res" >> /tmp/librqbit_e2e_failures.log
        tail -50 /tmp/librqbit_e2e_run.log >> /tmp/librqbit_e2e_failures.log
        echo "----------------------------------------------" >> /tmp/librqbit_e2e_failures.log
    fi
    if [ "$STOP" = "1" ]; then
        break
    fi
done

rm -f /tmp/librqbit_e2e_run.log
echo "Total runs : $TOTAL, failures : $FAILURES"
echo "Failures log: /tmp/librqbit_e2e_failures.log"
```

On the main branch I got 28 failures out of 100 - mainly the above-mentioned 0 tasks, but a couple of dropcheck failures as well.
Maybe some tokio upgrade broke it, I'll take a look someday. Just got back from a 10-day break, so catching up.
```rust
let mut sent_peers_live: HashSet<SocketAddr> = HashSet::new();
const MAX_SENT_PEERS: usize = 50; // As per BEP 11 we should not send more than 50 peers at once (here it also applies to the first message; should be OK, as we rarely really have more anyway)
const PEX_MESSAGE_INTERVAL: Duration = Duration::from_secs(60); // As per BEP 11 the recommended interval is at least 60 seconds
let mut delay = Duration::from_secs(10); // Wait 10 seconds before sending the first message, to ensure the peer will stay with us
```
tokio::time::interval would be cleaner as it serves the same purpose as sleeping in a loop but is a bit nicer
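For reference, a minimal sketch of the `tokio::time::interval` pattern suggested here (the function name and loop body are hypothetical, not the PR's code):

```rust
use std::time::Duration;

// Sketch only: send a PEX message roughly every 60 seconds using an interval
// instead of sleeping in a loop. Note that the first tick() completes
// immediately, which is relevant to the reply below.
async fn pex_sender_sketch() {
    let mut interval = tokio::time::interval(Duration::from_secs(60));
    loop {
        interval.tick().await;
        // ... build and send the PEX message here ...
    }
}
```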
Main reasons for using `sleep`:

- initial wait - I do not want to start immediately after the peer connection, as the peer can quickly disconnect; I only want to send PEX to 'stable' peers. `interval` fires immediately.
- changing wait period - it is 10 seconds now, then 60 seconds, but it could be extended - I was thinking about extending the interval up to 600 seconds or something like that.

From this perspective `sleep` seems easier to use (a sketch of the pattern follows below).
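A minimal sketch of the sleep-based variant described above (initial 10 s wait, then a delay that grows up to 600 s; the function name, loop body, and back-off factor are illustrative assumptions):

```rust
use std::time::Duration;

// Sketch only: short initial delay so PEX is sent to 'stable' peers only,
// then a growing delay between messages, clamped to the 60-600 s range.
async fn pex_sender_with_sleep_sketch() {
    let mut delay = Duration::from_secs(10); // initial wait after the peer connects
    loop {
        tokio::time::sleep(delay).await;
        // ... build and send the PEX message here ...
        delay = (delay * 2).clamp(Duration::from_secs(60), Duration::from_secs(600));
    }
}
```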
```rust
            None
        }
    })
    .take(50)
```
> Except for the initial PEX message the combined amount of added v4/v6 contacts should not exceed 50 entries. The same applies to dropped entries.

Afaiu this implies that on the next iteration you'll get the next 50 peers, but in your code you're only looking at the first 50 live ones, and you'll never look at the ones past the first 50.
There is also the condition `&& !sent_peers_live.contains(addr)` on line 834, which filters out already sent addresses - so what I get are the next 50 that have not been sent to the peer yet. I looked at it in a swarm on a busy torrent (Ubuntu) and it seemed to work.
Anyhow this is also related to the other comment, so it may change.
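For clarity, a simplified, self-contained sketch of the selection being described (the names `select_new_peers` and `is_live` are made up for illustration, not the PR's actual code):

```rust
use std::collections::HashSet;
use std::net::SocketAddr;

// Illustrative only: pick up to 50 live addresses that have not been sent to
// this peer yet; on the next iteration the previously sent ones are filtered
// out, so the "next 50" get picked up.
fn select_new_peers(
    all_addrs: &[SocketAddr],
    is_live: impl Fn(&SocketAddr) -> bool,
    sent_peers_live: &HashSet<SocketAddr>,
) -> Vec<SocketAddr> {
    all_addrs
        .iter()
        .copied()
        .filter(|addr| is_live(addr) && !sent_peers_live.contains(addr))
        .take(50)
        .collect()
}
```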
```rust
.filter(|addr| {
    self.peers
        .states
        .get(addr)
```
You're locking the state twice, and it may race - e.g. an address will be live in the first pass, where you are computing `addrs_live_to_sent`, and not live later, or the other way around.
It's hard to reason about what will happen in this case, especially when below you're trying to provide strong guarantees like "it's assured by mutual exclusion of the two above sets".
So overall, how about refactoring this for simplicity (see the sketch below):

- store the live peers' view in `sent_peers_live`, as you are doing now.
- for each loop iteration:
  - collect all live addrs into a `HashSet` (not bounded by 50, but bounded by the semaphore anyway - 128 if I remember correctly). Let's name it `all_live`.
  - `connected = (all - sent).take(50)`
  - `dropped = (sent - all).take(50)`
  - `sent = sent + connected - dropped`
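A minimal sketch of that proposed bookkeeping (the function name `pex_diff` and its signature are made up for illustration; `all_live` would be collected in a single pass over the peer states, and `sent` is the persistent `sent_peers_live` set):

```rust
use std::collections::HashSet;
use std::net::SocketAddr;

// Illustrative sketch, not the PR's code: compute the connected/dropped
// deltas for one PEX message and update the persistent `sent` set.
fn pex_diff(
    all_live: &HashSet<SocketAddr>,
    sent: &mut HashSet<SocketAddr>,
    max: usize, // 50 per BEP 11
) -> (Vec<SocketAddr>, Vec<SocketAddr>) {
    // connected = (all - sent).take(max)
    let connected: Vec<SocketAddr> = all_live.difference(sent).copied().take(max).collect();
    // dropped = (sent - all).take(max)
    let dropped: Vec<SocketAddr> = sent.difference(all_live).copied().take(max).collect();
    // sent = sent + connected - dropped
    sent.extend(connected.iter().copied());
    for addr in &dropped {
        sent.remove(addr);
    }
    (connected, dropped)
}
```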
I see - but I think it's not the case here, because the liveness check actually happens only once, depending on the content of `sent_peers_live`, which reflects the already sent peer addresses:

- if an address is in `sent_peers_live`, then it's effectively checked for liveness only on line 853, `.map(|p| !p.value().state.is_live())` (for closed peers), because line 834, `&& !sent_peers_live.contains(addr)`, ultimately excludes it from the `addrs_live_to_sent` set.
- if an address is not in `sent_peers_live`, it's checked only for `addrs_live_to_sent` on line 832, `if peer.state.is_live()`, and the second check for closed peers does not happen at all, because that one iterates over `sent_peers_live`.

So in any case, in each loop run only one liveness check happens that is relevant for the logic.
My main point was that the logic is hard to read, and I proposed a simpler way to write it. All the state that you need to keep in your mind to understand why the liveness check is happening only once might serve as proof of that :)
I do not think the logic is difficult - it's all about `sent_peers_live`, which we need anyhow: if an address is not in it, we check whether we can add it as a new address (it must be live and not already in the set); if an address is in it, we check whether we should remove it because it's no longer live. That's it.
The proposed change will work exactly the same, I think. Only we would always need to collect all live peers, which is not really needed - we just need those that are not in `sent_peers_live`.
Checked on latest main and no problem, only in one run the test complained about:
For PEX only the receiving part was implemented; this is an attempt to also send the PEX extended message to connected peers.
It's WIP now; there is a problem with the E2E test, which is unstable and often ends with:

which does not make sense to me - how could the number of live tasks in the runtime be 0?
But there is also another interesting behavior of the e2e test: if I change the initial delay on the PEX send to 0 (here https://github.com/izderadicka/rqbit/blob/82914bac27226b1d10157832ee66aeb7af67ca18/crates/librqbit/src/torrent_state/live/mod.rs#L807), then:

`not_needed`, but should not initiate a connection try - at least this is how I understand the code.