PEX - sharing peers with others #261

Draft - wants to merge 6 commits into main
Conversation

izderadicka (Contributor)

So far only the receiving part of PEX was implemented; this is an attempt to also send the PEX extended message to connected peers.

It's still WIP; there is a problem with the E2E test, which is unstable and often ends with:

---- tests::e2e::test_e2e_download stdout ----
[crates/librqbit/src/tests/e2e.rs:65:9] tempdir.path() = "/tmp/rqbit_e2eYmBFKD"
thread 'tests::e2e::test_e2e_download' panicked at crates/librqbit/src/tests/test_util.rs:213:6:
called `Result::unwrap()` on an `Err` value: wait_until timeout: last result = Some(metrics.num_alive_tasks() = 0, expected 1)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

which does not make sense to me - how could the number of live tasks in the runtime be 0?

There is also another interesting behavior of the e2e test: if I change the initial delay of the PEX send to 0 (here https://github.com/izderadicka/rqbit/blob/82914bac27226b1d10157832ee66aeb7af67ca18/crates/librqbit/src/torrent_state/live/mod.rs#L807), then:

  1. Sometimes some server tasks do not end (5 tasks per server), as if the server session was not dropped. It's kind of arbitrary: the message is the same as above, but the number of tasks is 5, 10, or n x 5 (the maximum I have seen was 80).
  2. Servers send PEX messages between themselves, which is strange: their torrent is fully downloaded, so they should not open new connections, and peers received from the client should just be stored as not_needed without initiating a connection attempt - at least this is how I understand the code.

@izderadicka (Contributor Author)

Actually I see the same problems also on the main branch - can you try this on main?

#!/usr/bin/env bash
# Repeatedly run the e2e test and collect failure logs.
# Pass the number of runs as the first argument (default 100).

# stop gracefully on SIGINT
STOP=""
trap "STOP=1" INT

NUM_RUNS=${1:-100}
TOTAL=$NUM_RUNS
FAILURES=0
export NO_COLOR=1
rm -f /tmp/librqbit_e2e_failures.log
while [ "$NUM_RUNS" -gt 0 ]; do
    echo "Running test $((TOTAL - NUM_RUNS + 1)) of $TOTAL"
    NUM_RUNS=$((NUM_RUNS - 1))
    cargo test --package librqbit tests::e2e::test_e2e_download > /tmp/librqbit_e2e_run.log
    res=$?
    if [ $res -ne 0 ]; then
        FAILURES=$((FAILURES + 1))
        echo "FAILURE with code $res" >> /tmp/librqbit_e2e_failures.log
        tail -50 /tmp/librqbit_e2e_run.log >> /tmp/librqbit_e2e_failures.log
        echo "----------------------------------------------" >> /tmp/librqbit_e2e_failures.log
    fi
    if [ "$STOP" = "1" ]; then
        break
    fi
done
rm -f /tmp/librqbit_e2e_run.log
echo "Total runs: $TOTAL, failures: $FAILURES"
echo "Failures log: /tmp/librqbit_e2e_failures.log"

On the main branch I got 28 failures out of 100 - mainly the above-mentioned 0 tasks, but also a couple of dropcheck failures.

@ikatson (Owner)

ikatson commented Oct 27, 2024

Maybe some tokio upgrade broke it, I'll take a look someday. Just got back from a 10 day break, so catching up.

let mut sent_peers_live: HashSet<SocketAddr> = HashSet::new();
const MAX_SENT_PEERS: usize = 50; // As per BEP 11 we should not send more than 50 peers at once (here it also applies to the first message; should be OK as we rarely have more anyway)
const PEX_MESSAGE_INTERVAL: Duration = Duration::from_secs(60); // As per BEP 11 the recommended interval is at least 60 seconds
let mut delay = Duration::from_secs(10); // Wait 10 seconds before sending the first message to ensure the peer will stay with us
ikatson (Owner)

tokio::time::interval would be cleaner as it serves the same purpose as sleeping in a loop but is a bit nicer
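
For reference, a minimal sketch of what an interval-based loop might look like (illustrative only; the function name is made up, only the tokio::time::interval usage is the point):

```rust
use std::time::Duration;

// Hypothetical loop shape for sending PEX on a fixed cadence.
async fn pex_send_loop() {
    let mut interval = tokio::time::interval(Duration::from_secs(60));
    loop {
        // Note: the first tick() completes immediately.
        interval.tick().await;
        // ... build and send the PEX extended message here ...
    }
}
```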

izderadicka (Contributor Author)

Main reasons for using sleep:

  • initial wait - I do not want to start immediately after the peer connection, as the peer can quickly disconnect; I only want to send PEX to 'stable' peers. interval fires its first tick immediately.
  • changing wait period - now it's 10 seconds, then 60. But it can be extended - I was thinking about growing the interval, up to 600 seconds or so.

From this perspective sleep seems easier to use (see the sketch below).
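
A minimal sketch of the sleep-based shape described above (the function name is illustrative; PEX_MESSAGE_INTERVAL matches the constant in the diff):

```rust
use std::time::Duration;

const PEX_MESSAGE_INTERVAL: Duration = Duration::from_secs(60);

// Hypothetical loop shape; the changing delay is the point here.
async fn pex_send_loop() {
    // Initial wait so that PEX only goes to peers that stayed connected.
    let mut delay = Duration::from_secs(10);
    loop {
        tokio::time::sleep(delay).await;
        // ... build and send the PEX extended message here ...
        // Subsequent messages follow the BEP 11 minimum of 60 seconds; the delay
        // could later be grown further (e.g. towards 600 s) without changing the loop.
        delay = PEX_MESSAGE_INTERVAL;
    }
}
```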

crates/librqbit/src/torrent_state/live/mod.rs - outdated review thread (resolved)
None
}
})
.take(50)
ikatson (Owner)

Except for the initial PEX message the combined amount of added v4/v6 contacts should not exceed 50 entries. The same applies to dropped entries.

Afaiu this implies that on the next iteration you'll get the next 50 peers, but in your code you're only looking at the first 50 live ones, and you'll never look at those past the first 50.

izderadicka (Contributor Author)

There is also the condition && !sent_peers_live.contains(addr) on line 834, which filters out already-sent addresses - so what I get are the next 50 that have not been sent to the peer. I looked at it in a swarm on a busy torrent (Ubuntu) and it seemed to work.
Anyhow, this is also related to the other comment, so it may change.

.filter(|addr| {
self.peers
.states
.get(addr)
ikatson (Owner)

you're locking the state twice, and it may race - e.g. a peer will be live in the first pass where you are computing "addrs_live_to_sent" and then not live later, or the other way around.

It's hard to reason about what will happen in this case, esp. when below you're trying to provide strong guarantees like "it's assured by mutual exclusion of two above sets".

So overall, how about refactoring this for simplicity (sketched below):

  1. keep the peer's view of our live set in "sent_peers_live" as you are doing now.
  2. for each loop iteration:
    1. collect all live addrs into a HashSet (not bounded by 50, but bounded by the semaphore anyway, 128 if I remember correctly). Let's name it "all_live"
    2. connected = (all - sent).take(50)
    3. dropped = (sent - all).take(50)
    4. sent = sent + connected - dropped
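
A minimal sketch of that proposal, assuming the live-address snapshot is passed in as a closure standing in for a single-lock pass over the peer states (all names besides sent/connected/dropped are made up):

```rust
use std::{collections::HashSet, net::SocketAddr, time::Duration};

const MAX_SENT_PEERS: usize = 50;
const PEX_MESSAGE_INTERVAL: Duration = Duration::from_secs(60);

// `snapshot_live` returns all currently live peer addresses in one pass.
async fn pex_loop<F>(mut snapshot_live: F)
where
    F: FnMut() -> HashSet<SocketAddr>,
{
    // The peer's view of our live-peer set, kept across iterations.
    let mut sent: HashSet<SocketAddr> = HashSet::new();
    loop {
        tokio::time::sleep(PEX_MESSAGE_INTERVAL).await;

        // 1. Collect all live addrs in one pass.
        let all_live = snapshot_live();

        // 2. connected = (all - sent).take(50)
        let connected: HashSet<SocketAddr> = all_live
            .difference(&sent)
            .copied()
            .take(MAX_SENT_PEERS)
            .collect();

        // 3. dropped = (sent - all).take(50)
        let dropped: HashSet<SocketAddr> = sent
            .difference(&all_live)
            .copied()
            .take(MAX_SENT_PEERS)
            .collect();

        // 4. sent = sent + connected - dropped
        sent.extend(connected.iter().copied());
        sent.retain(|addr| !dropped.contains(addr));

        // ... build a PEX message from `connected` and `dropped` and send it ...
    }
}
```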

izderadicka (Contributor Author)

I see - but I think that's not the case here, because the liveness check actually happens only once, depending on the content of sent_peers_live, which reflects already-sent peer addresses:

  • if an address is in sent_peers_live, then it's effectively checked for liveness only on line 853 .map(|p| !p.value().state.is_live()) (for closed peers), because line 834 && !sent_peers_live.contains(addr) ultimately excludes it from the addrs_live_to_sent set.
  • if an address is not in sent_peers_live, it's checked only for addrs_live_to_sent on line 832 if peer.state.is_live(), and the second check for closed peers does not happen at all, because that one iterates over sent_peers_live.

So in any case only one liveness check per loop run happens that is relevant for the logic.

ikatson (Owner)

My main point was that the logic is hard to read, and I proposed a simpler way to write it. All the state that you need to keep in your mind to understand why the liveness check is happening only once might serve as proof of that :)

izderadicka (Contributor Author)

I do not think the logic is difficult - it's all about sent_peers_live, which we need anyhow: if an address is not in it, we check whether we can add it as a new address (live and not yet sent); if an address is in it, we check whether we should remove it because it's no longer live. That's it.

The proposed change will work exactly the same, I think. Only we would always need to collect all live peers, which is not really "needed" - we just need those that are not in sent_peers_live.
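
For comparison, a sketch of the current incremental logic as I read it from the discussion (the map of addresses to liveness flags stands in for the real peer-state map; all names are illustrative):

```rust
use std::{
    collections::{HashMap, HashSet},
    net::SocketAddr,
};

// Returns (addresses to announce as added, addresses to announce as dropped).
fn split_pex_updates(
    peers: &HashMap<SocketAddr, bool>, // addr -> is_live
    sent_peers_live: &HashSet<SocketAddr>,
) -> (Vec<SocketAddr>, Vec<SocketAddr>) {
    // Live addresses that have not been announced to this peer yet.
    let added: Vec<SocketAddr> = peers
        .iter()
        .filter(|(addr, is_live)| **is_live && !sent_peers_live.contains(*addr))
        .map(|(addr, _)| *addr)
        .take(50)
        .collect();

    // Previously announced addresses that are no longer live.
    let dropped: Vec<SocketAddr> = sent_peers_live
        .iter()
        .filter(|addr| !peers.get(*addr).copied().unwrap_or(false))
        .copied()
        .take(50)
        .collect();

    (added, dropped)
}
```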

crates/librqbit/src/torrent_state/live/mod.rs - outdated review thread (resolved)
@ikatson (Owner)

ikatson commented Nov 6, 2024

As for flaky tests, it seems stable now on my box (without extensive testing). The fixes are here:
e371522
87e09a6

Can you check if it's still bad for u?

@izderadicka (Contributor Author)

> As for flaky tests, it seems stable now on my box (without extensive testing). The fixes are here: e371522 87e09a6
>
> Can you check if it's still bad for u?

Checked on latest main and no problem; only in one run the test complained about Blocking waiting for file lock on package cache 3 times, but the test succeeded then. At that time my notebook was heavily loaded by these tests and some other work, so I guess that was the reason.
