-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(iroh-net): Clear the recent pong time when pong is lost #2743
Conversation
When we do not receive a pong in the timeframe we should have we remove the last ping time and clear the best_addr. However the path will be re-selected from the candidates as long as the pong time is set. This additionally clears the pong time so that there is no longer an indication this path works. While the DISCO ping-pong is not loss-protected there is already an existing check that ensures this clear is not triggered when there are still other signs of life from the remote node. Note that when sending call-me-maybe the pong times are also cleared. So while I'm not a particular fan of this this is the intended way to use this. Ideally the pathstate would be clearer about this data and store this as separately. But the PathState data needs a general cleanup and I do not want to mix in extra state for now.
Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/2743/docs/iroh/ Last updated: 2024-09-23T15:23:47Z |
@flub Thank you for resolving this issue so quickly. I was able to reproduce the problem using the old version where, after stopping the server and restarting it after a while, the client could no longer connect to the server. After switching to the current branch for testing, regardless of when the server is restarted, the client can successfully connect to the server within a few to a dozen seconds. Both local area network (LAN) and wide area network (WAN) tests have shown that this fix has resolved my issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, IIUC, the recent_pong
setting interacts with assign_best_addr_from_candidates_if_empty
. So previously it wasn't set to None
, which meant that clearing the best_addr
didn't have the desired effect, since it was re-set to a bad address in assign_best_addr_from_candidates_if_empty
, right?
correct |
late to the party I guess. Docs for /// Last [`PongReply`] received. which is now factually wrong. The field's naming is good enough to allow for this, but docs should get adjusted accordingly |
Fair, but staring at this to try and improve I have not the faintest idea on how to document this right now. The best I can do is look at the callsites and describe them. But like...??
But that's pretty rubbish. I could make a PR with it though if you like. What probably should happen is that the PathState cleanup that I wanted to start sometime should continue and instead of what "seleect_from_candidates" does now it should use some form of "does_this_path_work" function (but there are already several that are all weird and wrong in some ways). But that is also a larger project and I feel more like pushing multipath forward at the moment before deciding which bits of PathState need to be cleaned up. Sorry, excuses. I know... |
I understand and share the frustration. I'm just unhappy how we are still haunted by go code we didn't design and thus it seems we won't ever fully understand. Rant aside, to reduce the ...awkwardness.. maybe we can just say /// The most recent [`PongReply`] considered relevant for this path.
/// Previous replies might be cleared when they can't be used to determine the
/// aliveness of this path, or are generally considered no longer relevant. Basically I'm still leaning on being somewhat vague so that anyone reading this code leaves at least with "I don't fully understand this field, I'd rather have some care using it" vs "I fully understand this field" with an outright wrong idea of its use. |
How about /// The most recent [`PongReply`], if more recent than [`PING_TIMEOUT_DURATION`] ? |
Description
When we do not receive a pong in the timeframe we should have, we remove the last ping time and clear the best_addr. However the path will be re-selected from the candidates as long as the pong time is set.
This additionally clears the pong time so that there is no longer an indication this path works. While the DISCO ping-pong is not loss-protected there is already an existing check that ensures this clear is not triggered when there are still other signs of life from the remote node.
Breaking Changes
None
Notes & open questions
Note that when sending call-me-maybe the pong times are also cleared. So while I'm not a particular fan of this, it is the intended way to use this. Ideally the PathState would be clearer about this data and store this as separately. But the PathState data needs a general cleanup and I do not want to mix in extra state for now.
Fixes a bug found by @zh522130
Change checklist
[ ] Documentation updates following the style guide, if relevant.[ ] Tests if relevant.[ ] All breaking changes documented.