Block on Proof Courier Service Connection Attempt #1203
This commit ensures that the proof transfer ChainPorter state is re-executed once proof transfer backoff attempts have been exhausted. In the absence of this commit, the next opportunity for re-attempting proof transfer would be when tapd restarts (pending parcels are processed on startup).
e0aa341 to c3e2b05
Changes to the logic make sense, thanks for the added robustness!
Some questions around some of the chosen values, since the integration tests take quite a while longer now.
The change ensures that the courier service connection attempt is blocking rather than synchronous. This prevents proof transfers from failing due to attempts to use connections before they are fully established, simplifying debugging. Both the connection and transfer steps are part of the backoff procedure, so failures in either step will trigger re-attempts.
This commit adds a default value for the proof courier service response timeout, which was introduced in the previous commit.
Set the request timeout for the tapd harness universe courier service to an appropriate value to ensure tests pass consistently.
c3e2b05 to 771864d
Nice, LGTM 🎉
Current CI failures are known flakes or fixed in another PR (LiT itest).
IIRC, the issue last time was that an address contained an invalid courier addr, so the proof could never be delivered, meaning the send was never completed. Does this change resolve that? IIUC now, we'll just block when trying to connect, but after the send has already been confirmed on chain?
@Roasbeef The PR does not address the invalid courier address problem. This PR ensures that we don't try to use a proof courier service connection before it's ready, and so potentially avoids an unnecessary backoff and log messages.
Looks mostly good! Just one note.
Also, for the third commit, the comment should say async, not sync?
// ServiceRequestTimeout defines the maximum duration we'll wait for
// a courier service to handle our outgoing request during a connection
// attempt, or when delivering or retrieving a proof.
ServiceRequestTimeout time.Duration `long:"servicerequestimeout" description:"The maximum duration we'll wait for a courier service to handle our outgoing request during a connection attempt, or when delivering or retrieving a proof."`
Shouldn't the hashmail courier / every courier implementation have a similar timeout? And then use it on initialization.
For hashmail I think that would be in NewHashMailBox, you could add the same child ctx object as you're doing for the Universe courier.
Yes, you're right! Nice catch. Hashmail is also blocking now, so it would be good to have that timeout.
This PR enhances the reliability of the proof courier service and the robustness of the proof transfer process, making proof transfers more fault-tolerant.