chat: obtain better dm ordering guarantees #3883
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Ames guarantees message ordering only within the same flow. Using a different duct stack creates a new flow. The wire used by the agent shows up in the duct stack for the messages it sends.
Previously, we were putting DM message identifiers into the poke wire, resulting in a unique flow for each message. We need this because when we eventually get a (n)ack for the poke, we may need to correlate that with the sent message, to handle failure cases gracefully (and potentially to track delivery status if we ever want it).
Recently, ~palfun has been under load, spending lots of time processing whatever, and dropping many packets in the process. One of the symptoms this resulted in was receiving DMs out of order, sometimes with egregious delays between them. My working theory here is that, because every flow has its own retry timer, and delivery rates were low, "later" flows might retry and/or deliver their messages before "earlier" flows.
To regain the ordering guarantees ames is able to give us, we now use a static wire for DM message sending, and lean on the ordering guarantee that gives us by keeping a queue of outgoing messages that are pending delivery.
This change only applies to ship-to-ship DMs. Channels already uses a static wire (+ca-send-command), so should be unaffected by this issue. Group DMs also put ids into the wire (+act:cu-pass), and may benefit from a similar patch, but also has a more convoluted networking model to account for. @arthyn we may want to sit down and do a quick pass over the codebase for similar issues.
Tested and confirmed locally that before this change, every message created a new flow, and that after the change this is no longer the case. Have hit both happy and nack codepaths, and the latter retries like it used to. Will do some more extensive testing to try and hit the exact case that bit ~palfun, but should be ready for review already.