Skip to content

Commit

Permalink
Fix race condition causing 'ack timeout 4' disconnects (#4)
Browse files Browse the repository at this point in the history
The AsyncClient::send() methods sets a boolean to true after pushing
data over the TCP socket successfully using tcp_output(). It also sets
a timestamp to remember at what time the data was sent.

The AsyncClient::_sent() callback method reacts to ACKs coming from
the connected client. This method sets the boolean to false.

In the AsyncClient::_poll() method, a check is done to see if the
boolean is true ("I'm waiting for an ACK") and if the time at which
the data was sent is too long ago (5000 ms). If this is the case,
a connection issue with the connected client is assumed and the
connection is forcibly closed by the server.

The race condition is when these operations get mixed up, because
of multithreading behavior. The _sent() method can be called during
the execution of the send() method:

1. send() sends out data using tcp_output()
2. _sent() is called because an ACK is processed, sets boolean to false
3. send() continues and sets boolean to true + timestamp to "now"

After this, the data exchange with the client was successful. Data were
sent and the ACK was seen.
However, the boolean ended up as true, making the _poll() method think
that an ACK is still to be expected. As a result, 5000 ms later, the
connection is dropped.

This commit fixes the code by first registering that an ACK is
expected, before calling tcp_output(). This way, there is no race
condition when the ACK is processed right after that call.

Additionally, I changed the boolean to an integer counter value.
The server might send multiple messages to the client, resulting in
multiple expected ACKs. A boolean does not cover this situation.

Co-authored-by: Maurice Makaay <mmakaay1@xs4all.net>
  • Loading branch information
mmakaay and Maurice Makaay authored Apr 11, 2021
1 parent cfecaa3 commit 030b747
Show file tree
Hide file tree
Showing 2 changed files with 12 additions and 11 deletions.
21 changes: 11 additions & 10 deletions src/AsyncTCP.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -575,7 +575,7 @@ AsyncClient::AsyncClient(tcp_pcb* pcb)
, _pb_cb_arg(0)
, _timeout_cb(0)
, _timeout_cb_arg(0)
, _pcb_busy(false)
, _pcb_busy(0)
, _pcb_sent_at(0)
, _ack_pcb(true)
, _rx_last_packet(0)
Expand Down Expand Up @@ -783,13 +783,14 @@ size_t AsyncClient::add(const char* data, size_t size, uint8_t apiflags) {
}

bool AsyncClient::send(){
int8_t err = ERR_OK;
err = _tcp_output(_pcb, _closed_slot);
if(err == ERR_OK){
_pcb_busy = true;
_pcb_sent_at = millis();
auto pcb_sent_at_backup = _pcb_sent_at;
_pcb_sent_at = millis();
_pcb_busy++;
if (_tcp_output(_pcb, _closed_slot) == ERR_OK) {
return true;
}
_pcb_sent_at = pcb_sent_at_backup;
_pcb_busy--;
return false;
}

Expand Down Expand Up @@ -869,7 +870,7 @@ int8_t AsyncClient::_connected(void* pcb, int8_t err){
_pcb = reinterpret_cast<tcp_pcb*>(pcb);
if(_pcb){
_rx_last_packet = millis();
_pcb_busy = false;
_pcb_busy = 0;
// tcp_recv(_pcb, &_tcp_recv);
// tcp_sent(_pcb, &_tcp_sent);
// tcp_poll(_pcb, &_tcp_poll, 1);
Expand Down Expand Up @@ -932,7 +933,7 @@ int8_t AsyncClient::_fin(tcp_pcb* pcb, int8_t err) {
int8_t AsyncClient::_sent(tcp_pcb* pcb, uint16_t len) {
_rx_last_packet = millis();
//log_i("%u", len);
_pcb_busy = false;
_pcb_busy--;
if(_sent_cb) {
_sent_cb(_sent_cb_arg, this, len, (millis() - _pcb_sent_at));
}
Expand Down Expand Up @@ -977,8 +978,8 @@ int8_t AsyncClient::_poll(tcp_pcb* pcb){
uint32_t now = millis();

// ACK Timeout
if(_pcb_busy && _ack_timeout && (now - _pcb_sent_at) >= _ack_timeout){
_pcb_busy = false;
if(_pcb_busy > 0 && _ack_timeout && (now - _pcb_sent_at) >= _ack_timeout){
_pcb_busy = 0;
log_w("ack timeout %d", pcb->state);
if(_timeout_cb)
_timeout_cb(_timeout_cb_arg, this, (now - _pcb_sent_at));
Expand Down
2 changes: 1 addition & 1 deletion src/AsyncTCP.h
Original file line number Diff line number Diff line change
Expand Up @@ -160,7 +160,7 @@ class AsyncClient {
AcConnectHandler _poll_cb;
void* _poll_cb_arg;

bool _pcb_busy;
uint32_t _pcb_busy;
uint32_t _pcb_sent_at;
bool _ack_pcb;
uint32_t _rx_ack_len;
Expand Down

1 comment on commit 030b747

@skandalfo
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the change to a counter value is an error here.

TCP acks don't need to match 1:1 to segments sent; they are cumulative up to a # of bytes received during the connection's lifetime (in their basic form) or can be range selective (IIRC to improve resend efficiency when only a hole of the sequence of datagrams has been lost).

It's perfectly valid for the other side of the connection to send one single cumulative ACK for a number of received packets as long as it's sent before the timeout period elapses.

Please sign in to comment.