Federation no longer upholds retry_interval for exponential backoff when talking to dead servers (SYN-504) #1404

matrixbot · 2015-10-22T19:07:22Z

Submitted by @matthew:matrix.org
We're trying to connect every ~10s to dead servers despite destinations.retry_interval being 3600000ms (1h). Also, each failed request logs the connection failure 12 times...

(Imported from https://matrix.org/jira/browse/SYN-504)

matrixbot · 2015-10-22T19:07:22Z

Jira watchers: @erikjohnston @ara4n

matrixbot · 2015-10-22T19:07:22Z

Links exported from Jira:

relates to #1463

matrixbot · 2016-01-07T03:32:03Z

It seems we (still) have a really serious regression on federation retries:
arasphere is hammering dead homeservers for every event it emits.

sqlite> select * from destinations where destination='tyler.cat';
tyler.cat|1452137060029|15000000

This implies to me that it should be trying tyler.cat roughly every 4 hours (15,000,000 ms is about 4h10m).
But I'm seeing every event I emit into Matrix HQ from arasphere cause at least 5 retries over federation to that server:

2016-01-07 03:16:24,131 - synapse.http.matrixfederationclient - 183 - WARNING - GET-31539 - {GET-O-371} Sending request failed to tyler.cat: GET matrix://tyler.cat/_matrix/media/v1/download/tyler.cat/LPaWtHDwcLyglmqQlsDKBySP: ConnectionRefusedError - ConnectionRefusedError: Connection refused
2016-01-07 03:17:02,587 - synapse.http.matrixfederationclient - 183 - WARNING - - {PUT-O-3794} Sending request failed to tyler.cat: PUT matrix://tyler.cat/_matrix/federation/v1/send/1451931591285/: ConnectionRefusedError - ConnectionRefusedError: Connection refused
ConnectionRefusedError - ConnectionRefusedError: Connection refused
2016-01-07 03:24:20,340 - synapse.http.matrixfederationclient - 183 - WARNING - - {PUT-O-4600} Sending request failed to tyler.cat: PUT matrix://tyler.cat/_matrix/federation/v1/send/1451931592091/: ConnectionRefusedError - ConnectionRefusedError: Connection refused

-- @ara4n
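
For reference, the arithmetic implied by the `destinations` row quoted above can be sketched as follows. This is only an illustration: it assumes the second column is the time of the last failed attempt (milliseconds since the epoch) and the third is the backoff interval in milliseconds. The function names are hypothetical and are not taken from Synapse.

```python
from datetime import datetime, timezone

# Values copied from the sqlite row quoted above. The column meanings
# (last failure timestamp in ms, then backoff interval in ms) are an assumption.
RETRY_LAST_TS_MS = 1452137060029
RETRY_INTERVAL_MS = 15000000


def next_retry_ms(retry_last_ts_ms, retry_interval_ms):
    """Earliest time (ms since epoch) at which another attempt is allowed."""
    return retry_last_ts_ms + retry_interval_ms


def should_attempt(now_ms, retry_last_ts_ms, retry_interval_ms):
    """True once the backoff window has fully elapsed."""
    return now_ms >= next_retry_ms(retry_last_ts_ms, retry_interval_ms)


def ms_to_utc(ms):
    """Convert a millisecond epoch timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


if __name__ == "__main__":
    print("last failure:", ms_to_utc(RETRY_LAST_TS_MS).isoformat())
    print("interval (h):", RETRY_INTERVAL_MS / 3_600_000)
    print("next attempt:", ms_to_utc(next_retry_ms(RETRY_LAST_TS_MS, RETRY_INTERVAL_MS)).isoformat())
```

Under those assumptions the window is a little over four hours, so a request issued a few seconds or minutes after a recorded failure would be expected to be skipped rather than sent.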

matrixbot · 2016-01-07T11:14:53Z

Hmm, it seems to be working correctly on jki.re

-- @erikjohnston

matrixbot · 2016-01-07T15:18:00Z

After restarting arasphere.net to turn on manhole, the retry time for tyler.cat seems to have been reset and is now incrementing correctly.

No idea what's going on, will continue to monitor.

-- @erikjohnston

richvdh · 2017-03-22T19:43:58Z

I think we should assume that, modulo specific cases such as #1737, this has gone away.

Worth noting that the limiter is applied more-or-less per API call, rather than in the federation HTTP layer, so it's entirely possible for some retries to be happening even though a server is considered "dead" in general.
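
To illustrate that point, here is a minimal sketch of a backoff limiter that individual call sites opt into, instead of one enforced inside the HTTP layer. Every name in it (`RetryLimiter`, `BackingOff`, `FakeTransport`, `send_transaction`, `download_media`) and the interval constants are hypothetical; this is not Synapse's code, only a toy model of the behaviour described.

```python
class BackingOff(Exception):
    """Raised when a destination is still inside its backoff window."""


class RetryLimiter:
    """Toy exponential backoff tracker keyed by destination (hypothetical)."""

    def __init__(self, min_interval_ms=1000, multiplier=2, max_interval_ms=3_600_000):
        self.min_interval_ms = min_interval_ms
        self.multiplier = multiplier
        self.max_interval_ms = max_interval_ms
        self._state = {}  # destination -> (retry_last_ts_ms, retry_interval_ms)

    def check(self, destination, now_ms):
        last_ts, interval = self._state.get(destination, (0, 0))
        if now_ms < last_ts + interval:
            raise BackingOff(destination)

    def record_failure(self, destination, now_ms):
        _, interval = self._state.get(destination, (0, 0))
        grown = max(interval * self.multiplier, self.min_interval_ms)
        self._state[destination] = (now_ms, min(grown, self.max_interval_ms))


class FakeTransport:
    """Stands in for the HTTP layer: records attempts, always refuses."""

    def __init__(self):
        self.attempts = []

    def request(self, destination, kind):
        self.attempts.append((destination, kind))
        raise ConnectionRefusedError("Connection refused")


def send_transaction(limiter, transport, destination, now_ms):
    """A call site that consults the limiter before touching the network."""
    limiter.check(destination, now_ms)
    try:
        transport.request(destination, "send")
    except ConnectionRefusedError:
        limiter.record_failure(destination, now_ms)
        raise


def download_media(transport, destination):
    """A call site that never consults the limiter, so it always connects."""
    transport.request(destination, "download")


def demo():
    limiter, transport = RetryLimiter(), FakeTransport()
    outcomes = []
    for call in (
        lambda: send_transaction(limiter, transport, "tyler.cat", 0),
        lambda: send_transaction(limiter, transport, "tyler.cat", 500),
        lambda: download_media(transport, "tyler.cat"),
    ):
        try:
            call()
        except (BackingOff, ConnectionRefusedError) as exc:
            outcomes.append(type(exc).__name__)
    return outcomes, transport.attempts
```

In this toy model the second `send_transaction` call is suppressed by the limiter, while `download_media` still reaches the dead host because that path never checks it. Moving the check into the transport would cover every caller, at the cost of less per-call control.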

matrixbot added p1 z-bug (Deprecated Label) labels Nov 7, 2016

matrixbot mentioned this issue Nov 7, 2016

50x responses from HSes via federation don't seem to increase the retry timer (SYN-578) #1463

Closed

ara4n mentioned this issue Jan 18, 2017

Federation target which fail due to non-HTTP application layer errors are not marked as unavailable #1737

Closed

richvdh closed this as completed Mar 22, 2017

This was referenced Jun 10, 2019

Failures to download signing keys don't follow a sensible retry schedule? #5413

Closed

Retry schedule inexplicably resets on down hosts #5414

Closed

Is Synapse backing off on everything it should? #5406

Closed
