This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Federation no longer upholds retry_interval for exponential backoff when talking to dead servers (SYN-504) #1404

Closed
matrixbot opened this issue Oct 22, 2015 · 6 comments
Labels
z-bug (Deprecated Label)

Comments

@matrixbot
Member

Submitted by @matthew:matrix.org
We're trying to connect every ~10s to dead servers despite destinations.retry_interval being 3600000ms (1h). Also, each failed request logs the connection failure 12 times...

(Imported from https://matrix.org/jira/browse/SYN-504)

@matrixbot
Member Author

Jira watchers: @erikjohnston @ara4n

@matrixbot
Member Author

matrixbot commented Oct 22, 2015

Links exported from Jira:

relates to #1463

@matrixbot
Member Author

It seems we (still) have a really serious regression on federation retries: arasphere is hammering dead homeservers for every event it emits.

sqlite> select * from destinations where destination='tyler.cat';
tyler.cat|1452137060029|15000000

This implies to me that it should be trying tyler.cat roughly every 4 hours (15 million milliseconds, i.e. 4h10m)?
But I'm seeing every event I emit into Matrix HQ from arasphere causing at least 5 retries over federation to that server:

2016-01-07 03:16:24,131 - synapse.http.matrixfederationclient - 183 - WARNING - GET-31539 - {GET-O-371} Sending request failed to tyler.cat: GET matrix://tyler.cat/_matrix/media/v1/download/tyler.cat/LPaWtHDwcLyglmqQlsDKBySP: ConnectionRefusedError - ConnectionRefusedError: Connection refused
2016-01-07 03:17:02,587 - synapse.http.matrixfederationclient - 183 - WARNING - - {PUT-O-3794} Sending request failed to tyler.cat: PUT matrix://tyler.cat/_matrix/federation/v1/send/1451931591285/: ConnectionRefusedError - ConnectionRefusedError: Connection refused
ConnectionRefusedError - ConnectionRefusedError: Connection refused
2016-01-07 03:24:20,340 - synapse.http.matrixfederationclient - 183 - WARNING - - {PUT-O-4600} Sending request failed to tyler.cat: PUT matrix://tyler.cat/_matrix/federation/v1/send/1451931592091/: ConnectionRefusedError - ConnectionRefusedError: Connection refused

-- @ara4n
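
For reference, the arithmetic behind the "every 4 hours" reading can be sketched as below. This is a minimal sketch, not Synapse code: it assumes the two numeric columns in the quoted row are `retry_last_ts` and `retry_interval`, both in milliseconds, and the helper names `next_retry_at` and `in_backoff` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the sqlite row quoted above.
# ASSUMPTION: the columns are (destination, retry_last_ts, retry_interval),
# with both numbers expressed in milliseconds.
RETRY_LAST_TS_MS = 1452137060029
RETRY_INTERVAL_MS = 15000000

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def next_retry_at(retry_last_ts_ms: int, retry_interval_ms: int) -> datetime:
    """Earliest UTC time at which another attempt would be allowed."""
    return EPOCH + timedelta(milliseconds=retry_last_ts_ms + retry_interval_ms)


def in_backoff(now_ms: int, retry_last_ts_ms: int, retry_interval_ms: int) -> bool:
    """True while the destination should still be left alone."""
    return now_ms < retry_last_ts_ms + retry_interval_ms


if __name__ == "__main__":
    print("interval in hours:", RETRY_INTERVAL_MS / 3_600_000)
    print("next retry at:", next_retry_at(RETRY_LAST_TS_MS, RETRY_INTERVAL_MS))
```

Under those assumptions the row gives a last failure at 2016-01-07 03:24:20 UTC and a next permitted attempt at 07:34:20 UTC, so repeated requests within minutes of each other would all fall inside the backoff window.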

@matrixbot
Member Author

Hmm, it seems to be working correctly on jki.re

-- @erikjohnston

@matrixbot
Member Author

After restarting arasphere.net to turn on manhole, the retry time for tyler.cat seems to have been reset and is now incrementing correctly.

No idea what's going on, will continue to monitor.

-- @erikjohnston

@matrixbot matrixbot added p1 z-bug (Deprecated Label) labels Nov 7, 2016
@matrixbot matrixbot changed the title from "Federation no longer upholds retry_interval for exponential backoff when talking to dead servers (SYN-504)" to "Federation no longer upholds retry_interval for exponential backoff when talking to dead servers (https://github.com/matrix-org/synapse/issues/1404)" Nov 7, 2016
@matrixbot matrixbot changed the title from "Federation no longer upholds retry_interval for exponential backoff when talking to dead servers (https://github.com/matrix-org/synapse/issues/1404)" to "Federation no longer upholds retry_interval for exponential backoff when talking to dead servers (SYN-504)" Nov 7, 2016
@richvdh
Member

richvdh commented Mar 22, 2017

I think we should assume that, modulo specific cases such as #1737, this has gone away.

Worth noting that the limiter is applied more-or-less per API call, rather than in the federation HTTP layer, so it's entirely possible for some retries to be happening even though a server is considered "dead" in general.
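
To illustrate that last point, here is a minimal sketch of a per-destination backoff limiter that is opted into by individual call sites rather than enforced in the transport. Everything in it is hypothetical (`RetryLimiter`, `DestinationInBackoff`, `guarded_call`, `unguarded_call`, `DeadServer`, and the interval defaults); it is not Synapse's implementation, only a model of how a call path that skips the check keeps reaching a server the limiter already treats as dead.

```python
class DestinationInBackoff(Exception):
    """Raised when a destination is still inside its retry interval."""


class RetryLimiter:
    """Hypothetical exponential backoff tracker, keyed by destination."""

    def __init__(self, min_interval_ms=600_000, max_interval_ms=3_600_000, multiplier=5):
        # ASSUMPTION: these defaults are illustrative, not Synapse's values.
        self.min_interval_ms = min_interval_ms
        self.max_interval_ms = max_interval_ms
        self.multiplier = multiplier
        self._state = {}  # destination -> (retry_last_ts_ms, retry_interval_ms)

    def check(self, destination, now_ms):
        last, interval = self._state.get(destination, (0, 0))
        if interval and now_ms < last + interval:
            raise DestinationInBackoff(destination)

    def record_failure(self, destination, now_ms):
        _, interval = self._state.get(destination, (0, 0))
        grown = max(interval * self.multiplier, self.min_interval_ms)
        self._state[destination] = (now_ms, min(grown, self.max_interval_ms))

    def record_success(self, destination):
        self._state.pop(destination, None)


class DeadServer:
    """Fake transport that refuses every connection and counts attempts."""

    def __init__(self):
        self.attempts = 0

    def __call__(self, destination):
        self.attempts += 1
        raise ConnectionRefusedError("Connection refused")


def guarded_call(limiter, transport, destination, now_ms):
    """A call site that consults the limiter before touching the network."""
    limiter.check(destination, now_ms)
    try:
        result = transport(destination)
    except OSError:
        limiter.record_failure(destination, now_ms)
        raise
    limiter.record_success(destination)
    return result


def unguarded_call(transport, destination):
    """A call site that never consults the limiter."""
    return transport(destination)
```

With a `DeadServer` transport, six `guarded_call` invocations spaced 10 seconds apart produce a single connection attempt, while six `unguarded_call` invocations produce six. Moving the check into the transport itself would close that gap, at the cost of making the backoff harder to bypass for calls that legitimately need to probe a server.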
