-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Limit the number of EDUs in transactions to 100 as expected by receiver #5138
Conversation
Signed-off-by: Quentin Dufour <quentin@deuxfleurs.fr>
Signed-off-by: Quentin Dufour <quentin@deuxfleurs.fr>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! A couple of thoughts below.
@@ -217,17 +217,21 @@ def _transaction_transmission_loop(self): | |||
pending_edus = [] | |||
|
|||
pending_edus.extend(self._get_rr_edus(force_flush=False)) | |||
pending_edus.extend(device_message_edus[:99]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can safely discard the other device_message_edus
?
pending_edus.extend(self._pop_pending_edus(100 - len(pending_edus))) | ||
# But we should keep one slot free for presence | ||
to_remove = [] | ||
for key, val in self._pending_edus_keyed.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be clearer written as a while loop:
while len(pending_edus) < 100:
_, val = self._pending_edus_keyed.popitem()
pending_edus.append(val)
Thanks for your reply. I will work on it tonight. |
Signed-off-by: Quentin Dufour <quentin@deuxfleurs.fr>
contents, stream_id = yield self._store.get_new_device_msgs_for_remote( | ||
self._destination, last_device_stream_id, to_device_stream_id | ||
last_device_list = self._last_device_list_stream_id | ||
now_stream_id, results = yield self._store.get_devices_by_remote( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that get_devices_by_remote
has a hard coded limit of 20 elements.
So we start with this request to be able to complete with the following one (this is why I swapped get_devices_by_remote
and get_new_device_msgs_for_remote
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense. I think it would be helpful to include a comment that says get_devices_by_remote
will return no more than 20 entries.
Signed-off-by: Quentin Dufour <quentin@deuxfleurs.fr>
last_device_stream_id = self._last_device_stream_id | ||
to_device_stream_id = self._store.get_to_device_stream_token() | ||
contents, stream_id = yield self._store.get_new_device_msgs_for_remote( | ||
self._destination, last_device_stream_id, to_device_stream_id, 98 - len(edus) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we can set a limit to fill the edus
list. I chose 98 instead of 100 as presence
and rr_edus
could both take one slot after that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, rather than hardcode 98 here, it would be clearer to make _get_new_device_messages
take a limit
parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: the default value for the limit
parameter on get_new_device_msgs_for_remote
is now redundant. I suggest removing the default value for clarity.
I have 4 failing unit tests for now (
I don't know if its due to my modifications. I will check that tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great. Thanks again for your work on it. Suggestion below which I think will fix the test failures, and a few other suggestions for minor cleanups.
pending_edus.extend(device_message_edus) | ||
pending_edus.extend(self._pop_pending_edus(100 - len(pending_edus))) | ||
while len(pending_edus) < 100: | ||
_, val = self._pending_edus_keyed.popitem() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From _trial_temp/test.log
when running the tests:
Traceback (most recent call last):
File "/home/rav/work/synapse/synapse/federation/sender/per_destination_queue.py", line 244, in _transaction_transmission_loop
_, val = self._pending_edus_keyed.popitem()
KeyError: 'popitem(): dictionary is empty'
Apologies, this was my fault. I suggest adding a check that _pending_edus_keyed
is not empty to the while
condition.
@@ -248,6 +238,12 @@ def _transaction_transmission_loop(self): | |||
) | |||
) | |||
|
|||
pending_edus.extend(device_message_edus) | |||
pending_edus.extend(self._pop_pending_edus(100 - len(pending_edus))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this 100
number is starting to appear in a lot of places. Could I suggest creating a constant MAX_EDUS_PER_TRANSACTION
in this file?
contents, stream_id = yield self._store.get_new_device_msgs_for_remote( | ||
self._destination, last_device_stream_id, to_device_stream_id | ||
last_device_list = self._last_device_list_stream_id | ||
now_stream_id, results = yield self._store.get_devices_by_remote( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense. I think it would be helpful to include a comment that says get_devices_by_remote
will return no more than 20 entries.
last_device_stream_id = self._last_device_stream_id | ||
to_device_stream_id = self._store.get_to_device_stream_token() | ||
contents, stream_id = yield self._store.get_new_device_msgs_for_remote( | ||
self._destination, last_device_stream_id, to_device_stream_id, 98 - len(edus) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, rather than hardcode 98 here, it would be clearer to make _get_new_device_messages
take a limit
parameter.
last_device_stream_id = self._last_device_stream_id | ||
to_device_stream_id = self._store.get_to_device_stream_token() | ||
contents, stream_id = yield self._store.get_new_device_msgs_for_remote( | ||
self._destination, last_device_stream_id, to_device_stream_id, 98 - len(edus) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: the default value for the limit
parameter on get_new_device_msgs_for_remote
is now redundant. I suggest removing the default value for clarity.
Signed-off-by: Quentin Dufour <quentin@deuxfleurs.fr>
Thanks a lot for your extensive and precise review. Now tests are passing locally.
However I still have a concern, the limit passed to |
well, given we've just hardcoded it to be called with 98, I think it's a valid assumption for now. I think an easy way to sanity-check this would be to add:
|
[I checked the code. I am satisfied that it will never return more than 20 rows, however I did discover #5153. That is a separate problem, though...] |
To be sure, is the PR blocked on your side (still some review to do, internal process, etc.) or on my side (you expect some modifications, like the assert one even if the assumption is valid for now)? |
If you could add the |
…_devices_by_remote Signed-off-by: Quentin Dufour <quentin@deuxfleurs.fr>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
* develop: (45 commits) URL preview blacklisting fixes (#5155) Revert 085ae34 Add a DUMMY stage to captcha-only registration flow Make Prometheus snippet less confusing on the metrics collection doc (#4288) Set syslog identifiers in systemd units (#5023) Run Black on the tests again (#5170) Add AllowEncodedSlashes to apache (#5068) remove instructions for jessie installation (#5164) Run `black` on per_destination_queue Limit the number of EDUs in transactions to 100 as expected by receiver (#5138) Fix bogus imports in tests (#5154) add options to require an access_token to GET /profile and /publicRooms on CS API (#5083) Do checks on aliases for incoming m.room.aliases events (#5128) Remove the requirement to authenticate for /admin/server_version. (#5122) Fix spelling in server notices admin API docs (#5142) Fix sample config 0.99.3.2 include disco in deb build target list changelog Debian: we now need libpq-dev. ...
* develop: Revert 085ae34 Add a DUMMY stage to captcha-only registration flow Make Prometheus snippet less confusing on the metrics collection doc (#4288) Set syslog identifiers in systemd units (#5023) Run Black on the tests again (#5170) Add AllowEncodedSlashes to apache (#5068) remove instructions for jessie installation (#5164) Run `black` on per_destination_queue Limit the number of EDUs in transactions to 100 as expected by receiver (#5138)
Goal
I have observed this problem on my server:
The corresponding issue seems to be Synapse is not limiting the number of EDUs in a transaction #3951.
It is a blocker since Reject large transactions on federation #4513 as it blocks the federation between two servers.
In my case, the bug was caused by
device_message_edus
that was containing 103 entries.I don't know synapse internals and just wrote this quick fix to bring back federations. I might have written some wrong/dangerous things.
Pull Request Checklist