C2S/csi #3880
Codecov Report
Base: 73.11% // Head: 73.24% // Increases project coverage by +0.13%.

@@               Coverage Diff               @@
##    feature/mongoose_c2s    #3880    +/-  ##
========================================================
+ Coverage        73.11%     73.24%   +0.13%
========================================================
  Files              540        540
  Lines            34110      34187      +77
========================================================
+ Hits             24939      25041     +102
+ Misses            9171       9146      -25

View full report at Codecov.
Good job 👍, I just have one small comment about the style.
This is an important part of the rework. The point is that a session is now ready to receive messages directly addressed to it immediately after binding a resource, so a subsequent request to enable stream management can have messages interleaved between the request and the response. And this is fine, as XEP-0198 requires:

- the _sending_ count to start after _sending_ the `<enable>` or `<enabled>`
- the _receiving_ count to start after _receiving_ the `<enable>` or `<enabled>`

So the client will set its counter for received messages after receiving the `<enabled>`. The question is then whether the order in which the stanzas are received is relevant:

- If the client receives first the stream management `<enabled>` and then the rerouted stanzas, that means the c2s received the `<enable>` and answered with `<enabled>` before receiving the rerouted stanzas, and therefore counted them towards the sending counter.
- If the client receives first the messages from the reroute and then the `<enabled>` stanza, that means the c2s received and delivered all the messages before receiving the `<enable>` and initialising its counters. As the client starts its received count at zero (when it received the `<enabled>`), the server also starts its sent count at zero (when it sent the `<enabled>`).

So in all cases the counts match.
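The counting rule above can be sketched like this (a toy illustration under assumed names, not the actual module code):

```erlang
%% Sketch: a XEP-0198 sent-stanza counter that only starts counting once
%% <enabled/> has gone out. Stanzas sent before that point (e.g. the
%% rerouted ones) are intentionally not counted, so both sides agree.
-record(sm, {sent = disabled}).

enabled_sent(SM) ->
    SM#sm{sent = 0}.                       %% counter starts at <enabled/>

stanza_sent(#sm{sent = disabled} = SM) ->
    SM;                                    %% before <enabled/>: not counted
stanza_sent(#sm{sent = N} = SM) ->
    SM#sm{sent = N + 1}.
```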
Many tests have Alice sending the inactive stanza, immediately followed by Bob sending many messages. But here lies a race condition: a user message is implemented as the test-case Pid sending Erlang messages to the user's socket-managing Pids, which in turn send the actual TCP messages. While the order in which the messages are sent from the test-case Pid is ensured, the order in which the socket-managing Pids execute is not, so it is possible that the first message from Bob is delivered _before_ Alice's inactive request. So after sending an inactive request, we need to introduce a small wait, followed by a verification that Alice has not received any message. A wait is not bulletproof, but it solves the problem sufficiently often. The issue is that verifying that Alice has processed the inactive request is non-trivial, as CSI defines no answer to it.
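In test code, the workaround could look roughly like this (a sketch: `csi_inactive/0` and the timeout value are illustrative, not the suite's actual helpers):

```erlang
%% Sketch: give the server a moment to process <inactive xmlns='urn:xmpp:csi:0'/>,
%% then assert that nothing was delivered to Alice in the meantime.
alice_goes_inactive(Alice) ->
    escalus_connection:send(Alice, csi_inactive()),
    timer:sleep(500),                      %% not bulletproof, but usually enough
    escalus_assert:has_no_stanzas(Alice).
```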
In the original implementation, stream_mgmt first checks, at the very beginning of the route, whether there is a sid conflict. Then _all_ the `user_receive_packet` hooks are executed, and only after that does buffering for CSI and stream_mgmt happen. But in the current implementation we've partitioned `user_receive_packet` into the four types of stanzas, which could potentially stop the handling after stream_mgmt has already buffered the payload, hence the bug. So we add a "pre-send" event hook, where stream management does the buffering, and a very early stream_mgmt `user_receive_packet` handler does the sid conflict check. We also add a new kind of c2s statem event, `{flush, Acc}`. This runs `handle_flush`, which is the second part of `handle_route`, meaning everything after the `user_receive_*` handlers.
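A minimal sketch of how such an event can be dispatched in a `gen_statem` (the clause shape is an assumption, not copied from the module):

```erlang
%% Sketch: {flush, Acc} arrives as an internal event, which only the state
%% machine itself can enqueue via a {next_event, internal, {flush, Acc}}
%% action, so external processes cannot trigger a partial pipeline.
handle_event(internal, {flush, Acc}, C2SState, StateData) ->
    %% run the second half of handle_route: deliver the stanza without
    %% re-running the user_receive_* handlers
    handle_flush(StateData, C2SState, Acc);
```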
Very good job 👍. I left some comments.
src/c2s/mongoose_c2s.erl
Outdated
@@ -172,13 +176,13 @@ handle_event(state_timeout, state_timeout_termination, _FsmState, StateData) ->
 handle_event(EventType, EventContent, C2SState, StateData) ->
     handle_foreign_event(StateData, C2SState, EventType, EventContent).

--spec terminate(term(), c2s_state(), c2s_data()) -> term().
+-spec terminate(term(), c2s_state(term()), c2s_data()) -> term().
I see that we still keep `c2s_state()` in most specs, which may be misleading, as `C2SState` can include `{external_state, ...}` as well. I think it would be nicer to have it defined like this:

-type c2s_state(State) :: connect
                        % ...
                        | {external_state, State}.
-type c2s_state() :: c2s_state(term()).
I agree, let's revert this line.
src/c2s/mongoose_c2s.erl
Outdated
-spec handle_flush(c2s_data(), c2s_state(), mongoose_acc:t()) -> fsm_res().
handle_flush(StateData = #c2s_data{host_type = HostType}, C2SState, Acc) ->
    HookParams = hook_arg(StateData, C2SState, info, Acc, flust),
Suggested change:

-    HookParams = hook_arg(StateData, C2SState, info, Acc, flust),
+    HookParams = hook_arg(StateData, C2SState, info, Acc, flush),

Or we could even skip the last argument, as there is no specific reason to provide one:

-    HookParams = hook_arg(StateData, C2SState, info, Acc, flust),
+    HookParams = hook_arg(StateData, C2SState, info, Acc, undefined),
I'd leave it (typo corrected) just because it doesn't hurt, and it might be useful some other time in the future, dunno 🤔
send_element(StateData = #c2s_data{host_type = HostType}, Els, Acc) when is_list(Els) ->
    Res = send_xml(StateData, Els),
do_send_element(StateData = #c2s_data{host_type = HostType}, #xmlel{} = El, Acc) ->
    Res = send_xml(StateData, El),
Currently, we send all elements in one batch. After this change, every new element would be sent separately. I remember that we had a discussion about this and agreed that sending in one batch is the preferred way.
I'm wondering if we should keep this code as before, with some adjustments:

send_element(StateData = #c2s_data{host_type = HostType}, Els, Acc) when is_list(Els) ->
    Res = send_xml(StateData, Els),
    Acc1 = mongoose_acc:set(c2s, send_result, Res, Acc),
    lists:foldl(fun(El, Acc2) -> mongoose_hooks:xmpp_send_element(HostType, Acc2, El) end, Acc1, Els).

By this, we assume that `send_xml/2` has a binary result: all elements are sent, or none.
Yeah, I remember we talked about it. But now I'm not that sure. The point was that calls to `exml:to_iolist/1` and the TCP socket send are theoretically faster when batching, and that's why I wanted to optimise for them. But on the other hand, it seems like handlers like csi, amp, metrics, and stream-mgmt might want to deal with the granularity of single elements, and it complicates the code too much to sometimes deal with lists and sometimes with single elements, sometimes with `#xmlel{}` and other times with `mongoose_acc:t()`. So the change here was to have everything deal with single elements of `mongoose_acc:t()`, and then have the mongoose_c2s_acc helper transform this list of `mongoose_acc:t()` into a list of `gen_statem` actions, so that all code in c2s handles only one scenario.

So the old adage applies: first make it work, then make it beautiful, and only then, if you really have to, make it fast. At the beginning I could force fast and pretty because I knew of few scenarios yet, but now, with this thing becoming more complex... perhaps the minute optimisations can wait 🤷🏽
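The single-element pipeline described above could look roughly like this (a sketch: `do_send_element/3` appears in the diff, while the fold wrapper and its acc-returning contract are assumptions):

```erlang
%% Sketch: fold each element through the per-element send path, so every
%% handler (csi, amp, metrics, stream-mgmt) sees exactly one element at a
%% time; assumes do_send_element/3 returns the updated accumulator.
send_all_elements(StateData, Els, Acc0) ->
    lists:foldl(fun(El, AccIn) -> do_send_element(StateData, El, AccIn) end,
                Acc0, Els).
```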
src/c2s/mongoose_c2s_acc.erl
Outdated
extract_flushes(Params = #{flush := Accs}) ->
    WithoutStop = maps:remove(flush, Params),
    NewAction = [{next_event, internal, {flush, Acc}} || Acc <- Accs ],
    Fun = fun(Actions) -> NewAction ++ Actions end,
    maps:update_with(actions, Fun, NewAction, WithoutStop);
Suggested change (rename `WithoutStop` to `WithoutFlush`):

extract_flushes(Params = #{flush := Accs}) ->
    WithoutFlush = maps:remove(flush, Params),
    NewAction = [{next_event, internal, {flush, Acc}} || Acc <- Accs],
    Fun = fun(Actions) -> NewAction ++ Actions end,
    maps:update_with(actions, Fun, NewAction, WithoutFlush);
Or with `maps:take/2`:

extract_flushes(Params) ->
    case maps:take(flush, Params) of
        {Accs, Params1} ->
            NewAction = [{next_event, internal, {flush, Acc}} || Acc <- Accs],
            Fun = fun(Actions) -> NewAction ++ Actions end,
            maps:update_with(actions, Fun, NewAction, Params1);
        error ->
            Params
    end.

`maps:take/2` is a BIF, so I assume it's fast.
Very good idea. It's just a few more lines of code, so I'd keep the first version, but I hadn't thought about `maps:take/2` before 😃
It looks good to me, I added a few minor comments.
Looks good 👍
Important: this PR made many changes to many modules, so some explanation below:
c2s
Route pipeline processing
In the old implementation, when receiving a packet through routing, c2s would, in strict order:

1. check for `sid` conflicts
2. run the `user_receive_packet` hook
3. buffer for CSI and stream management
4. run `xmpp_send_element` with the result of the send

So, steps 1, 3 and 4 are the ones that are not XMPP-core related and need to be taken out of c2s. So now I've reordered as follows:
1. the `user_receive_packet` hook (and the more granular `user_receive_*` ones) <- stream management is an early subscriber, for the `sid` conflict verification
2. `xmpp_presend_element` <- here CSI and stream_mgmt subscribe, and can stop the actual socket delivery
3. `xmpp_send_element` with the result of the send

This, very importantly, solves a bug in the current implementation of stream_mgmt: if stream_mgmt subscribes to `user_receive_packet`, it will buffer stanzas that are meant to be handled by the server and not delivered to the socket, like IQs addressed to a specific resource, for example those from `mod_last`.

Thus, we also introduce a new kind of internal event, `handle_flush`, which runs the received-packet logic from step 2, skipping step 1. This is necessary so that, when CSI flushes its buffer, the packets do not go through the `user_receive_*` handlers again, which would have modules like inbox processing them twice. The event is internal, to avoid third-party processes sending erlang messages that would run an incomplete pipeline.

State machine states
Before, c2s states were defined with strict names, but other modules, like stream management (and in the future mod_websockets), can take over the state machine and inject their own states, while still reusing functionality available from `mongoose_c2s`. In that case, the types defined for the state machine wouldn't match anymore. So we add a new state to the `c2s_state()` type, called `{external_state, term()}`, to identify states defined by other modules, which will in turn know what to do with them, separately from c2s. This was a suggestion from @kamilwaz 😉

Stream management
As described above, stream management now subscribes to `user_receive_packet` only for the `sid` conflict verification. Then, it subscribes to `xmpp_presend_element` to do the buffering.

Another optimisation: on resumption, rerouting lost messages is strictly meant to deliver them to the process that resumed the session, with no new filtering applied. So we can bypass the complex routing logic and directly send the erlang messages to the resuming process. `mongoose_c2s` implements a new `reroute_buffer_to_peer/3` helper, which takes a pid and sends all the messages.

Also, a flaky test (`resume_session_with_wrong_h_does_not_leak_sessions`) was fixed by introducing a wait: smids are removed by asking mnesia to do so asynchronously, so again we can have a race condition between mnesia and the verification.

CSI
First of all, one more test was introduced and many others were fixed; for details, see commits 8743faf and 8e21d57. For the implementation, see 881d827.