Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix error on ping timeout with stream management #4153

Merged
merged 4 commits into from
Nov 8, 2023

Conversation

jacekwegr
Copy link
Contributor

@jacekwegr jacekwegr commented Oct 19, 2023

This PR addresses an issue arising when both mod_ping (with timeout_action = "kill”) and mod_stream_management are enabled. Previously, error stanzas were routed back to the server after the ping and resume timeouts passed, leading to the resending of unacknowledged messages. Due to the lack of appropriate stanza handling in mod_ping, they were processed by ejabberd_local, resulting in ID-related errors.

The stanza arrives with the same ID in case of an error, but it doesn't require further processing since the ping timeout handling has already been invoked.

mod_ping likely skips using ejabberd_local:route_iq with a callback to avoid another laver of indirection. Handling ping responses directly in the module makes the process more straightforward and easier to understand.

@mongoose-im

This comment was marked as outdated.

@codecov
Copy link

codecov bot commented Oct 19, 2023

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (f989a92) 84.04% compared to head (e0f3cd4) 84.04%.
Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4153   +/-   ##
=======================================
  Coverage   84.04%   84.04%           
=======================================
  Files         564      564           
  Lines       34050    34067   +17     
=======================================
+ Hits        28618    28633   +15     
- Misses       5432     5434    +2     
Files Coverage Δ
src/mod_ping.erl 94.93% <90.00%> (-3.46%) ⬇️

... and 12 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mongoose-im

This comment was marked as outdated.

@jacekwegr jacekwegr marked this pull request as ready for review October 20, 2023 12:58
Copy link
Member

@chrzaszcz chrzaszcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few comments. The main issue is that this PR only handles the case when the server resends unacked stanzas, while the original story required to handle user-sent ping errors as well - see https://xmpp.org/extensions/xep-0199.html#example-3

src/mod_ping.erl Outdated Show resolved Hide resolved
src/mod_ping.erl Show resolved Hide resolved
big_tests/tests/sm_SUITE.erl Outdated Show resolved Hide resolved
@mongoose-im

This comment was marked as outdated.

@mongoose-im

This comment was marked as outdated.

Backend = mongoose_helper:mnesia_or_rdbms_backend(),
[{mod_stream_management, config_parser_helper:mod_config(mod_stream_management, SMConfig)},
{mod_offline, config_parser_helper:mod_config(mod_offline, #{backend => Backend})}].
{mod_offline, config_parser_helper:mod_config(mod_offline, #{backend => Backend})},
{mod_ping, config_parser_helper:mod_config(mod_ping, PingConfig)}].
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as far as I understand we don't need to add mod_ping module for any other groups/testcases than ping_timeout

logger_ct_backend:start(),
logger_ct_backend:capture(error),

%% connect Alice and simulate ping timeout by dropping connection
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think that this comment doesn't reflect the actions in the subsequent steps

FilterFun = fun(_, Msg) ->
re:run(Msg, "parse_iq_id_failed", [global]) /= nomatch
end,
?assertEqual([], logger_ct_backend:recv(FilterFun)),
Copy link
Collaborator

@DenysGonchar DenysGonchar Oct 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checking that some error hasn't occured is a weak approach, especiallly using regular expressions, parse_iq_id_failed identifier may change in the future and this testcase will become false-positive.

I think mocking on the MIM node is a better option to make a validation.

@@ -124,6 +127,9 @@ parallel_unacknowledged_message_hook_cases() ->
unacknowledged_message_hook_resume,
unacknowledged_message_hook_filter].

ping_timeout_cases() ->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need a group for just one testcase?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believed that introducing a separate group would be an optimal solution, given that the stanzas that are sent by mod_ping tend to create issues in other tests

@mongoose-im

This comment was marked as outdated.

Copy link
Collaborator

@DenysGonchar DenysGonchar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks better now, we may even make an additional check that message is filtered out before it's getting to the crashing handler, e.g. the list of processed stanzas by mod_ping handler is the same as processed by crashing halder plus the filtered out stanza, also you may want to check that the filtered stanza is the one that you expect.

Copy link
Member

@chrzaszcz chrzaszcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I only have a minor concern about a test case.

big_tests/tests/mod_ping_SUITE.erl Outdated Show resolved Hide resolved
src/mod_ping.erl Show resolved Hide resolved
@mongoose-im
Copy link
Collaborator

mongoose-im commented Nov 7, 2023

elasticsearch_and_cassandra_26 / elasticsearch_and_cassandra_mnesia / e0f3cd4
Reports root/ big
OK: 369 / Failed: 0 / User-skipped: 38 / Auto-skipped: 0


small_tests_25 / small_tests / e0f3cd4
Reports root / small


small_tests_26 / small_tests / e0f3cd4
Reports root / small


small_tests_26_arm64 / small_tests / e0f3cd4
Reports root / small


ldap_mnesia_25 / ldap_mnesia / e0f3cd4
Reports root/ big
OK: 2310 / Failed: 0 / User-skipped: 865 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_25 / pgsql_mnesia / e0f3cd4
Reports root/ big
OK: 4275 / Failed: 0 / User-skipped: 112 / Auto-skipped: 0


ldap_mnesia_26 / ldap_mnesia / e0f3cd4
Reports root/ big
OK: 2309 / Failed: 1 / User-skipped: 865 / Auto-skipped: 0

pubsub_SUITE:tree+basic:subscribe_options_deliver_option_test
{error,
  {{badmatch,
     [{xmlel,<<"message">>,
        [{<<"from">>,<<"pubsub.localhost">>},
         {<<"to">>,
        <<"bob_subscribe_options_deliver_option_test_1521@localhost/res1">>},
         {<<"type">>,<<"headline">>}],
        [{xmlel,<<"event">>,
           [{<<"xmlns">>,
           <<"http://jabber.org/protocol/pubsub#event">>}],
           [{xmlel,<<"items">>,
            [{<<"node">>,<<"princely_musings_oh8GVddWFFc=">>}],
            [{xmlel,<<"item">>,
               [{<<"id">>,<<"item1">>}],
               [{xmlel,<<"entry">>,
                  [{<<"xmlns">>,
                  <<"http://www.w3.org/2005/Atom">>}],
                  []}]}]}]},
         {xmlel,<<"headers">>,
           [{<<"xmlns">>,<<"http://jabber.org/protocol/shim">>}],
           []}]}]},
   [{pubsub_SUITE,'-subscribe_options_deliver_option_test/1-fun-0-',3,
      [{file,"/home/circleci/project/big_tests/tests/pubsub_SUITE.erl"},
       {line,401}]},
    {escalus_story,story,4,
      [{file,
         "/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_story.erl"},
       {line,72}]},
    {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1793}]},
    {test_server,run_test_case_eval1,6,
      [{file,"test_server.erl"},{line,1302}]},
    {test_server,run_test_case_eval,9,
      [{file,"test_server.erl"},{line,1234}]}]}}

Report log


dynamic_domains_pgsql_mnesia_26 / pgsql_mnesia / e0f3cd4
Reports root/ big
OK: 4275 / Failed: 0 / User-skipped: 112 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_26 / odbc_mssql_mnesia / e0f3cd4
Reports root/ big
OK: 4272 / Failed: 0 / User-skipped: 115 / Auto-skipped: 0


dynamic_domains_mysql_redis_26 / mysql_redis / e0f3cd4
Reports root/ big
OK: 4243 / Failed: 0 / User-skipped: 144 / Auto-skipped: 0


internal_mnesia_26 / internal_mnesia / e0f3cd4
Reports root/ big
OK: 2460 / Failed: 0 / User-skipped: 715 / Auto-skipped: 0


pgsql_mnesia_25 / pgsql_mnesia / e0f3cd4
Reports root/ big
OK: 4664 / Failed: 0 / User-skipped: 119 / Auto-skipped: 0


pgsql_mnesia_26 / pgsql_mnesia / e0f3cd4
Reports root/ big
OK: 4664 / Failed: 0 / User-skipped: 119 / Auto-skipped: 0


pgsql_cets_26 / pgsql_cets / e0f3cd4
Reports root/ big
OK: 4662 / Failed: 0 / User-skipped: 121 / Auto-skipped: 0


mysql_redis_26 / mysql_redis / e0f3cd4
Reports root/ big
OK: 4644 / Failed: 0 / User-skipped: 139 / Auto-skipped: 0


mssql_mnesia_26 / odbc_mssql_mnesia / e0f3cd4
Reports root/ big
OK: 4661 / Failed: 0 / User-skipped: 122 / Auto-skipped: 0


small_tests_25 / small_tests / e0f3cd4
Reports root / small


ldap_mnesia_26 / ldap_mnesia / e0f3cd4
Reports root/ big
OK: 2310 / Failed: 0 / User-skipped: 865 / Auto-skipped: 0

Copy link
Member

@chrzaszcz chrzaszcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍

@DenysGonchar DenysGonchar merged commit ab0b37f into master Nov 8, 2023
4 checks passed
@DenysGonchar DenysGonchar deleted the error-on-ping-timeout branch November 8, 2023 09:01
@chrzaszcz chrzaszcz added this to the 6.2.0 milestone Dec 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants