Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flaky TLS tests for GraphQL #4279

Merged
merged 4 commits into from
May 17, 2024
Merged

Conversation

chrzaszcz
Copy link
Member

@chrzaszcz chrzaszcz commented May 16, 2024

The goal is to properly fix the TLS error handling tests, which were sometimes returning connection_closed instead of the proper certificate error.

The reason for this was that fusco (just like gun and httpc) opens the connection and immediately tries to send the request. When getting the response:

  • If the server closed the connection already, connection_closed would be the result.
  • Otherwise, the result would be the TLS error.

To mitigate this, this PR uses the lower-level ssl module directly, breaking up the test into steps.
This way, the error can be checked without sending any further data.
The only downside was the need for manually constructing the request and parsing the response. To avoid coding this from scratch, the already existing lower-level utilities from fusco and httpc were used.

A proper solution would be to add an option to one of the client libraries to break up the request process into parts, but it would be too much for this simple issue.


To ensure that this is fixed:

  • I repeated the tls_enabled group 100 times locally, with each group running each test 100 times instead of once, resulting in 100 runs of 900 parallel tests - 90000 tests in total (!)
  • I repeated this on CI with 10 groups of 90 tests.
  • I also repeated this locally with 10 groups of 90 tests, where the ssl library was slowed down on the client side by tracing it. Without the fix, it was failing always.

All tests succeeded.

@mongoose-im

This comment was marked as outdated.

Copy link

codecov bot commented May 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.42%. Comparing base (e0b6504) to head (35c374d).
Report is 20 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4279      +/-   ##
==========================================
- Coverage   84.42%   84.42%   -0.01%     
==========================================
  Files         553      553              
  Lines       33582    33582              
==========================================
- Hits        28352    28351       -1     
- Misses       5230     5231       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mongoose-im

This comment was marked as outdated.

The returned data might be binary if it wasn't JSON.
@chrzaszcz chrzaszcz force-pushed the fix-graphql-tls-tests branch 2 times, most recently from 896ffa6 to b09db22 Compare May 16, 2024 14:03
@mongoose-im
Copy link
Collaborator

mongoose-im commented May 16, 2024

elasticsearch_and_cassandra_26 / elasticsearch_and_cassandra_mnesia / 896ffa6
Reports root/ big
OK: 437 / Failed: 0 / User-skipped: 43 / Auto-skipped: 0


small_tests_25 / small_tests / 896ffa6
Reports root / small


small_tests_26 / small_tests / 896ffa6
Reports root / small


small_tests_26_arm64 / small_tests / 896ffa6
Reports root / small


ldap_mnesia_25 / ldap_mnesia / 896ffa6
Reports root/ big
OK: 2286 / Failed: 0 / User-skipped: 897 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_25 / pgsql_mnesia / 896ffa6
Reports root/ big
OK: 4543 / Failed: 0 / User-skipped: 111 / Auto-skipped: 0


ldap_mnesia_26 / ldap_mnesia / 896ffa6
Reports root/ big
OK: 2286 / Failed: 0 / User-skipped: 897 / Auto-skipped: 0


dynamic_domains_mysql_redis_26 / mysql_redis / 896ffa6
Reports root/ big
OK: 4510 / Failed: 0 / User-skipped: 144 / Auto-skipped: 0


internal_mnesia_26 / internal_mnesia / 896ffa6
Reports root/ big
OK: 2426 / Failed: 0 / User-skipped: 757 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_26 / pgsql_mnesia / 896ffa6
Reports root/ big
OK: 4543 / Failed: 0 / User-skipped: 111 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_26 / odbc_mssql_mnesia / 896ffa6
Reports root/ big
OK: 4539 / Failed: 1 / User-skipped: 114 / Auto-skipped: 0

auth_methods_for_c2s_SUITE:two_methods_enabled:cannot_login_with_not_allowed_method
{error,
  {"Expected stream features, got {xmlel,<<\"stream:error\">>,[],\n                  [{xmlel,<<\"host-unknown\">>,\n                     [{<<\"xmlns\">>,\n                     <<\"urn:ietf:params:xml:ns:xmpp-streams\">>}],\n                     []}]}",
   [{escalus_session,assert_stream_features,3,
      [{file,
         "/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_session.erl"},
       {line,291}]},
    {escalus_session,stream_features,2,
      [{file,
         "/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_session.erl"},
       {line,190}]},
    {escalus_connection,connection_step,2,
      [{file,
         "/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_connection.erl"},
       {line,161}]},
    {lists,foldl_1,3,[{file,"lists.erl"},{line,1599}]},
    {escalus_connection,start,2,
      [{file,
         "/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_connection.erl"},
       {line,145}]},
    {auth_methods_for_c2s_SUITE,cannot_login_with_not_allowed_method,1,
      [{file,
         "/home/circleci/project/big_tests/tests/auth_methods_for_c2s_SUITE.erl"},
       {line,89}]},
    {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1793}]},
    {test_server,run_test_case_eval1,6,
      [{file,"test_server.erl"},{line,1302}]}]}}

Report log


pgsql_cets_26 / pgsql_cets / 896ffa6
Reports root/ big
OK: 4458 / Failed: 0 / User-skipped: 178 / Auto-skipped: 0


pgsql_mnesia_26 / pgsql_mnesia / 896ffa6
Reports root/ big
OK: 4932 / Failed: 0 / User-skipped: 118 / Auto-skipped: 0


pgsql_mnesia_25 / pgsql_mnesia / 896ffa6
Reports root/ big
OK: 4932 / Failed: 0 / User-skipped: 118 / Auto-skipped: 0


mysql_redis_26 / mysql_redis / 896ffa6
Reports root/ big
OK: 4911 / Failed: 0 / User-skipped: 139 / Auto-skipped: 0


mssql_mnesia_26 / odbc_mssql_mnesia / 896ffa6
Reports root/ big
OK: 4929 / Failed: 0 / User-skipped: 121 / Auto-skipped: 0

@mongoose-im
Copy link
Collaborator

mongoose-im commented May 16, 2024

elasticsearch_and_cassandra_26 / elasticsearch_and_cassandra_mnesia / b09db22
Reports root/ big
OK: 437 / Failed: 0 / User-skipped: 43 / Auto-skipped: 0


small_tests_25 / small_tests / b09db22
Reports root / small


small_tests_26 / small_tests / b09db22
Reports root / small


small_tests_26_arm64 / small_tests / b09db22
Reports root / small


dynamic_domains_pgsql_mnesia_25 / pgsql_mnesia / b09db22
Reports root/ big
OK: 4407 / Failed: 4 / User-skipped: 111 / Auto-skipped: 132

graphql_server_SUITE:admin_cli:clustering_tests:remove_alive_from_cluster
{error,{{badrpc,nodedown},
    [{distributed_helper,rpc,
               [#{node => mongooseim3@localhost,
                timeout => 60000},
                mongoose_cluster,join,
                [mongooseim@localhost]],
               [{file,"/home/circleci/project/big_tests/../test/common/distributed_helper.erl"},
                {line,140}]},
     {graphql_server_SUITE,remove_alive_from_cluster,1,
                 [{file,"/home/circleci/project/big_tests/tests/graphql_server_SUITE.erl"},
                {line,214}]},
     {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1782}]},
     {test_server,run_test_case_eval1,6,
            [{file,"test_server.erl"},{line,1291}]},
     {test_server,run_test_case_eval,9,
            [{file,"test_server.erl"},{line,1223}]}]}}

Report log

graphql_server_SUITE:admin_cli:clustering_tests:remove_node_test
{error,{#{expected_type => ok,
      response_code => {exit_status,3},
      what => invalid_response_code},
    [{graphql_helper,assert_response_code,2,
             [{file,"/home/circleci/project/big_tests/tests/graphql_helper.erl"},
              {line,258}]},
     {graphql_helper,get_ok_value,2,
             [{file,"/home/circleci/project/big_tests/tests/graphql_helper.erl"},
              {line,241}]},
     {graphql_server_SUITE,remove_node_test,1,
                 [{file,"/home/circleci/project/big_tests/tests/graphql_server_SUITE.erl"},
                {line,225}]},
     {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1782}]},
     {test_server,run_test_case_eval1,6,
            [{file,"test_server.erl"},{line,1291}]},
     {test_server,run_test_case_eval,9,
            [{file,"test_server.erl"},{line,1223}]}]}}

Report log

graphql_server_SUITE:admin_cli:clustering_tests:stop_node_test
{error,{#{expected_type => ok,
      response_code => {exit_status,3},
      what => invalid_response_code},
    [{graphql_helper,assert_response_code,2,
             [{file,"/home/circleci/project/big_tests/tests/graphql_helper.erl"},
              {line,258}]},
     {graphql_helper,get_ok_value,2,
             [{file,"/home/circleci/project/big_tests/tests/graphql_helper.erl"},
              {line,241}]},
     {graphql_server_SUITE,stop_node_test,1,
                 [{file,"/home/circleci/project/big_tests/tests/graphql_server_SUITE.erl"},
                {line,230}]},
     {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1782}]},
     {test_server,run_test_case_eval1,6,
            [{file,"test_server.erl"},{line,1291}]},
     {test_server,run_test_case_eval,9,
            [{file,"test_server.erl"},{line,1223}]}]}}

Report log

service_domain_db_SUITE:init_per_suite
{'EXIT',
  {{badrpc,nodedown},
   [{distributed_helper,rpc,
      [#{node => mongooseim3@localhost},
       mongoose_service,loaded_services_with_opts,[]],
      [{file,
         "/home/circleci/project/big_tests/../test/common/distributed_helper.erl"},
       {line,140}]},
    {dynamic_services,save_services,2,
      [{file,
         "/home/circleci/project/big_tests/tests/dynamic_services.erl"},
       {line,15}]},
    {lists,foldl_1,3,[{file,"lists.erl"},{line,1355}]},
    {service_domain_db_SUITE,init_per_suite,1,
      [{file,
         "/home/circleci/project/big_tests/tests/service_domain_db_SUITE.erl"},
       {line,198}]},
    {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1782}]},
    {test_server,run_test_case_eval1,6,
      [{file,"test_server.erl"},{line,1379}]},
    {test_server,run_test_case_eval,9,
      [{file,"test_server.erl"},{line,1223}]}]}}

Report log

service_mongoose_system_metrics_SUITE:log_transparency:just_removed_from_config_logs_question
{error,
  {{badrpc,nodedown},
   [{distributed_helper,rpc,
      [#{node => mongooseim3@localhost},
       mongoose_service,ensure_stopped,
       [service_mongoose_system_metrics]],
      [{file,
         "/home/circleci/project/big_tests/../test/common/distributed_helper.erl"},
       {line,140}]},
    {service_mongoose_system_metrics_SUITE,disable_system_metrics,1,
      [{file,
         "/home/circleci/project/big_tests/tests/service_mongoose_system_metrics_SUITE.erl"},
       {line,437}]},
    {service_mongoose_system_metrics_SUITE,
      just_removed_from_config_logs_question,1,
      [{file,
         "/home/circleci/project/big_tests/tests/service_mongoose_system_metrics_SUITE.erl"},
       {line,283}]},
    {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1782}]},
    {test_server,run_test_case_eval1,6,
      [{file,"test_server.erl"},{line,1291}]},
    {test_server,run_test_case_eval,9,
      [{file,"test_server.erl"},{line,1223}]}]}}

Report log


ldap_mnesia_25 / ldap_mnesia / b09db22
Reports root/ big
OK: 2286 / Failed: 0 / User-skipped: 897 / Auto-skipped: 0


ldap_mnesia_26 / ldap_mnesia / b09db22
Reports root/ big
OK: 2286 / Failed: 0 / User-skipped: 897 / Auto-skipped: 0


dynamic_domains_mysql_redis_26 / mysql_redis / b09db22
Reports root/ big
OK: 4510 / Failed: 0 / User-skipped: 144 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_26 / odbc_mssql_mnesia / b09db22
Reports root/ big
OK: 4540 / Failed: 0 / User-skipped: 114 / Auto-skipped: 0


internal_mnesia_26 / internal_mnesia / b09db22
Reports root/ big
OK: 2426 / Failed: 0 / User-skipped: 757 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_26 / pgsql_mnesia / b09db22
Reports root/ big
OK: 4543 / Failed: 0 / User-skipped: 111 / Auto-skipped: 0


pgsql_cets_26 / pgsql_cets / b09db22
Reports root/ big
OK: 4458 / Failed: 0 / User-skipped: 178 / Auto-skipped: 0


pgsql_mnesia_25 / pgsql_mnesia / b09db22
Reports root/ big
OK: 4932 / Failed: 0 / User-skipped: 118 / Auto-skipped: 0


pgsql_mnesia_26 / pgsql_mnesia / b09db22
Reports root/ big
OK: 4932 / Failed: 0 / User-skipped: 118 / Auto-skipped: 0


mysql_redis_26 / mysql_redis / b09db22
Reports root/ big
OK: 4911 / Failed: 0 / User-skipped: 139 / Auto-skipped: 0


mssql_mnesia_26 / odbc_mssql_mnesia / b09db22
Reports root/ big
OK: 4929 / Failed: 0 / User-skipped: 121 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_25 / pgsql_mnesia / b09db22
Reports root/ big
OK: 4543 / Failed: 0 / User-skipped: 111 / Auto-skipped: 0

The main goal is to check errors without sending the request.
Otherwise, the errors would be unpredictable due to race conditions.

For example, if there is a TLS error, and we try to send a request,
this would sometimes succeed, and sometimes fail.
Receiving the response could result in the 'closed' error or in a TLS error,
depending on the timing.

This is why neither of the clients (fusco, gun, httpc) can be used.
Instead, the 'ssl' module is used directly.
Otherwise, jiffy encodes them as lists of numbers,
which makes them unreadable.
The motivation is to eliminate flakiness caused by race conditions.

- Negative tests check TLS errors after connecting without sending
  any requests (which would cause the socket to return the 'closed'
  error if it was already closed)
- Positive tests use the same internal API to make them similar, and
  to check that the API can handle succeess and failure.
- Positive tests execute meaningful queries now.
@mongoose-im
Copy link
Collaborator

mongoose-im commented May 17, 2024

elasticsearch_and_cassandra_26 / elasticsearch_and_cassandra_mnesia / 35c374d
Reports root/ big
OK: 437 / Failed: 0 / User-skipped: 43 / Auto-skipped: 0


small_tests_25 / small_tests / 35c374d
Reports root / small


small_tests_26 / small_tests / 35c374d
Reports root / small


small_tests_26_arm64 / small_tests / 35c374d
Reports root / small


ldap_mnesia_25 / ldap_mnesia / 35c374d
Reports root/ big
OK: 2286 / Failed: 0 / User-skipped: 897 / Auto-skipped: 0


dynamic_domains_mysql_redis_26 / mysql_redis / 35c374d
Reports root/ big
OK: 4510 / Failed: 0 / User-skipped: 144 / Auto-skipped: 0


ldap_mnesia_26 / ldap_mnesia / 35c374d
Reports root/ big
OK: 2286 / Failed: 0 / User-skipped: 897 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_25 / pgsql_mnesia / 35c374d
Reports root/ big
OK: 4543 / Failed: 0 / User-skipped: 111 / Auto-skipped: 0


internal_mnesia_26 / internal_mnesia / 35c374d
Reports root/ big
OK: 2426 / Failed: 0 / User-skipped: 757 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_26 / pgsql_mnesia / 35c374d
Reports root/ big
OK: 4543 / Failed: 0 / User-skipped: 111 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_26 / odbc_mssql_mnesia / 35c374d
Reports root/ big
OK: 4540 / Failed: 0 / User-skipped: 114 / Auto-skipped: 0


pgsql_cets_26 / pgsql_cets / 35c374d
Reports root/ big
OK: 4458 / Failed: 0 / User-skipped: 178 / Auto-skipped: 0


pgsql_mnesia_25 / pgsql_mnesia / 35c374d
Reports root/ big
OK: 4932 / Failed: 0 / User-skipped: 118 / Auto-skipped: 0


pgsql_mnesia_26 / pgsql_mnesia / 35c374d
Reports root/ big
OK: 4932 / Failed: 0 / User-skipped: 118 / Auto-skipped: 0


mssql_mnesia_26 / odbc_mssql_mnesia / 35c374d
Reports root/ big
OK: 4929 / Failed: 0 / User-skipped: 121 / Auto-skipped: 0


mysql_redis_26 / mysql_redis / 35c374d
Reports root/ big
OK: 4911 / Failed: 0 / User-skipped: 139 / Auto-skipped: 0

@chrzaszcz chrzaszcz marked this pull request as ready for review May 17, 2024 07:41
Copy link
Contributor

@JanuszJakubiec JanuszJakubiec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@JanuszJakubiec JanuszJakubiec merged commit 080294a into master May 17, 2024
4 checks passed
@JanuszJakubiec JanuszJakubiec deleted the fix-graphql-tls-tests branch May 17, 2024 08:02
@jacekwegr jacekwegr added this to the 6.3.0 milestone Oct 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants