Review and modify reliability cluster tests due to logging modifications #4701

Closed
fdalmaup opened this issue Nov 15, 2023 · 1 comment · Fixed by #4706
@fdalmaup
Member

Description

In wazuh/wazuh#19800, the team changed how the cluster's subprocesses write to cluster.log. We need to review the tests in reliability/test_cluster/test_cluster_logs and determine which of them should be marked as XFAIL until the new subprocess logging mechanism is developed in wazuh/wazuh#20162.

Current behavior

  • The cluster logs reliability tests may fail because of the logging changes described above.

Expected behavior

  • The cluster logs reliability tests that are expected to fail are marked with xfail.
@wazuhci wazuhci moved this to Triage in Release 4.7.1 Nov 22, 2023
@wazuhci wazuhci moved this from Triage to Backlog in Release 4.7.1 Nov 22, 2023
@fdalmaup fdalmaup self-assigned this Nov 22, 2023
@wazuhci wazuhci moved this from Backlog to In progress in Release 4.7.1 Nov 22, 2023
@fdalmaup
Member Author

fdalmaup commented Nov 22, 2023

Issue Update

The reliability/test_cluster/test_cluster_logs tests are mainly used to obtain cluster workload benchmark metrics from the output logs. Running the tests against logs generated with the latest modifications produced the following results:

results
/wazuh-qa/tests/reliability/test_cluster/test_cluster_logs# pytest . --artifacts_path /pkg/artifacts
====================================================================== test session starts ======================================================================
platform linux -- Python 3.10.12, pytest-7.1.2, pluggy-1.3.0
rootdir: /wazuh-qa/tests, configfile: pytest.ini
plugins: metadata-3.0.0, html-3.1.1, testinfra-5.0.0
collected 6 items                                                                                                                                               

test_cluster_connection/test_cluster_connection.py F                                                                                                      [ 16%]
test_cluster_error_logs/test_cluster_error_logs.py .                                                                                                      [ 33%]
test_cluster_master_logs_order/test_cluster_master_logs_order.py .                                                                                        [ 50%]
test_cluster_sync/test_cluster_sync.py .                                                                                                                  [ 66%]
test_cluster_task_order/test_cluster_task_order.py .                                                                                                      [ 83%]
test_cluster_worker_logs_order/test_cluster_worker_logs_order.py F                                                                                        [100%]

=========================================================================== FAILURES ============================================================================
____________________________________________________________________ test_cluster_connection ____________________________________________________________________

artifacts_path = '/pkg/artifacts'

    def test_cluster_connection(artifacts_path):
        """Verify that no worker disconnects from the master once they are connected.
    
        For each worker, this test looks for the first successful connection message
        in its logs. Then it looks for any failed connection attempts after the successful
        connection found above.
    
        Args:
            artifacts_path (str): Path where folders with cluster information can be found.
        """
        if not artifacts_path:
            pytest.fail("Parameter '--artifacts_path=<path>' is required.")
    
        cluster_log_files = glob(join(artifacts_path, 'worker_*', 'logs', 'cluster.log'))
        if len(cluster_log_files) == 0:
            pytest.fail(f'No files found inside {artifacts_path}.')
    
        for log_file in cluster_log_files:
            with open(log_file) as f:
                s = mmap(f.fileno(), 0, access=ACCESS_READ)
                # Search first successful connection message.
                conn = re.search(rb'^.*Successfully connected to master.*$', s, flags=re.MULTILINE)
                if not conn:
                    pytest.fail(f'Could not find "Successfully connected to master" message in the '
                                f'{node_name.search(log_file)[1]}')
    
                # Search if there are any connection attempts after the message found above.
                if re.search(rb'^.*Could not connect to master. Trying.*$|^.*Successfully connected to master.*$',
                             s[conn.end():], flags=re.MULTILINE):
                    disconnected_nodes.append(node_name.search(log_file)[1])
    
        if disconnected_nodes:
>           pytest.fail(f'The following nodes disconnected from master at any point:\n- ' + '\n- '.join(disconnected_nodes))
E           Failed: The following nodes disconnected from master at any point:
E           - worker_1
E           - worker_2
E           - worker_4
E           - worker_3
E           - worker_5

test_cluster_connection/test_cluster_connection.py:47: Failed
_________________________________________________________________ test_check_logs_order_workers _________________________________________________________________

artifacts_path = '/pkg/artifacts'

    def test_check_logs_order_workers(artifacts_path):
        """Check that cluster logs appear in the expected order.
    
        Check that for each group of logs (agent-info, integrity-check, etc), each message
        appears in the order it should. If any log is duplicated, skipped, etc. the test will fail.
    
        Args:
            artifacts_path (str): Path where folders with cluster information can be found.
        """
        if not artifacts_path:
            pytest.fail('Parameter "--artifacts_path=<path>" is required.')
    
        cluster_log_files = glob(os.path.join(artifacts_path, 'worker_*', 'logs', 'cluster.log'))
        if len(cluster_log_files) == 0:
            pytest.fail(f'No files found inside {artifacts_path}.')
    
        for log_file in cluster_log_files:
            failed_tasks = set()
    
            with open(log_file) as file:
                for line in file.readlines():
                    result = worker_logs_format.search(line)
                    if result:
                        if result.group(1) in logs_order and result.group(1) not in failed_tasks:
                            tree_info = logs_order[result.group(1)]
                            for child in tree_info['tree'].children(tree_info['node']):
                                if re.search(child.tag, result.group(2)):
                                    # Current node is updated so the tree points to the next expected log.
                                    logs_order[result.group(1)]['node'] = child.identifier if \
                                        tree_info['tree'].children(child.identifier) else 'root'
    
                                    break
                            else:
                                # Log can be different to the expected one only if permission was not granted.
                                if "Master didn't grant permission to start a new" not in result.group(2):
                                    if node_name.search(log_file)[1] not in incorrect_order:
                                        incorrect_order[node_name.search(log_file)[1]] = []
                                    incorrect_order[node_name.search(log_file)[1]].append({
                                        'log_type': result.group(1),
                                        'found_log': result.group(0),
                                        'expected_logs': [log.tag for log in tree_info['tree'].children(tree_info['node'])]
                                    })
    
                                    failed_tasks.add(result.group(1))
    
            # Update status of all logs so they point to their tree root.
            for log_type, tree_info in logs_order.items():
                tree_info['node'] = 'root'
    
        if incorrect_order:
            result = ''
            for node, info in incorrect_order.items():
                result += f"\n\n{node}"
                for items in info:
                    result += '\n - Log type: {log_type}\n' \
                              '   Expected logs: {expected_logs}\n' \
                              '   Found log: {found_log}\n'.format(**items)
    
>           pytest.fail(result)
E           Failed: 
E           
E           worker_1
E            - Log type: Integrity sync
E              Expected logs: ['Received [0-9]* missing files to update from master.']
E              Found log: 2023/11/22 12:08:42 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B315_manager_1] [Integrity sync] Updating local files: End.
E           
E           
E           worker_2
E            - Log type: Integrity sync
E              Expected logs: ['Received [0-9]* missing files to update from master.']
E              Found log: 2023/11/22 12:08:43 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B315_manager_2] [Integrity sync] Updating local files: End.
E           
E           
E           worker_4
E            - Log type: Integrity sync
E              Expected logs: ['Received [0-9]* missing files to update from master.']
E              Found log: 2023/11/22 12:08:44 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B315_manager_4] [Integrity sync] Updating local files: End.
E           
E           
E           worker_3
E            - Log type: Integrity sync
E              Expected logs: ['Received [0-9]* missing files to update from master.']
E              Found log: 2023/11/22 12:08:44 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B315_manager_3] [Integrity sync] Updating local files: End.
E           
E           
E           worker_5
E            - Log type: Integrity sync
E              Expected logs: ['Received [0-9]* missing files to update from master.']
E              Found log: 2023/11/22 12:08:43 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B315_manager_5] [Integrity sync] Updating local files: End.

test_cluster_worker_logs_order/test_cluster_worker_logs_order.py:103: Failed
==================================================================== short test summary info ====================================================================
FAILED test_cluster_connection/test_cluster_connection.py::test_cluster_connection - Failed: The following nodes disconnected from master at any point:
FAILED test_cluster_worker_logs_order/test_cluster_worker_logs_order.py::test_check_logs_order_workers - Failed: 
================================================================== 2 failed, 4 passed in 2.23s ==================================================================

The test_cluster_connection failure is expected because the cluster is restarted as part of the API performance test. The test_check_logs_order_workers failure is caused by a log that was removed from framework/wazuh/core/cluster/worker.py in wazuh/wazuh#19888. The test will be marked as XFAIL.
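
To illustrate why a cluster restart trips test_cluster_connection, here is a minimal sketch that reuses the two regular expressions shown in the traceback above. The helper name `reconnected_after_first_connection` and the sample log contents are assumptions made for illustration; only the message substrings come from the test.

```python
import re

# First successful connection, as searched by the test in the traceback.
FIRST_CONNECTION = re.compile(
    rb'^.*Successfully connected to master.*$', flags=re.MULTILINE
)
# Any later connection attempt (failed or successful), same alternation as the test.
LATER_ATTEMPT = re.compile(
    rb'^.*Could not connect to master. Trying.*$|^.*Successfully connected to master.*$',
    flags=re.MULTILINE,
)


def reconnected_after_first_connection(log: bytes) -> bool:
    """Return True if any connection attempt follows the first successful one."""
    conn = FIRST_CONNECTION.search(log)
    if not conn:
        raise ValueError('No "Successfully connected to master" message found.')
    # Only the text after the first successful connection is inspected.
    return LATER_ATTEMPT.search(log[conn.end():]) is not None


# Hypothetical log contents, not real cluster.log output.
steady_log = b"INFO: Successfully connected to master.\nINFO: Sync finished.\n"
# A restart makes the worker connect again, which adds a second success message.
restarted_log = steady_log + b"INFO: Successfully connected to master.\n"
```

With these sample inputs, the steady log is not flagged, while the restarted log is, because the second "Successfully connected to master" line appears after the first one.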

# pytest test_cluster_worker_logs_order/ --artifacts_path /pkg/artifacts
====================================================================== test session starts ======================================================================
platform linux -- Python 3.10.12, pytest-7.1.2, pluggy-1.3.0
rootdir: /wazuh-qa/tests, configfile: pytest.ini
plugins: metadata-3.0.0, html-3.1.1, testinfra-5.0.0
collected 1 item                                                                                                                                                

test_cluster_worker_logs_order/test_cluster_worker_logs_order.py x                                                                                        [100%]

====================================================================== 1 xfailed in 0.44s =======================================================================
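
The XFAIL marking described above could look roughly like the sketch below, using pytest's standard `xfail` marker. The reason text is an assumption (the wording used in the actual fix is not shown in this issue), and the test body is elided.

```python
import pytest

# Hypothetical reason text; the real wording in the linked fix may differ.
XFAIL_REASON = (
    "Expected log was removed from framework/wazuh/core/cluster/worker.py; "
    "pending the new subprocess logging mechanism (wazuh/wazuh#20162)."
)


@pytest.mark.xfail(reason=XFAIL_REASON)
def test_check_logs_order_workers(artifacts_path):
    """Check that cluster logs appear in the expected order."""
    ...  # Original test body unchanged.
```

Unless strict mode is enabled, pytest reports a failing run of this test as `xfailed` (the `x` in the output above) and an unexpectedly passing run as `xpassed`, so the marker can be dropped once the new logging mechanism lands.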

@fdalmaup fdalmaup linked a pull request Nov 22, 2023 that will close this issue
@fdalmaup fdalmaup moved this from In progress to Pending review in Release 4.7.1 Nov 22, 2023
@wazuhci wazuhci moved this from Pending review to Pending final review in Release 4.7.1 Nov 22, 2023
@wazuhci wazuhci moved this from Pending final review to In final review in Release 4.7.1 Nov 22, 2023
@wazuhci wazuhci moved this from In final review to On hold in Release 4.7.1 Nov 22, 2023
@wazuhci wazuhci moved this from On hold to Pending final review in Release 4.7.1 Nov 22, 2023
@github-project-automation github-project-automation bot moved this from Pending final review to Done in Release 4.7.1 Nov 22, 2023