Flaky Tests Tracker #3012

Open
prasannavl opened this issue Aug 27, 2024 · 3 comments

Comments

prasannavl commented Aug 27, 2024

Documenting flaky tests

  • Most flaky tests stem from concurrency: how soon the node is up and running, the system resources available, and the minor timing effects that result. Documenting them here so that we can aim to fix them and ensure we don't end up adding new ones.
  • The majority of them can be reproduced by varying concurrency with MAKE_JOBS=32/64/128/256 etc., by selecting different numbers of tests at random, and/or by limiting resources with cgroups.
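
A minimal sketch of how the "varying concurrency plus random test selection" idea above could be enumerated. The helper name `reproduction_matrix`, its parameters, and the subset sizes are hypothetical and not part of the repository; only the job counts come from the note above.

```python
import itertools
import random


def reproduction_matrix(tests, job_counts=(32, 64, 128, 256),
                        subset_sizes=(1, 2), seed=0):
    """Yield (jobs, test_subset) pairs to try when hunting a flaky test.

    Hypothetical helper: it only builds the combinations; launching the
    test runner with each pair is left to the caller.
    """
    tests = list(tests)
    rng = random.Random(seed)  # fixed seed so a failing combination can be replayed
    for jobs, size in itertools.product(job_counts, subset_sizes):
        size = min(size, len(tests))
        yield jobs, sorted(rng.sample(tests, size))


if __name__ == "__main__":
    names = ["wallet_txn_doublespend.py", "feature_token_merge.py",
             "feature_loan_basics.py"]
    for jobs, subset in reproduction_matrix(names):
        print(f"MAKE_JOBS={jobs}", " ".join(subset))
```

Each printed line is one candidate run: set MAKE_JOBS to the given value and run only the listed tests, optionally inside a resource-limited cgroup.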

Wide spectrum

Localized

  • wallet_txn_doublespend.py
    • This checks for initial block download and fails. Looping on the check so that it waits longer should fix it.
  • feature_token_merge.py
    • This seems to be something in the test that's out of sync:
test_framework.authproxy.JSONRPCException: Test AddPoolLiquidityTx execution failed:
tx must have at least one input from account owner (-32600)
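
A minimal sketch of the "loop and wait longer" idea noted for wallet_txn_doublespend.py above. It assumes the node object exposes a `getblockchaininfo()` RPC whose result has an `initialblockdownload` boolean, as in Bitcoin Core-derived nodes; the helper name `wait_for_ibd_complete` and its parameters are hypothetical and not taken from the test framework.

```python
import time


def wait_for_ibd_complete(node, timeout=60.0, poll_interval=0.5,
                          clock=time.monotonic, sleep=time.sleep):
    """Poll the node until it reports that initial block download is over.

    Assumption: node.getblockchaininfo() returns a dict containing an
    "initialblockdownload" boolean. Raises AssertionError on timeout.
    """
    deadline = clock() + timeout
    while True:
        info = node.getblockchaininfo()
        if not info["initialblockdownload"]:
            return
        if clock() >= deadline:
            raise AssertionError(
                "node still in initial block download after %.1f s" % timeout
            )
        sleep(poll_interval)
```

The `clock` and `sleep` parameters exist only so the loop can be exercised without real delays; a test would call `wait_for_ibd_complete(self.nodes[0])` before asserting anything that depends on the node having left initial block download.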
@prasannavl

100/263 - feature_loan_basics.py failed, Duration: 65 s

stdout:
2024-08-27T05:01:59.100000Z TestFramework (INFO): Initializing test directory /__w/ain/ain/build/test_runner/test_runner_20240827_045545/feature_loan_basics_174
Generating initial chain...
2024-08-27T05:02:03.209000Z TestFramework (INFO): Stopping nodes
2024-08-27T05:03:03.567000Z TestFramework.utils (ERROR): wait_until() failed. Predicate: ''''
    def is_node_stopped(self):
        """Checks whether the node has stopped.

        Returns True if the node has stopped. False otherwise.
        This method is responsible for freeing resources (self.process)."""
        if not self.running:
            return True
        return_code = self.process.poll()
        if return_code is None:
            return False

        # process has stopped. Assert that it didn't return an error code.
        assert return_code == 0, self._node_msg(
            "Node returned non-zero exit code (%d) when stopping" % return_code
        )
        self.running = False
        self.process = None
        self.rpc_connected = False
        self.rpc = None
        self.evm_rpc = None
        self.log.debug("Node stopped")
        return True
'''


stderr:
Traceback (most recent call last):
  File "/__w/ain/ain/test/functional/feature_loan_basics.py", line 685, in <module>
    LoanTakeLoanTest().main()
  File "/__w/ain/ain/test/functional/test_framework/test_framework.py", line 319, in main
    self.stop_nodes()
  File "/__w/ain/ain/test/functional/test_framework/test_framework.py", line 574, in stop_nodes
    node.wait_until_stopped()
  File "/__w/ain/ain/test/functional/test_framework/test_node.py", line 525, in wait_until_stopped
    wait_until(self.is_node_stopped, timeout=timeout)
  File "/__w/ain/ain/test/functional/test_framework/util.py", line 282, in wait_until
    raise AssertionError(
AssertionError: Predicate ''''
    def is_node_stopped(self):
        """Checks whether the node has stopped.

        Returns True if the node has stopped. False otherwise.
        This method is responsible for freeing resources (self.process)."""
        if not self.running:
            return True
        return_code = self.process.poll()
        if return_code is None:
            return False

        # process has stopped. Assert that it didn't return an error code.
        assert return_code == 0, self._node_msg(
            "Node returned non-zero exit code (%d) when stopping" % return_code
        )
        self.running = False
        self.process = None
        self.rpc_connected = False
        self.rpc = None
        self.evm_rpc = None
        self.log.debug("Node stopped")
        return True
''' not true after 60 seconds

@prasannavl

212/263 - feature_evm_contracts.py failed, Duration: 6 s

stdout:
2024-08-27T12:03:56.711000Z TestFramework (INFO): Initializing test directory /__w/ain/ain/build/test_runner/test_runner_20240827_115342/feature_evm_contracts_53
2024-08-27T12:04:01.490000Z TestFramework (ERROR): Assertion failed
Traceback (most recent call last):
  File "/__w/ain/ain/test/functional/test_framework/test_framework.py", line 294, in main
    self.run_test()
  File "/__w/ain/ain/test/functional/feature_evm_contracts.py", line 496, in run_test
    self.fail_send_large_tx()
  File "/__w/ain/ain/test/functional/feature_evm_contracts.py", line 463, in fail_send_large_tx
    assert_raises_web3_error(
  File "/__w/ain/ain/test/functional/test_framework/util.py", line 78, in assert_raises_web3_error
    raise AssertionError("No exception raised")
AssertionError: No exception raised
2024-08-27T12:04:01.541000Z TestFramework (INFO): Stopping nodes
2024-08-27T12:04:01.693000Z TestFramework (WARNING): Not cleaning up dir /__w/ain/ain/build/test_runner/test_runner_20240827_115342/feature_evm_contracts_53
2024-08-27T12:04:01.694000Z TestFramework (ERROR): Test failed. Test logging available at /__w/ain/ain/build/test_runner/test_runner_20240827_115342/feature_evm_contracts_53/test_framework.log
2024-08-27T12:04:01.694000Z TestFramework (ERROR): Hint: Call /__w/ain/ain/test/functional/combine_logs.py '/__w/ain/ain/build/test_runner/test_runner_20240827_115342/feature_evm_contracts_53' to consolidate all logs

@prasannavl

263/263 - feature_loan_low_interest.py failed, Duration: 673 s

stdout:
2024-08-31T13:38:14.224000Z TestFramework (INFO): Initializing test directory /__w/ain/ain/build/test_runner/test_runner_20240831_132659/feature_loan_low_interest_31
Generating initial chain...
loading up account0 with DFI token...
setting up oracles...
setting up loan and collateral tokens...
setting up pool pairs...
creating loan scheme...
2024-08-31T13:48:26.903000Z TestFramework (ERROR): JSONRPC error
Traceback (most recent call last):
  File "/__w/ain/ain/test/functional/test_framework/authproxy.py", line 212, in _get_response
    http_response = self.__conn.getresponse()
  File "/usr/lib/python3.8/http/client.py", line 1348, in getresponse
    response.begin()
  File "/usr/lib/python3.8/http/client.py", line 316, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.8/http/client.py", line 277, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/__w/ain/ain/test/functional/test_framework/test_framework.py", line 294, in main
    self.run_test()
  File "/__w/ain/ain/test/functional/feature_loan_low_interest.py", line 461, in run_test
    self.test_high_loan()
  File "/__w/ain/ain/test/functional/feature_loan_low_interest.py", line 324, in test_high_loan
    self.nodes[0].generate(35)
  File "/__w/ain/ain/test/functional/test_framework/test_node.py", line 318, in generate
    res = self.generatetoaddress(nblocks=1, address=address, maxtries=1)
  File "/__w/ain/ain/test/functional/test_framework/coverage.py", line 47, in __call__
    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
  File "/__w/ain/ain/test/functional/test_framework/authproxy.py", line 171, in __call__
    response, status = self._request(
  File "/__w/ain/ain/test/functional/test_framework/authproxy.py", line 128, in _request
    return self._get_response()
  File "/__w/ain/ain/test/functional/test_framework/authproxy.py", line 214, in _get_response
    raise JSONRPCException(
test_framework.authproxy.JSONRPCException: 'generatetoaddress' RPC took longer than 600.000000 seconds. Consider using larger timeout for calls that take longer to return. (-344)
2024-08-31T13:48:26.955000Z TestFramework (INFO): Stopping nodes
2024-08-31T13:48:26.955000Z TestFramework.node0 (ERROR): Unable to stop node.
Traceback (most recent call last):
  File "/__w/ain/ain/test/functional/test_framework/test_node.py", line 481, in stop_node
    self.stop(wait=wait)
  File "/__w/ain/ain/test/functional/test_framework/coverage.py", line 47, in __call__
    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
  File "/__w/ain/ain/test/functional/test_framework/authproxy.py", line 171, in __call__
    response, status = self._request(
  File "/__w/ain/ain/test/functional/test_framework/authproxy.py", line 127, in _request
    self.__conn.request(method, path, postdata, headers)
  File "/usr/lib/python3.8/http/client.py", line 1256, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1267, in _send_request
    self.putrequest(method, url, **skips)
  File "/usr/lib/python3.8/http/client.py", line 1093, in putrequest
    raise CannotSendRequest(self.__state)
http.client.CannotSendRequest: Request-sent
2024-08-31T13:49:26.976000Z TestFramework.utils (ERROR): wait_until() failed. Predicate: ''''
    def is_node_stopped(self):
        """Checks whether the node has stopped.

        Returns True if the node has stopped. False otherwise.
        This method is responsible for freeing resources (self.process)."""
        if not self.running:
            return True
        return_code = self.process.poll()
        if return_code is None:
            return False

        # process has stopped. Assert that it didn't return an error code.
        assert return_code == 0, self._node_msg(
            "Node returned non-zero exit code (%d) when stopping" % return_code
        )
        self.running = False
        self.process = None
        self.rpc_connected = False
        self.rpc = None
        self.evm_rpc = None
        self.log.debug("Node stopped")
        return True
'''


stderr:
Traceback (most recent call last):
  File "/__w/ain/ain/test/functional/feature_loan_low_interest.py", line 472, in <module>
    LowInterestTest().main()
  File "/__w/ain/ain/test/functional/test_framework/test_framework.py", line 319, in main
    self.stop_nodes()
  File "/__w/ain/ain/test/functional/test_framework/test_framework.py", line 574, in stop_nodes
    node.wait_until_stopped()
  File "/__w/ain/ain/test/functional/test_framework/test_node.py", line 525, in wait_until_stopped
    wait_until(self.is_node_stopped, timeout=timeout)
  File "/__w/ain/ain/test/functional/test_framework/util.py", line 282, in wait_until
    raise AssertionError(
AssertionError: Predicate ''''
    def is_node_stopped(self):
        """Checks whether the node has stopped.

        Returns True if the node has stopped. False otherwise.
        This method is responsible for freeing resources (self.process)."""
        if not self.running:
            return True
        return_code = self.process.poll()
        if return_code is None:
            return False

        # process has stopped. Assert that it didn't return an error code.
        assert return_code == 0, self._node_msg(
            "Node returned non-zero exit code (%d) when stopping" % return_code
        )
        self.running = False
        self.process = None
        self.rpc_connected = False
        self.rpc = None
        self.evm_rpc = None
        self.log.debug("Node stopped")
        return True
''' not true after 60 seconds


Combine the logs and print the last 500 lines ...

prasannavl pinned this issue Sep 3, 2024