ci_agent in build.ros2.org has unexpected running nodes in background and causes test regressions #11

Open
Crola1702 opened this issue Nov 21, 2023 · 1 comment

Comments

@Crola1702
Contributor

Migrated from https://github.com/osrf/buildfarmer/issues/337 on Aug 22, 2022

Description

Three unexpected nodes running on this agent are causing test regressions on Humble.

Reference builds:

Failing tests:

Example test

Test: rcl.TestGetNodeNames__rmw_fastrtps_cpp.test_rcl_get_node_names (from TestGetNodeNames__rmw_fastrtps_cpp)
Stacktrace:

/tmp/ws/src/ros2/rcl/rcl/test/rcl/test_get_node_names.cpp:138
Expected equality of these values:
  discovered_nodes
    Which is: { ("demo_node_0", "/"), ("demo_node_1", "/"), ("demo_node_2", "/"), ("node1", "/"), ("node1", "/"), ("node2", "/"), ("node2", "/ns/ns"), ("node3", "/ns") }
  expected_nodes
    Which is: { ("node1", "/"), ("node1", "/"), ("node2", "/"), ("node2", "/ns/ns"), ("node3", "/ns") }

There are 3 unexpected nodes on this test: demo_node_0, demo_node_1, and demo_node_2.
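
The comparison in the stack trace can be reproduced outside the test as a multiset difference. This is only an illustrative sketch: the two lists are copied from the failure output above, and the helper name `unexpected` is hypothetical, not part of rcl or its test suite.

```python
from collections import Counter

# (node name, namespace) pairs copied from the stack trace above.
discovered_nodes = [
    ("demo_node_0", "/"), ("demo_node_1", "/"), ("demo_node_2", "/"),
    ("node1", "/"), ("node1", "/"), ("node2", "/"),
    ("node2", "/ns/ns"), ("node3", "/ns"),
]
expected_nodes = [
    ("node1", "/"), ("node1", "/"), ("node2", "/"),
    ("node2", "/ns/ns"), ("node3", "/ns"),
]


def unexpected(discovered, expected):
    """Return entries seen in `discovered` more often than in `expected`.

    Counter subtraction keeps duplicates meaningful, which matters here
    because ("node1", "/") legitimately appears twice.
    """
    extra = Counter(discovered) - Counter(expected)
    return sorted(extra.elements())


print(unexpected(discovered_nodes, expected_nodes))
# [('demo_node_0', '/'), ('demo_node_1', '/'), ('demo_node_2', '/')]
```

A multiset is used instead of a plain set so that a duplicated expected node is not silently collapsed into one entry.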

Explanation

  • We traced the errors through the log and found that tests were failing because of 3 nodes that appear to have existed since the start of the build.
  • We tried to reproduce this error on ci.ros2.org: Build Status
  • As we couldn't reproduce the error on ci.ros2.org, we think this agent had trouble destroying those 3 nodes.
  • We think this error may be related to processes not being terminated after a failure (even with Docker).
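
To illustrate the last point, here is one generic way a launcher can avoid leaving children behind after a failure or timeout: start the command in its own process group and kill the whole group on the way out. This is a sketch of the general technique on a POSIX system, not what ros_buildfarm, colcon, or pytest actually do; `run_and_reap` is a hypothetical name.

```python
import os
import signal
import subprocess


def run_and_reap(cmd, timeout):
    """Run `cmd` in a new session and kill its whole process group afterwards.

    Returns the exit code, or None if the command timed out.
    Assumes a POSIX system (uses os.killpg).
    """
    # start_new_session=True makes the child a session and group leader,
    # so its process group ID equals its PID.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    finally:
        try:
            # Kill every process still in the group, including
            # grandchildren the command may have left in the background.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group is already empty
        proc.wait()  # reap the direct child so it does not stay a zombie
```

A process that moves itself into a different session or group would escape this cleanup, so it is a mitigation and not a guarantee.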

22/08 Update

@Crola1702
Contributor Author

Commented by cottsay on Aug 22, 2022

From the troubled node (ci_agent-ffcf5120), this is the container that's still running:

f9566d9966832ea21871dff4f7a10745ad058d74c5c9545eb79632e6e5fee119   1660307708.398302425.ci_build_and_test.rolling   "sh -c 'PATH=/usr/lib/ccache:$PATH PYTHONPATH=/tmp/ros_buildfarm:$PYTHONPATH python3 -u /tmp/ros_buildfarm/scripts/devel/build_and_test.py --rosdistro-name rolling --ros-version 2 --build-tool colcon --workspace-root /tmp/ws --parent-result-space --build-tool-args --cmake-args -DCMAKE_BUILD_TYPE=Release -DSKIP_MULTI_RMW_TESTS=1 --no-warn-unused-cli --build-tool-test-args --retest-until-pass 2 --ctest-args -LE xfail --pytest-args -m \"not xfail\"'"
jenkins+ 2883215  0.0  0.0   2888    96 ?        S    Aug12   0:00 /bin/sh -c PYTHONIOENCODING=utf_8 PYTHONUNBUFFERED=1 colcon test --build-base build_isolated --install-base install_isolated --test-result-base test_results --event-handlers console_direct+ --executor sequential --test-result-base /tmp/ws/test_results --retest-until-pass 2 --ctest-args -LE xfail --pytest-args -m "not xfail"
jenkins+ 2883216  0.0  0.7 224048 57720 ?        Sl   Aug12  13:43 /usr/bin/python3 /usr/bin/colcon test --build-base build_isolated --install-base install_isolated --test-result-base test_results --event-handlers console_direct+ --executor sequential --test-result-base /tmp/ws/test_results --retest-until-pass 2 --ctest-args -LE xfail --pytest-args -m not xfail
jenkins+ 2994922  0.1  0.6 922448 50136 ?        Sl   Aug12  27:04 /usr/bin/python3 -m pytest
jenkins+ 2994957  0.1  0.8 709136 66904 ?        Sl   Aug12  19:16 /tmp/ws/install_isolated/demo_nodes_cpp/lib/demo_nodes_cpp/talker --ros-args -r __node:=demo_node_0
jenkins+ 2994959  0.0  0.8 708968 66648 ?        Sl   Aug12  11:07 /tmp/ws/install_isolated/demo_nodes_cpp/lib/demo_nodes_cpp/talker --ros-args -r __node:=demo_node_1
jenkins+ 2994961  0.0  0.8 709148 66272 ?        Sl   Aug12  11:14 /tmp/ws/install_isolated/demo_nodes_cpp/lib/demo_nodes_cpp/talker --ros-args -r __node:=demo_node_2
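
The three `talker` processes above correspond to the unexpected nodes in the failing test. A small sketch for pulling the PID and remapped node name out of a listing like this one, assuming `ps aux`-style columns (the exact command used to produce the listing is not stated here) and a hypothetical helper name `lingering_nodes`:

```python
import re

# Matches the node-name remap argument, e.g. "__node:=demo_node_0".
NODE_REMAP = re.compile(r"__node:=(\S+)")


def lingering_nodes(ps_output):
    """Return (pid, node_name) for every line that remaps a node name.

    Assumes columns in `ps aux` order:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    found = []
    for line in ps_output.splitlines():
        match = NODE_REMAP.search(line)
        if match is None:
            continue
        fields = line.split(None, 10)  # keep COMMAND as one field
        found.append((int(fields[1]), match.group(1)))
    return found


# Two lines copied from the listing above.
sample = """\
jenkins+ 2994922  0.1  0.6 922448 50136 ?        Sl   Aug12  27:04 /usr/bin/python3 -m pytest
jenkins+ 2994957  0.1  0.8 709136 66904 ?        Sl   Aug12  19:16 /tmp/ws/install_isolated/demo_nodes_cpp/lib/demo_nodes_cpp/talker --ros-args -r __node:=demo_node_0
"""
print(lingering_nodes(sample))
# [(2994957, 'demo_node_0')]
```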
