Test failures on unexpected output #482

Closed
rotu opened this issue Apr 11, 2020 · 15 comments
Labels
bug Something isn't working

Comments

@rotu commented Apr 11, 2020

Bug report

As originally mentioned in ros2/rmw_cyclonedds#147 by @hidmic:

test failures as seen in https://ci.ros2.org/view/nightly/job/nightly_osx_debug/1588/testReport/.


Although not seen in the test failure messages, the failures are caused by unexpected output from Cyclone DDS. In particular, "using network interface en0 (udp/192.168.1.120) selected arbitrarily from: en0, en1". This is followed by the output the program expects.

@dirk-thomas (Member) commented

ros2/rmw_cyclonedds#147 contains the proposal to resolve these test failures.

dirk-thomas added the bug label Apr 11, 2020
@rotu (Author) commented Apr 11, 2020

Proposed fix in ros2/rmw_cyclonedds#147

@hidmic:

This pull request registers regular expressions for output filtering in launch tests that use rmw_cyclonedds_cpp as their RMW implementation of choice. It aims to solve test failures as seen in https://ci.ros2.org/view/nightly/job/nightly_osx_debug/1588/testReport/.

CI up to ros2topic, ros2action and ros2service:

  • Debug MacOS Build Status

This fix narrowly targets that one specific category of output message.
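For illustration, the kind of pattern being registered might look like the sketch below. It is based on the message quoted earlier in this issue, not on the actual filter list in ros2/rmw_cyclonedds#147, and the helper name is invented.

```python
# Sketch only: a filter of the kind the PR registers, derived from the
# "using network interface ... selected arbitrarily" notice quoted above.
# The real patterns live in ros2/rmw_cyclonedds#147 and may differ.
import re

CYCLONE_INTERFACE_NOTICE = re.compile(
    r'using network interface \S+ \(udp/\S+\) selected arbitrarily from: .*'
)

def filter_rmw_noise(output: str) -> str:
    """Drop lines matching the known, harmless interface-selection notice."""
    return '\n'.join(
        line for line in output.splitlines()
        if not CYCLONE_INTERFACE_NOTICE.search(line)
    )
```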

@dirk-thomas (Member) commented

This fix narrowly targets that one specific category of output message.

The fix applies the same patterns already present in various other tests / RMW implementations.

@rotu (Author) commented Apr 11, 2020

@rotu:

@dirk-thomas a working solution is to remove the strict flag in the calls to expect_output. There is no guarantee that this is the only output from Cyclone - and putting human-readable warning messages on stderr is a feature, not a bug.

@dirk-thomas:

Since you didn't want to continue the conversation on the PR you opened as a replacement to this, I will repeat my comment from there here:

These CLI tests check for strict output for a reason. They not only ensure that certain output is present but also ensure that no other output like warnings / errors / wrong information is present. So no, strict=False is not an appropriate solution.

@rotu (Author) commented Apr 11, 2020

Additional output can happen on stderr, especially when conditions are present that should warrant user concern, or other tools like UBSAN are in use, or simply because verbosity is turned up. Further, the proposed fix is fragile, since Cyclone logging does not generally conform to an easily detected log pattern.

Here are a few suggested fixes (see the sketch after this list):

  1. Turn off the strict flag on check_output. This flag enables a full-text match instead of just searching for a substring of the process output.
  2. Only check stdout for the purpose of the test. That is where the expected non-diagnostic production of a command line tool should go.
  3. Somewhere in the process under test, before awaiting user output, specifically turn down rcutils and RMW log severity.
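For illustration, a minimal sketch of what the strict flag amounts to. The helper below is a stand-in, not the actual signature of launch_testing's expect_output.

```python
# Sketch only: a stand-in for the strict / non-strict comparison the tests
# perform. The real implementation is launch_testing's expect_output; the
# helper name and signature here are illustrative.
def output_matches(expected_lines, actual_text, strict=True):
    actual_lines = actual_text.splitlines()
    if strict:
        # Strict: the output must match exactly, so any extra diagnostic
        # line (e.g. an RMW notice on stderr) fails the check.
        return actual_lines == expected_lines
    # Relaxed: the expected lines must appear in order; extra lines are
    # tolerated.
    remaining = iter(actual_lines)
    return all(line in remaining for line in expected_lines)
```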

@dirk-thomas (Member) commented

Turn off the strict flag on check_output. This flag enables a full-text match instead of just searching for a substring of the process output.

Please read my previous comment for why I don't think this is a feasible approach.

Only check stdout for the purpose of the test. That is where the expected non-diagnostic production of a command line tool should go.

That would also leave any other error messages on stderr undetected, which is not desired. These tests want to make sure that users get the right output - and not arbitrary error messages.

Somewhere in the process under test, before awaiting user output, specifically turn down rcutils and RMW log severity.

I don't think that messages already suppressed using these RMW-specific filters can be categorized by a log level. E.g. if just one of them is a warning, that would imply the tests have to run with a threshold of error and therefore wouldn't catch any real warnings that appear in the output. That kind of manipulation in the test significantly reduces its usefulness.

@rotu (Author) commented Apr 11, 2020

I read your previous comment several times, and I think you're wrong: even if the program produces unexpected output (because the global logging verbosity is cranked up or whatnot), THESE TEST CASES SHOULD STILL PASS since the program is succeeding at the task being tested.

If there are other error conditions that we want to detect, we should have tests to detect them, not let these test cases turn into catch-all smoke tests. And certainly not have every RMW or plugin package create a whitelist of messages to ignore for launch_testing purposes.

If you need a way to filter RCL or RMW output by severity, it exists. Improve that, don't reinvent a new, half-baked and informal standard for classifying log messages.
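For context, the severity mechanism referred to here is the standard ROS 2 logging configuration. A rough sketch, assuming an Eloquent-or-later rclpy; whether this also silences an RMW implementation's own tracing is exactly the point under debate:

```python
# Sketch only: raising the ROS 2 log threshold for a process via the
# standard --ros-args mechanism. This is the existing severity filter
# referred to above, not part of the ros2cli tests.
import rclpy
from rclpy.node import Node

rclpy.init(args=['--ros-args', '--log-level', 'warn'])
node = Node('quiet_example')
node.get_logger().info('suppressed at this threshold')
node.get_logger().warn('still printed')
node.destroy_node()
rclpy.shutdown()
```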

@hidmic (Contributor) commented Apr 11, 2020

Hmm, I think that some insight on the genesis of these tests may help clarify why they are the way they are. Before even launch_testing came into existence, periodically and for every release we would (and still do to a considerable extent) manually run these tools for every platform, rmw implementation and installation type combination. You can imagine the testing matrix is rather large, so to reduce the burden we came up with tests like these. ros2/ros2#739 tracked that initial work.

The problem with attempting to automate system-level QA like this is that the sheer number of failure modes makes it infeasible for a test to cover them all. We cannot replace a seasoned engineer's judgement on seemingly unusual logs, so we don't even try. That's why we aim for strict=True in as many places as possible. You're right that's not true for all: non-determinism doesn't help (will this action client get all goal feedback before it gets its result?) and we have to keep maintenance efforts at a reasonable level (keeping a list of all interface types that exist in ros2.repos up to date is inconvenient at least). All contributions to get us closer to that ideal are much appreciated though.

So to address your concerns / thoughts:

even if the program produces unexpected output (because the global logging verbosity is cranked up or whatnot), THESE TEST CASES SHOULD STILL PASS

That'd defeat their purpose. Logging verbosity, implicitly or explicitly, is part of the test configuration.

If there are other error conditions that we want to detect, we should have tests to detect them, not let these test cases turn into catch-all smoke tests.

Yes, that's what these tests are, and no, we can't foresee all error conditions. As you correctly stated:

Additional output can happen on stderr, especially when conditions are present that should warrant user concern

and we do want to know about it.

If you need a way to filter RCL or RMW output by severity, it exists. Improve that, don't reinvent a new, half-baked and informal standard for classifying log messages.

ros2/rmw_cyclonedds#147 is not yet another general purpose log severity filtering mechanism. It is a conscious decision we (and hopefully rmw_cyclonedds_cpp maintainers as well) make on what we know for a fact we can safely ignore. And that's why it has to be that narrow.

Whether that log should even show or not in normal circumstances is a fair question, and I won't object if you guys want to drop it on your side. But I'd rather not suppress a whole output category to achieve that.

@rotu (Author) commented Apr 11, 2020

Your heart is certainly in the right place and I love automating previously manual stuff! The problem is that these build tests are used to verify that development is correct, not just that things are ready at release time. When CI reports a failure (and especially when that failure message is vague), the next step is to rerun the tests locally, with whatever tools are at hand. This includes sanitizers, increased verbosity, debuggers, and ad-hoc debug statements. These debug tools are important to understanding unfamiliar code and, as much as possible, should not cause tests to fail.

As for that particular message, I’m not sure that it should be ignored. The computer has two network interfaces and it is an alert to the user that Cyclone is not completely sure of its choice. (Maybe cyclone could be more confident in that choice or maybe it should document better how to configure which one to use; though that’s a different issue). If it’s important to get reproducible results, release testing should be done on an unambiguous network setup or it should inform cyclone which interface to use.

I’m sure @thomas-moulard will have feelings about this, though I don’t know what they will be.

@rotu (Author) commented Apr 11, 2020

Originally sent via email:

I strongly disfavor merging the pull request as-is. These messages are useful and actionable. They indicate an ambiguity in setup and we don't want to encourage ignoring such messages with a whitelist. Either they are useful and should stay and not be filtered out by any testing or interaction tools, or they are not useful and they should not be emitted by default.

Proposals, any of which I think are reasonable:

  1. Fix these tests to ignore irrelevant diagnostic output and only filter for the target output.
  2. Fix these tests to only look at stdout, which is where expected output should come from.
  3. Set up release testing to run in an environment with an unambiguous networking setup.
  4. Configure cyclonedds to specify which networking interface to use. This is what the diagnostic message is trying to communicate (a configuration sketch follows below).

Note that 1 and 2 would also fix the spurious test failure when rerunning locally with increased verbosity or sanitizers like UBSAN.
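As a sketch of what option 4 could look like: CycloneDDS reads its configuration from the CYCLONEDDS_URI environment variable, and in the versions current at the time an interface could be pinned with the NetworkInterfaceAddress element. The exact element names should be checked against the installed version.

```python
# Sketch only: pinning CycloneDDS to one interface so the "selected
# arbitrarily" notice never appears. The element names are taken from the
# CycloneDDS docs of this era; verify against the version actually in use.
import os

os.environ['CYCLONEDDS_URI'] = (
    '<CycloneDDS><Domain><General>'
    '<NetworkInterfaceAddress>en0</NetworkInterfaceAddress>'
    '</General></Domain></CycloneDDS>'
)
```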

@hidmic (Contributor) commented Apr 13, 2020

I see two actionable outcomes of this discussion.

As for that particular message, I’m not sure that it should be ignored.

The reasons you've outlined for not ignoring that particular log are absolutely reasonable (and something that probably wouldn't have been brought up if these tests were as permissive as you propose). So I can agree on 4, i.e. to configure cyclonedds to specify which networking interface to use. How likely is eclipse-cyclonedds/cyclonedds#485 to land in the short term? It also makes me wonder: is there an expected ROS 2-specific default behavior that other RMW implementations abide by and rmw_cyclonedds_cpp does not? Thoughts @dirk-thomas?

The problem is that these build tests are used to verify that development is correct, not just that things are ready at release time.

That is absolutely true as well, but the fact that these tests are impractical for development does not render them incorrect. To serve that specific purpose, we could have some form of build time or run time configuration to relax the expectations in these tests, and perhaps that of some CI jobs, allowing one set of tests to serve both development and release QA needs.
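A minimal sketch of what such a build- or run-time toggle could look like; the variable name is invented, and nothing like it exists in ros2cli or launch_testing today.

```python
# Sketch only: a hypothetical opt-in relaxation for local debugging runs.
# ROS2CLI_RELAXED_OUTPUT_CHECKS is a made-up name, not an existing setting.
import os

# CI leaves the variable unset and keeps the strict, release-oriented checks;
# a developer running with sanitizers or extra verbosity could export it.
STRICT_OUTPUT_CHECKS = os.environ.get('ROS2CLI_RELAXED_OUTPUT_CHECKS') != '1'
```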

Something like 3, i.e. to set up release testing to run in an environment with an unambiguous networking setup, would indeed be nice, though I'm unsure about the available infrastructure and manpower to do that today.

@rotu (Author) commented Apr 13, 2020

I think it's unlikely that eclipse-cyclonedds/cyclonedds#485 will land in the very short term. This is all I could find on restricting the interface in a generic way: https://answers.ros.org/question/299940/explicitly-constraining-which-network-interfaces-are-used-in-ros2/

In the near term, the best solution seems to be restricting tests to checking stdout (this would also resolve #484, since the current tests seem like they would pass if the expected output accidentally came out on stderr). If it's important to check stderr as well, I'm sure there are good solutions, like running these tests in a well-defined environment and flagging unexpected output for manual review (similar to pytest warnings capture).
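A rough sketch of that last idea, using only standard pytest facilities. The CLI invocation and the expected line are placeholders, not the existing ros2cli test code.

```python
# Sketch only: report unexpected stderr as a pytest warning instead of a
# hard failure. capfd and warnings are standard pytest/stdlib pieces;
# run_cli() and the expected line are placeholders.
import subprocess
import warnings

def run_cli():
    # Placeholder for invoking the tool under test.
    subprocess.run(['ros2', 'topic', 'list'], check=True)

def test_topic_list(capfd):
    run_cli()
    out, err = capfd.readouterr()
    assert '/rosout' in out  # placeholder expectation
    if err:
        warnings.warn(f'unexpected stderr from CLI under test: {err!r}')
```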

jacobperron added a commit that referenced this issue Apr 13, 2020
Fixes #480

The actual tests are the same, except with the use of launch_testing we ensure the CLI daemon
is restarted between tests. This follows a similar pattern as the other ros2cli tests.

I've left a TODO to test with all RMW implementations. I did not enable this now since I do
not expect the tests to pass with all RMW implementations.
See #482

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
@dirk-thomas (Member) commented

Proposals, any of which I think are reasonable:

  1. Fix these tests to ignore irrelevant diagnostic output and only filter for the target output.

That is exactly what we are trying to achieve - as specific as possible. And as you said before yourself, "Cyclone logging does not generally conform to an easily detected log pattern". So the only entity that knows what output is "irrelevant" is CycloneDDS.

Any other approach (by stream, by log level, whatever else) will unavoidably lead to other output also being ignored, which is undesired. But this has already been described and argued above.

  2. Fix these tests to only look at stdout, which is where expected output should come from.

It has already been mentioned multiple times why this is not considered appropriate. Please see the previous comments from me as well as from @hidmic.

  3. Set up release testing to run in an environment with an unambiguous networking setup.
  4. Configure cyclonedds to specify which networking interface to use. This is what the diagnostic message is trying to communicate.

These tests are run by developers locally, via Jenkins, in GitHub Actions, and in environments I don't even know about. They need to pass by default with no need for custom configurations. Therefore I don't consider the requirement for a custom environment a viable option.

So from my point of view there is still not a single alternative proposal on the table which doesn't impair the scope of the existing tests.

@rotu (Author) commented Apr 14, 2020

So slightly reduce the scope of the existing tests or restrict them to controlled environments (they don't have to run everywhere). It's obvious that there are many other bugs (#484, #480, #456, #450) that need attention, rather than trying to "fix" what ain't broken in rmw_cyclonedds.

@dirk-thomas (Member) commented

Closing since ros2/rmw_cyclonedds#147 was merged.

jacobperron added a commit that referenced this issue Apr 15, 2020
jacobperron added a commit that referenced this issue Apr 16, 2020