Test failures on unexpected output #482

Closed
rotu opened this issue Apr 11, 2020 · 15 comments
Labels
bug Something isn't working

Comments

@rotu commented Apr 11, 2020

Bug report

As originally mentioned in ros2/rmw_cyclonedds#147 by @hidmic:

test failures as seen in https://ci.ros2.org/view/nightly/job/nightly_osx_debug/1588/testReport/.


Although not seen in the test failure messages, the failures are caused by unexpected output from Cyclone DDS. In particular, "using network interface en0 (udp/192.168.1.120) selected arbitrarily from: en0, en1". This is followed by the output the program expects.

@dirk-thomas (Member) commented

ros2/rmw_cyclonedds#147 contains the proposal to resolve these test failures.

dirk-thomas added the bug label Apr 11, 2020
@rotu (Author) commented Apr 11, 2020

Proposed fix in ros2/rmw_cyclonedds#147

@hidmic:

This pull request registers regular expressions for output filtering in launch tests that use rmw_cyclonedds_cpp as their RMW implementation of choice. It aims to solve test failures as seen in https://ci.ros2.org/view/nightly/job/nightly_osx_debug/1588/testReport/.

CI up to ros2topic, ros2action and ros2service:

  • Debug MacOS Build Status

This fix narrowly targets that one specific category of output message.
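For illustration, the kind of pattern being registered might look like the sketch below. It is based on the message quoted earlier in this issue, not on the actual filter list in ros2/rmw_cyclonedds#147, and the helper name is invented.

```python
# Sketch only: a filter of the kind the PR registers, derived from the
# "using network interface ... selected arbitrarily" notice quoted above.
# The real patterns live in ros2/rmw_cyclonedds#147 and may differ.
import re

CYCLONE_INTERFACE_NOTICE = re.compile(
    r'using network interface \S+ \(udp/\S+\) selected arbitrarily from: .*'
)

def filter_rmw_noise(output: str) -> str:
    """Drop lines matching the known, harmless interface-selection notice."""
    return '\n'.join(
        line for line in output.splitlines()
        if not CYCLONE_INTERFACE_NOTICE.search(line)
    )
```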

@dirk-thomas (Member) commented

This fix narrowly targets that one specific category of output message.

The fix applies the same patterns already present in various other tests / RMW implementations.

@rotu (Author) commented Apr 11, 2020

@rotu:

@dirk-thomas a working solution is to remove the strict flag in the calls to expect_output. There is no guarantee that this is the only output from Cyclone - and putting human-readable warning messages on stderr is a feature, not a bug.

@dirk-thomas:

Since you didn't want to continue the conversation on the PR you opened as a replacement to this, I will repeat my comment from there here:

These CLI tests check for strict output for a reason. They not only ensure that certain output is present but also ensure that no other output like warnings / errors / wrong information is present. So no, strict=False is not an appropriate solution.

@rotu (Author) commented Apr 11, 2020

Additional output can happen on stderr, especially when conditions are present that should warrant user concern, or other tools like UBSAN are in use, or simply because verbosity is turned up. Further, the proposed fix is fragile, since Cyclone logging does not generally conform to an easily detected log pattern.

Here are a few suggested fixes (see the sketch after this list):

  1. Turn off the strict flag on check_output. This flag enables a full-text match instead of just searching for a substring of the process output.
  2. Only check stdout for the purpose of the test. That is where the expected non-diagnostic production of a command line tool should go.
  3. Somewhere in the process under test, before awaiting user output, specifically turn down rcutils and RMW log severity.
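For illustration, a minimal sketch of what the strict flag amounts to. The helper below is a stand-in, not the actual signature of launch_testing's expect_output.

```python
# Sketch only: a stand-in for the strict / non-strict comparison the tests
# perform. The real implementation is launch_testing's expect_output; the
# helper name and signature here are illustrative.
def output_matches(expected_lines, actual_text, strict=True):
    actual_lines = actual_text.splitlines()
    if strict:
        # Strict: the output must match exactly, so any extra diagnostic
        # line (e.g. an RMW notice on stderr) fails the check.
        return actual_lines == expected_lines
    # Relaxed: the expected lines must appear in order; extra lines are
    # tolerated.
    remaining = iter(actual_lines)
    return all(line in remaining for line in expected_lines)
```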

@dirk-thomas (Member) commented

Turn off the strict flag on check_output. This flag enables a full-text match instead of just searching for a substring of the process output.

Please read my previous comment for why I don't think this is a feasible approach.

Only check stdout for the purpose of the test. That is where the expected non-diagnostic production of a command line tool should go.

That would also leave any other error messages on stderr undetected, which is not desired. These tests want to make sure that users get the right output - and not arbitrary error messages.

Somewhere in the process under test, before awaiting user output, specifically turn down rcutils and RMW log severity.

I don't think that messages already suppressed using these RMW-specific filters can be categorized by a log level. E.g. if just one of them is a warning, that would imply the tests have to run with a threshold of error and therefore wouldn't catch any real warnings that appear in the output. That kind of manipulation in the test significantly reduces its usefulness.

@rotu (Author) commented Apr 11, 2020

I read your previous comment several times, and I think you're wrong: even if the program produces unexpected output (because the global logging verbosity is cranked up or whatnot), THESE TEST CASES SHOULD STILL PASS since the program is succeeding at the task being tested.

If there are other error conditions that we want to detect, we should have tests to detect them, not let these test cases turn into catch-all smoke tests. And certainly not have every RMW or plugin package create a whitelist of messages to ignore for launch_testing purposes.

If you need a way to filter RCL or RMW output by severity, it exists. Improve that, don't reinvent a new, half-baked and informal standard for classifying log messages.
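For context, the severity mechanism referred to here is the standard ROS 2 logging configuration. A rough sketch, assuming an Eloquent-or-later rclpy; whether this also silences an RMW implementation's own tracing is exactly the point under debate:

```python
# Sketch only: raising the ROS 2 log threshold for a process via the
# standard --ros-args mechanism. This is the existing severity filter
# referred to above, not part of the ros2cli tests.
import rclpy
from rclpy.node import Node

rclpy.init(args=['--ros-args', '--log-level', 'warn'])
node = Node('quiet_example')
node.get_logger().info('suppressed at this threshold')
node.get_logger().warn('still printed')
node.destroy_node()
rclpy.shutdown()
```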

@hidmic (Contributor) commented Apr 11, 2020

Hmm, I think that some insight on the genesis of these tests may help clarify why they are the way they are. Before even launch_testing came into existence, periodically and for every release we would (and still do to a considerable extent) manually run these tools for every platform, rmw implementation and installation type combination. You can imagine the testing matrix is rather large, so to reduce the burden we came up with tests like these. ros2/ros2#739 tracked that initial work.

The problem with attempting to automate system-level QA like this is that the sheer number of failure modes makes it infeasible for a test to cover them all. We cannot replace a seasoned engineer's judgement on seemingly unusual logs, so we don't even try. That's why we aim for strict=True in as many places as possible. You're right that's not true for all: non-determinism doesn't help (will this action client get all goal feedback before it gets its result?) and we have to keep maintenance efforts at a reasonable level (keeping a list of all interface types that exist in ros2.repos up to date is inconvenient at least). All contributions to get us closer to that ideal are much appreciated though.

So to address your concerns / thoughts:

even if the program produces unexpected output (because the global logging verbosity is cranked up or whatnot), THESE TEST CASES SHOULD STILL PASS

That'd defeat their purpose. Logging verbosity, implicitly or explicitly, is part of the test configuration.

If there are other error conditions that we want to detect, we should have tests to detect them, not let these test cases turn into catch-all smoke tests.

Yes, that's what these tests are, and no, we can't foresee all error conditions. As you correctly stated:

Additional output can happen on stderr, especially when conditions are present that should warrant user concern

and we do want to know about it.

If you need a way to filter RCL or RMW output by severity, it exists. Improve that, don't reinvent a new, half-baked and informal standard for classifying log messages.

ros2/rmw_cyclonedds#147 is not yet another general purpose log severity filtering mechanism. It is a conscious decision we (and hopefully rmw_cyclonedds_cpp maintainers as well) make on what we know for a fact we can safely ignore. And that's why it has to be that narrow.

Whether that log should even show or not in normal circumstances is a fair question, and I won't object if you guys want to drop it on your side. But I'd rather not suppress a whole output category to achieve that.

@rotu (Author) commented Apr 11, 2020

Your heart is certainly in the right place and I love automating previously manual stuff! The problem is that these build tests are used to verify that development is correct, not just that things are ready at release time. When CI reports a failure (and especially when that failure message is vague), the next step is to rerun the tests locally, with whatever tools are at hand. This includes sanitizers, increased verbosity, debuggers, and ad-hoc debug statements. These debug tools are important to understanding unfamiliar code and, as much as possible, should not cause tests to fail.

As for that particular message, I’m not sure that it should be ignored. The computer has two network interfaces and it is an alert to the user that Cyclone is not completely sure of its choice. (Maybe cyclone could be more confident in that choice or maybe it should document better how to configure which one to use; though that’s a different issue). If it’s important to get reproducible results, release testing should be done on an unambiguous network setup or it should inform cyclone which interface to use.

I’m sure @thomas-moulard will have feelings about this, though I don’t know what they will be.

@rotu (Author) commented Apr 11, 2020

Originally sent via email:

I strongly disfavor merging the pull request as-is. These messages are useful and actionable. They indicate an ambiguity in setup and we don't want to encourage ignoring such messages with a whitelist. Either they are useful and should stay and not be filtered out by any testing or interaction tools, or they are not useful and they should not be emitted by default.

Proposals, any of which I think are reasonable:

  1. Fix these tests to ignore irrelevant diagnostic output and only filter for the target output.
  2. Fix these tests to only look at stdout, which is where expected output should come from.
  3. Set up release testing to run in an environment with an unambiguous networking setup.
  4. Configure cyclonedds to specify which networking interface to use. This is what the diagnostic message is trying to communicate (a configuration sketch follows below).

Note that 1 and 2 would also fix the spurious test failure when rerunning locally with increased verbosity or sanitizers like UBSAN.
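As a sketch of what option 4 could look like: CycloneDDS reads its configuration from the CYCLONEDDS_URI environment variable, and in the versions current at the time an interface could be pinned with the NetworkInterfaceAddress element. The exact element names should be checked against the installed version.

```python
# Sketch only: pinning CycloneDDS to one interface so the "selected
# arbitrarily" notice never appears. The element names are taken from the
# CycloneDDS docs of this era; verify against the version actually in use.
import os

os.environ['CYCLONEDDS_URI'] = (
    '<CycloneDDS><Domain><General>'
    '<NetworkInterfaceAddress>en0</NetworkInterfaceAddress>'
    '</General></Domain></CycloneDDS>'
)
```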

@hidmic (Contributor) commented Apr 13, 2020

I see two actionable outcomes of this discussion.

As for that particular message, I’m not sure that it should be ignored.

The reasons you've outlined for not ignoring that particular log are absolutely reasonable (and something that probably wouldn't have been brought up if these tests were as permissive as you propose). So I can agree on 4, i.e. to configure cyclonedds to specify which networking interface to use. How likely is eclipse-cyclonedds/cyclonedds#485 to land in the short term? It also makes me wonder: is there an expected ROS 2-specific default behavior that other RMW implementations abide by and rmw_cyclonedds_cpp does not? Thoughts @dirk-thomas?

The problem is that these build tests are used to verify that development is correct, not just that things are ready at release time.

That is absolutely true as well, but the fact that these tests are impractical for development does not render them incorrect. To serve that specific purpose, we could have some form of build time or run time configuration to relax the expectations in these tests, and perhaps that of some CI jobs, allowing one set of tests to serve both development and release QA needs.
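A minimal sketch of what such a build- or run-time toggle could look like; the variable name is invented, and nothing like it exists in ros2cli or launch_testing today.

```python
# Sketch only: a hypothetical opt-in relaxation for local debugging runs.
# ROS2CLI_RELAXED_OUTPUT_CHECKS is a made-up name, not an existing setting.
import os

# CI leaves the variable unset and keeps the strict, release-oriented checks;
# a developer running with sanitizers or extra verbosity could export it.
STRICT_OUTPUT_CHECKS = os.environ.get('ROS2CLI_RELAXED_OUTPUT_CHECKS') != '1'
```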

Something like 3, i.e. to set up release testing to run in an environment with an unambiguous networking setup, would indeed be nice, though I'm unsure about the available infrastructure and manpower to do that today.

@rotu (Author) commented Apr 13, 2020

I think it's unlikely that eclipse-cyclonedds/cyclonedds#485 will land in the very short term. This is all I could find on restricting the interface in a generic way: https://answers.ros.org/question/299940/explicitly-constraining-which-network-interfaces-are-used-in-ros2/

In the near term, the best solution seems to be restricting tests to checking stdout (this would also resolve #484, since the current tests seem like they would pass if the expected output accidentally came out on stderr). If it's important to check stderr as well, I'm sure there are good solutions, like running these tests in a well-defined environment and flagging unexpected output for manual review (similar to pytest warnings capture).
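A rough sketch of that last idea, using only standard pytest facilities. The CLI invocation and the expected line are placeholders, not the existing ros2cli test code.

```python
# Sketch only: report unexpected stderr as a pytest warning instead of a
# hard failure. capfd and warnings are standard pytest/stdlib pieces;
# run_cli() and the expected line are placeholders.
import subprocess
import warnings

def run_cli():
    # Placeholder for invoking the tool under test.
    subprocess.run(['ros2', 'topic', 'list'], check=True)

def test_topic_list(capfd):
    run_cli()
    out, err = capfd.readouterr()
    assert '/rosout' in out  # placeholder expectation
    if err:
        warnings.warn(f'unexpected stderr from CLI under test: {err!r}')
```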

jacobperron added a commit that referenced this issue Apr 13, 2020
Fixes #480

The actual tests are the same, except with the use of launch_testing we ensure the CLI daemon
is restarted between tests. This follows a similar pattern as the other ros2cli tests.

I've left a TODO to test with all RMW implementations. I did not enable this now since I do
not expect the tests to pass with all RMW implementations.
See #482

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
@dirk-thomas (Member) commented

Proposals, any of which I think are reasonable:

  1. Fix these tests to ignore irrelevant diagnostic output and only filter for the target output.

That is exactly what we are trying to achieve - as specific as possible. And as you said before yourself, "Cyclone logging does not generally conform to an easily detected log pattern". So the only entity that knows what output is "irrelevant" is CycloneDDS.

Any other approach (by stream, by log level, whatever else) will unavoidably lead to other output also being ignored, which is undesired. But this has already been described and argued above.

  2. Fix these tests to only look at stdout, which is where expected output should come from.

It has already been mentioned multiple times why this is not considered appropriate. Please see the previous comments from me as well as from @hidmic.

  3. Set up release testing to run in an environment with an unambiguous networking setup.
  4. Configure cyclonedds to specify which networking interface to use. This is what the diagnostic message is trying to communicate.

These tests are run by developers locally, via Jenkins, in GitHub Actions, and in environments I don't even know about. They need to pass by default with no need for custom configurations. Therefore I don't consider the requirement for a custom environment a viable option.

So from my point of view there is still not a single alternative proposal on the table which doesn't impair the scope of the existing tests.

@rotu (Author) commented Apr 14, 2020

So slightly reduce the scope of the existing tests or restrict them to controlled environments (they don't have to run everywhere). It's obvious that there are many other bugs (#484, #480, #456, #450) that need attention, rather than trying to "fix" what ain't broken in rmw_cyclonedds.

@dirk-thomas (Member) commented

Closing since ros2/rmw_cyclonedds#147 was merged.

jacobperron added a commit that referenced this issue Apr 15, 2020
jacobperron added a commit that referenced this issue Apr 16, 2020