Missing checks for rmw handle in rclpy_create_publisher #826
Could you elaborate a bit? I am not sure I understand correctly. It seems like you faced an actual problem; if so, could you provide reproducible sample code that causes this problem? Are you suggesting that we should add a NULL check for rclpy/rclpy/rclpy/publisher.py Lines 57 to 58 in f2cb25b?
Hi @fujitatomoya, thanks for the reply. I'm suggesting that in The core dump I posted above is from a racy environment, in which the rmw handle was overwritten by NULL before
A NULL check would be okay to add, but is that really enough to avoid the problem you are describing? After the NULL check, I think there is still a race condition and a chance that the handle becomes NULL in a multi-threaded program. That is why it raises an exception to user space, if I am not mistaken.
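To illustrate the point that a NULL check at creation does not remove the race, here is a minimal Python sketch. `FakeRmwPublisher`, `CheckedPublisher` and `invalidate()` are hypothetical stand-ins, not rclpy, rcl or rmw APIs, and `invalidate()` assumes that something else (another thread, a shutdown) can clear the handle at any time. The sketch re-reads and checks the handle on every use, and does the check and the use under one lock, so a cleared handle surfaces as an exception to the caller instead of a NULL dereference.

```python
import threading
from typing import Optional


class FakeRmwPublisher:
    """Hypothetical stand-in for an rmw publisher handle."""

    def publish(self, msg: str) -> str:
        return f"published {msg}"


class CheckedPublisher:
    """Hypothetical wrapper that never trusts a handle checked earlier."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._rmw_handle: Optional[FakeRmwPublisher] = FakeRmwPublisher()

    def invalidate(self) -> None:
        # Simulates another thread (or a shutdown) setting the handle to NULL.
        with self._lock:
            self._rmw_handle = None

    def publish(self, msg: str) -> str:
        # The check and the use happen under the same lock, so the handle
        # cannot be cleared between the NULL check and the call that uses it.
        with self._lock:
            handle = self._rmw_handle
            if handle is None:
                raise RuntimeError("rmw handle is no longer valid")
            return handle.publish(msg)


pub = CheckedPublisher()
print(pub.publish("hello"))  # published hello
pub.invalidate()
try:
    pub.publish("again")
except RuntimeError as err:
    print(f"caught: {err}")  # caught: rmw handle is no longer valid
```

The lock only protects accesses that go through this wrapper; it is a design choice for the sketch, not a description of how rclpy manages its handles.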
Right, the race condition won't necessarily be avoided even with the NULL check. However, the point is that with the NULL check added in the right spot, the race is called out immediately and debugging gets much easier, because we won't need to back-trace all the way from the place where the null pointer is dereferenced, which can be far from the fault site.
I am okay with adding the NULL check. Would you mind considering a PR?
Sure thing. Will open a PR and let you know! Thanks.
I opened PR #851 for this.
Required Info:
Feature request
Hi, this issue is in the gray area between a bug report and a feature request.
When an application creates a publisher through `rclcpp`, it invokes the following `rcl` APIs:

- `rcl_get_zero_initialized_publisher` to get a handle,
- `rcl_publisher_init` to initialize the publisher,
- `rcl_publisher_get_rmw_handle`, followed by a NULL check to make sure there really is an rmw handle.

However, the last check is missing in `rclpy_create_publisher` of `rclpy`. In other words, `rclpy` may think it successfully created a publisher even when `rmw_handle` is asynchronously set to NULL.

I did find cases where this becomes problematic. Consider an `rclpy` application that creates a publisher and publishes messages. For example:

1. `rcl_publisher_init`: if, for any reason, `publisher->impl->rmw_handle` becomes NULL, that error goes undetected back in `rclpy_create_publisher`.
2. `rclpy_create_publisher` returns `publisher_capsule` back to `node.py`.
3. `node.py` creates an `rclpy.publisher.Publisher` instance.
4. While the `Publisher` instance is constructed, a `QoSEventHandler` object is created, which internally calls `_rclpy.rclpy_create_event` (in rclpy qos_event.py) -> `rcl_publisher_event_init` (in rclpy _rclpy_qos_event.c) -> `rmw_publisher_event_init` (in rcl event.c) -> `rmw_publisher_event_init` (rmw_implementation) -> `rmw_publisher_event_init` (rmw_fastrtps). There, it segfaults when dereferencing a null pointer.

Core dump:
The effect of a missing pointer manifests silently at a location far away from the fault site, which makes debugging tricky.
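The difference between the two failure modes can be modeled with a small Python sketch. Every name in it (`FakeRmwPublisher`, `FakeRclPublisher`, `fake_publisher_init`, `create_publisher`, `create_event`, and the `lose_handle` and `check_handle` flags) is hypothetical and only mirrors the call sequence described above; it is not the rclpy, rcl or rmw API. With `check_handle=False`, creation appears to succeed and the failure surfaces later in the event-initialization step; with `check_handle=True`, it is reported at creation, which is the kind of sanity check this issue proposes for `rclpy_create_publisher`.

```python
from typing import Optional


class FakeRmwPublisher:
    """Hypothetical stand-in for the rmw publisher behind the rcl handle."""

    def event_init(self) -> str:
        return "event ok"


class FakeRclPublisher:
    """Hypothetical stand-in for an rcl publisher."""

    def __init__(self) -> None:
        # Zero-initialized: no rmw handle yet.
        self.rmw_handle: Optional[FakeRmwPublisher] = None


def fake_publisher_init(pub: FakeRclPublisher, lose_handle: bool) -> None:
    # Stand-in for rcl_publisher_init; lose_handle simulates the rmw handle
    # ending up NULL even though init reported success.
    pub.rmw_handle = None if lose_handle else FakeRmwPublisher()


def create_publisher(lose_handle: bool, check_handle: bool) -> FakeRclPublisher:
    pub = FakeRclPublisher()  # like rcl_get_zero_initialized_publisher
    fake_publisher_init(pub, lose_handle)
    # Sanity check after fetching the rmw handle: fail at the fault site.
    if check_handle and pub.rmw_handle is None:
        raise RuntimeError("failed to get rmw handle")
    return pub


def create_event(pub: FakeRclPublisher) -> str:
    # Later step (like rmw_publisher_event_init) that dereferences the handle
    # without checking it; on None this raises AttributeError, the Python
    # analogue of a segfault far away from the fault site.
    return pub.rmw_handle.event_init()  # intentionally unchecked


# Without the check, creation "succeeds" and the failure appears later.
unchecked = create_publisher(lose_handle=True, check_handle=False)
try:
    create_event(unchecked)
except AttributeError as err:
    print(f"late failure: {err}")

# With the check, the problem is reported where it happened.
try:
    create_publisher(lose_handle=True, check_handle=True)
except RuntimeError as err:
    print(f"early failure: {err}")  # early failure: failed to get rmw handle
```

The check does not make the handle any more reliable; it only moves the error report to the point where the handle was obtained.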
Such an issue could have been prevented by sanity-checking the result of `rcl_publisher_get_rmw_handle`, as `rclcpp` does. An `rclcpp` application does not suffer from the same issue, because the missing rmw handle is caught right away.

P.S. The documentation for `rcl` states the following about `rcl_publisher_get_rmw_handle`:

Any thoughts?
Thanks!