Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

changes between 1.12.7 and 1.13.4 for backporting #1205

Merged
merged 57 commits into from
Nov 6, 2017

Conversation

dirk-thomas
Copy link
Member

@dirk-thomas dirk-thomas commented Oct 26, 2017

The following list of changes has been integrated into ros_comm 1.13.4 (Lunar) since the last Kinetic release (1.12.7).

Backported: (these changes are part of this PR)

Not backported:

@ros/ros_team @ros/ros_comm-maintainers Please comment on the decision which changes to (not) backport. Especially look at the changes marked with a "?".

dirk-thomas and others added 30 commits October 25, 2017 13:46
* refactor test_rosbag_storage

* fix rosbag::View::iterator copy assignment operator

the compiler-generated copy assignment operator did lead to segfault and
memory leaks.
…other thread (#1013)

* Avoid deleting XmlRpcClient's while they are being still in use on another thread

   * Acquire clients_mutex_ before deleting the clients
   * Remove the timed wait for the clients to become not in use
   * Only delete and erase clients that are not in use
   * Clients that would still be in use would delete themselves on release

* Wait for clients that are in use to finish in XmlRpcManager::shutdown
In a multimaster environment where a topic has multiple publishers,
when a node drops out abruptly (host is shutdown), a single subscriber update on
that topic will cause multiple threads to be created (one for each host) in order to
resolve the topic location. This cause a thread leak as host which are turned off
will not respond and when they come back online, the xmlrpc URI is changed causing a
connection refused error at the socket layer.

This fix catches the connection refused error and terminate the thread with the understanding
that if the connection is refused, the rosnode cannot be reached now or never. This effectively
prevents thread leak.

Note: if the remote host where the rosnode is thought to be never comes back up,
then the thread will still be leaked as the exception received is a host unreachable type.
This is intentional to avoid abruptly terminating the thread in case of a temporary DNS failure.
* Fix bug in transport_tcp

It assumes that the `connect` method of non-blocking scoket should return -1 and `last_socket_error()` should return `ROS_SOCKETS_ASYNCHRONOUS_CONNECT_RETURN`(=`EINPROGRESS`). 
But a non-blocking `connect` can return 0 when TCP connection to 127.0.0.1 (localhost).
[http://stackoverflow.com/questions/14027326/can-connect-return-0-with-non-blocing-socket](http://stackoverflow.com/questions/14027326/can-connect-return-0-with-non-blocing-socket)

* Modify code format

Modify code format
* Fix race condition that lead to miss first message

Callback queue waits for callback from "callOne" method.
When internal queue is not empty this last method succeeded even if id
info mapping does not contains related callback's id.
In this case, first callback (for one id) is never called since
"addCallback" method first push callback into internal queue and *then*
set info mapping.

So id info mapping has to be set before push callback into internal
queue. Otherwise first message could be ignored.

* Added test for addCallback race condition
* Add option to reset timer when time moved backwards

* refactor logic
* Added logging output when `roslogging` cannot change permissions

Added better differentiated logging output to `roslogging` so that
problems with permission is made clear to the user. Accompanying test
have also been added.

* Removed testing, updated warning message and fixed formatting

Removed testing since test folder should not be stored together with
tests. Since testing group permission requires intervention outside the
test harness the test it self is also removed.

Updated the warning message to include `WARNING` and updated to using
`%` formatting.
* Check /clock publication neatly in publishtest

- Use time.sleep because rospy.sleep(0.1) hangs if /clock is not published
- Add timeout for clock publication

* Add comment explaining use of time.sleep.

* Use logwarn_throttle not to flood the console
…o direct control flow hijacking (#1092)

* A fix to a critical stack buffer overflow vulnerability which leads to control flow hi-jacking.

* Much more simple fix for the stack overflow bug
…d a swap function instead (#1000)

* Made copying rosbag::Bag a compiler error to prevent crashes

* Added Bag::swap(Bag&) and rosbag::swap(Bag&, Bag&)

* Fixed bugs in Bag::swap

* Added tests for Bag::swap
After an update of gcc and glibc roscpp started to fail builds with the error:

    /home/rojkov/work/ros/build/tmp-glibc/work/i586-oe-linux/roscpp/1.11.21-r0/ros_comm-1.11.21/clients/roscpp/src/libros/transport/transport_udp.cpp:579:25: error: 'writev' was not declared in this scope
         ssize_t num_bytes = writev(sock_, iov, 2);
                             ^~~~~~

According to POSIX.1-2001 the function writev() is declared in sys/uio.h.

The patch includes the missing header for POSIX compliant systems.
* add SteadyTimer

based on SteadyTime (which uses the CLOCK_MONOTONIC).
This timer is not influenced by time jumps of the system time,
so ideal for things like periodic checks of timeout/heartbeat, etc...

* fix timer_manager to really use a steady clock when needed

This is a bit of a hack, since up to boost version 1.61 the time of the steady clock is always converted to system clock,
which is then used for the wait... which obviously doesn't help if we explicitly want the steady clock.

So as a workaround, include the backported versions of the boost condition variable if boost version is not recent enough.

* add tests for SteadyTimer

* [test] add -pthread to make some tests compile

not sure if this is only need in my case on ROS indigo...

* use SteadyTime for some timeouts

* add some checks to make sure the backported boost headers are included if needed

* specialize TimerManager threadFunc for SteadyTimer

to avoid the typeid check and make really sure the correct boost condition wait_until implementation is used

* Revert "[test] add -pthread to make some tests compile"

This reverts commit f62a3f2.

* set minimum version for rostime

* mostly spaces
* Add close_half_closed_sockets function

* Call close_half_closed_sockets in xmlrpcapi by default
Christopher Wecht and others added 7 commits October 26, 2017 11:54
* Don't try to set unknown socket options

These are not avaible on FreeBSD, for example

* individualize ifdefs

* fix whitespace
…() (#1191)

- Only triggered if reduce_overlap_ = true
- When iters_.size() == 1 and iters_.pop_back() gets called in the loop,
  the next loop condition check would read from iters_.back(), but
  iters_ would be empty by then.
* Test bzfile_ before reading/writing/closing

* Test lz4stream before reading/writing
* More agile demux.

Publishers in demux are no longer destroyed and recreated when switching, which results in much faster switching behavior. The previous version took even 10 seconds to start publishing on the newly selected topic (all on localhost).

Please, comment if you feel the default behavior should stay as the old was, and this new behavior should be triggered by a parameter.

* update style
@dirk-thomas
Copy link
Member Author

CI fails since the changes require a new release of roscpp_core. I ran a local prerelease with roscpp_core and ros_comm to convince myself that it is supposed to be "green".

@dirk-thomas
Copy link
Member Author

@ros/ros_team @ros/ros_comm-maintainers Waiting for your feedback...

@clalancette
Copy link
Contributor

I did a brief once over on all of the proposed code. Nothing in particular stuck out at me as very problematic, or API/ABI breaking. I also looked at the question marked ones in more detail:

Avoid deleting XmlRpcClient's while they are being still in use on another thread #1013

bug fix, chance for regression?

I took another look at commit 64920f2, and it seems OK to me. I agree with you that there is some scope for regression, but overall I think it is worthwhile.

using poll() in favor of select() in the XmlRPCDispatcher #1133

bug fix, risk of regressions?

This one is harder to quantify, since it essentially comes down to the behavior of select vs. poll in the kernel. With #1104 also being fixed by this PR, it may be a better idea to leave this one out, since it is a lot less likely that people will run into the original problem.

*[bug] fixes 1158 #1159

helpful improvement, chance for regression?

This one I'm concerned that people are relying on the current behavior without realizing it. While it is clearly incorrect, the next update they do to their Kinetic packages will break their robot in ways they don't understand. I'd leave this one out.

@dirk-thomas
Copy link
Member Author

dirk-thomas commented Nov 1, 2017

@clalancette Thank you for the review.

I agree that "using poll() in favor of select() in the XmlRPCDispatcher #1133" has a good potential of introducing regressions. Since it has been tested by a few parties for the last few month though and they would really like to see this fixed in an LTS release I am leaning towards taking the chances 😐 While #1104 also provides some improvements it won't solve all the problematic cases so the more invasive change is still necessary for some use cases.

I see your point on "[bug] fixes 1158 #1159". The problem is that without the fix other valid use cases are not possible (passing a parameter only to the first node but not to the second). Therefore I prefer to keep this for unblocking valid use cases over potentially breaking users relying on wrong API behavior.

@dirk-thomas dirk-thomas changed the title changes between 1.12.7 and 1.13.3 for backporting changes between 1.12.7 and 1.13.4 for backporting Nov 2, 2017
* catch exception with `socket.TCP_INFO` on WSL

fixes issue #1207
this only catches socket error 92

* Update util.py

* Update util.py

* avoid unnecessary changes, change import order
@dirk-thomas
Copy link
Member Author

@ros-pull-request-builder retest this please

@dirk-thomas
Copy link
Member Author

@ros-pull-request-builder retest this please

@dirk-thomas
Copy link
Member Author

The following change wasn't backported as part of this (or any future) backport PR:

improves lunar-devel jenkins test stability #1135

not backporting improvements

The change has been backported separately in #1578.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.