Slow performance with prpy 0.4.0 #26

Closed
Stefanos19 opened this issue Apr 5, 2015 · 41 comments

@Stefanos19
Contributor

The two videos below run identical code, apart from the first using prpy 0.3.1 and the second using 0.4.0. For the first, the duration of grasp_bite - serve - look - ready-to-grasp is 25 sec; for the second, 49 sec. https://www.youtube.com/watch?v=YkwINudZnOw https://www.youtube.com/watch?v=LbgmtMS4kpo
I took time stamps before and after each PlanToConfiguration and PlanToEndEffectorOffset call:

| 0.4.0   | 0.3.1   | Call                       |
|---------|---------|----------------------------|
| 935486  | 11590   | PlanToLookingAtFace        |
| 905313  | 330854  | PlanToLookingAtPlate       |
| 362034  | 82758   | PlanToEndEffectorPose      |
| 937087  | 702920  | PlanToEndEffectorOffset    |
| 263716  | 783048  | PlanToEndEffectorOffset    |
| 166682  | 586631  | PlanToServingConfiguration |
| 3570318 | 2497801 | Total                      |

It appears, though, that the actual delays in the video are much larger than those in the timestamps.
@siddhss5 @mkoval @jeking04

@mkoval
Member

mkoval commented Apr 5, 2015

We recently fixed a bug (see personalrobotics/prpy#79) that was causing environment cloning to leak a significant amount of memory. This could be the culprit if you're running the demo in a loop in one process. It doesn't look like that commit made it into the 0.4.0 release. Can you try the latest master version of PrPy?

If that doesn't fix the problem, then you'll need to do some lower-level profiling to see which part is actually slow. Is it planning or retiming/smoothing?
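As a concrete sketch of that kind of lower-level profiling (the planning call in the trailing comment is a hypothetical stand-in, not a specific PrPy API), Python's built-in cProfile can break a single call down by cumulative time:

```python
import cProfile
import io
import pstats

def profile_call(fn, *args, **kwargs):
    """Run fn once under cProfile and print the 10 biggest
    contributors by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = fn(*args, **kwargs)
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
    print(stream.getvalue())
    return result

# e.g. profile_call(robot.PlanToConfiguration, goal_config)
# where robot and goal_config come from the demo script (hypothetical names)
```

The printed table makes it easy to see whether the time is going into planning, smoothing/retiming, or execution.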

@mkoval
Member

mkoval commented Apr 5, 2015

BTW, here is some documentation for how to make a proper table on GitHub using Markdown. The original timing information was very difficult to read, so I edited your post accordingly. 😄
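For reference, a GitHub-flavored Markdown table (like the one the timing data was edited into) is just pipe-separated cells plus a separator row:

```markdown
| 0.4.0  | 0.3.1 | Call                |
|--------|-------|---------------------|
| 935486 | 11590 | PlanToLookingAtFace |
```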

@Stefanos19
Contributor Author

Btw, here are the results from the collision and kinematics benchmarks of or_benchmarks:

[CollisionCheckingBenchmark.cpp:214 RunCollisionBenchmark] [CollisionCheckingBenchmark] Results:
[CollisionCheckingBenchmark.cpp:215 RunCollisionBenchmark]  Total collision checks: 50000
[CollisionCheckingBenchmark.cpp:216 RunCollisionBenchmark]  Total collisions found: 0
[CollisionCheckingBenchmark.cpp:217 RunCollisionBenchmark]  Total elapsed time: 0.15509
[CollisionCheckingBenchmark.cpp:218 RunCollisionBenchmark]  Average time per check: 0.0000031
[CollisionCheckingBenchmark.cpp:219 RunCollisionBenchmark]  Average checks per second: 322401.76

[DataUtils.cpp:62 GenerateRandomConfigurations] [CollisionCheckingBenchmark] Generating 50000 poses
[KinematicBenchmarks.cpp:123 RunForwardKinematicsBenchmark] [KinematicBenchmarks] Results for FK benchmark:
[KinematicBenchmarks.cpp:124 RunForwardKinematicsBenchmark]     Total queries: 50000
[KinematicBenchmarks.cpp:125 RunForwardKinematicsBenchmark]     Total elapsed time: 0.25551
[KinematicBenchmarks.cpp:126 RunForwardKinematicsBenchmark]     Average time per check: 0.0000051
[KinematicBenchmarks.cpp:127 RunForwardKinematicsBenchmark]     Average checks per second: 195684.76
[KinematicBenchmarks.cpp:128 RunForwardKinematicsBenchmark]     Average calls to SetDOFValues per second: 197768.39
[KinematicBenchmarks.cpp:129 RunForwardKinematicsBenchmark]     Average calls to GetEndEffectorTransform per second: 18573550.40
[INFO] [rospy.core:core.py:387]:signal_shutdown: signal_shutdown [atexit]
[BenchmarksModule.cpp:62 DestroyPlugin] [CollisionCheckingModule] Destroying plugin.

@mkoval
Member

mkoval commented Apr 5, 2015

Those results don't mean much because there is (likely) nothing in the environment. It would be more informative to run the self-collision benchmark. I believe the run_all script runs all three benchmarks (environment collision, self collision, and kinematics).

@Stefanos19
Contributor Author

Hmm, the run_all script relies on some datasets that appear to have been generated for HERB:
https://github.com/personalrobotics/or_benchmarks/tree/master/datasets
Running them for ADA produces joint-limit errors.

Thanks for the reference on git-tables :)

@Stefanos19
Contributor Author

Also, checking out the latest prpy results in the following error:

environment-core.h:221 Destroy] destroy module
Traceback (most recent call last):
  File "/homes/snikolai/catkin_demo040/src/ada_meal_scenario/src/bite_serving_FSM_timeTest.py", line 280, in <module>
    adaManipulationTester.initSimple()
  File "/homes/snikolai/catkin_demo040/src/ada_meal_scenario/src/bite_serving_FSM_timeTest.py", line 162, in initSimple
    self.manip.PlanToConfiguration(lookingAtPlateConfiguration)
  File "/homes/snikolai/catkin_demo040/src/prpy/src/prpy/base/manipulator.py", line 64, in wrapper_method
    return self._PlanWrapper(delegate_method, args, kw_args)
  File "/homes/snikolai/catkin_demo040/src/prpy/src/prpy/base/manipulator.py", line 156, in _PlanWrapper
    cloned_env.Cloned(self).SetActive()
  File "/homes/snikolai/catkin_demo040/src/prpy/src/prpy/bind.py", line 93, in intercept
    clone_method(canonical_parent)
  File "/homes/snikolai/catkin_demo040/src/prpy/src/prpy/base/mico.py", line 129, in CloneBindings
    self.__init__(True, None)
  File "/homes/snikolai/catkin_demo040/src/prpy/src/prpy/base/mico.py", line 90, in __init__
    dof_indices=self.GetArmIndices(), affine_dofs=0, simulated=sim)        
  File "/homes/snikolai/catkin_demo040/src/prpy/src/prpy/base/robot.py", line 155, in AttachController
    self.multicontroller.AttachController(delegate_controller, dof_indices, affine_dofs)
  File "/homes/snikolai/catkin_demo040/src/prpy/src/prpy/bind.py", line 113, in intercept
    return canonical_instance.__getattr__(name)
  File "/homes/snikolai/catkin_demo040/src/prpy/src/prpy/base/robot.py", line 101, in __getattr__
    elif self.actions.has_action(name):
  File "/homes/snikolai/catkin_demo040/src/prpy/src/prpy/bind.py", line 113, in intercept
    return canonical_instance.__getattr__(name)
  File "/homes/snikolai/catkin_demo040/src/prpy/src/prpy/base/robot.py", line 101, in __getattr__
    elif self.actions.has_action(name):
  ...
  File "/homes/snikolai/catkin_demo040/src/prpy/src/prpy/bind.py", line 51, in intercept
    canonical_instance = InstanceDeduplicator.get_canonical(self)
RuntimeError: maximum recursion depth exceeded
[INFO] [rospy.core:core.py:387]:signal_shutdown: signal_shutdown [atexit]

@mkoval
Member

mkoval commented Apr 5, 2015

In that case, you can just run the standalone run_collision_benchmark.py script with the --self option.

@jeking04 Could you take a look at making run_collision_benchmark.py less HERB-specific? It would be handy to run this script on other robots, too.

@mkoval
Member

mkoval commented Apr 5, 2015

@Stefanos19 You should file an issue on PrPy. Even if the error is in AdaPy, there is no excuse for PrPy throwing a maximum recursion depth exceeded exception with a completely opaque stack trace.

@Stefanos19
Contributor Author

Here is the output of run_collision_benchmark with the --self option:

[BenchmarksModule.cpp:40 CreateInterfaceValidated] [CollisionCheckingModule] Creating interface.
[BenchmarksModule.cpp:42 CreateInterfaceValidated] [CollisionCheckingModule] Generating collision checking benchmark plugin.
[CollisionCheckingBenchmark.cpp:36 RunBenchmark] [CollisionCheckingBenchmark] Running collision checking for body ADA
[CollisionCheckingBenchmark.cpp:60 RunBenchmark] [CollisionCheckingBenchmark] Sampling poses from cube with extent 2.000
[DataUtils.cpp:62 GenerateRandomConfigurations] [CollisionCheckingBenchmark] Generating 50000 poses
[CollisionCheckingBenchmark.cpp:281 RunSelfCollisionBenchmark] [CollisionCheckingBenchmark] Results:
[CollisionCheckingBenchmark.cpp:282 RunSelfCollisionBenchmark]  Total self collision checks: 50000
[CollisionCheckingBenchmark.cpp:283 RunSelfCollisionBenchmark]  Total collisions found: 2156
[CollisionCheckingBenchmark.cpp:284 RunSelfCollisionBenchmark]  Total elapsed time: 16.94463
[CollisionCheckingBenchmark.cpp:285 RunSelfCollisionBenchmark]  Average time per check: 0.0003389
[CollisionCheckingBenchmark.cpp:286 RunSelfCollisionBenchmark]  Average checks per second: 2950.79

@Stefanos19
Contributor Author

@mkoval ok, done.

@mkoval
Member

mkoval commented Apr 5, 2015

That is 2950.79 self-collision checks per second. This is quite a bit faster than HERB using ODE.

It's hard to draw any concrete conclusion from that, since this is partially expected:

  1. much more of HERB's configuration space is in self-collision
    • on HERB, 50% of the C-space is in self-collision per arm, and over 75% for bimanual manipulation
    • on ADA, only about 4% of sampled poses (2156 / 50000) are in self-collision
    • it's much more expensive to do a positive check (or a near miss) than a check in free space
  2. ADA has many fewer links to collision-check

We may want to try the latest version of PrPy (once personalrobotics/prpy#89 is fixed) and do some more profiling of the demo script before putting too much time into simplifying the model.

@cdellin

cdellin commented Apr 5, 2015

it's much more expensive to do a positive check (or a near miss) than a check in free space

This may be true, but sometimes the opposite can be true for a different reason -- the checker can return False on the first collision it finds, whereas if the configuration is valid, it must finish checking all links before it returns True. I have no idea what the tradeoff is for the Ada model, though.
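The early-exit behavior cdellin describes can be sketched abstractly (the helper names here are hypothetical, not the OpenRAVE checker API): the checker bails out on the first colliding pair, so collision-free configurations are the ones that pay the full scan cost:

```python
def check_self_collision(link_pairs, pair_in_collision):
    """Return True at the first colliding link pair (early exit).
    A collision-free configuration must scan every pair before
    the function can return False, so 'valid' answers cost the
    most in this sketch."""
    for pair in link_pairs:
        if pair_in_collision(pair):
            return True  # early exit on the first hit
    return False  # full scan completed: configuration is valid
```

Which effect dominates (expensive near-misses vs. early exits) depends on the model and the checker, as noted above.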

@mkoval
Member

mkoval commented Apr 5, 2015

@cdellin Yes, that is definitely true. The worst case is definitely a near-miss, but it's not clear how "in collision" versus "not in collision" compare.

@Stefanos19
Contributor Author

I ran the bite-serving code with the latest version of prpy. It is slightly faster than 0.4.0 (a bite cycle takes on average 37 secs with the latest master vs. 41 secs for 0.4.0), but it is still much slower than prpy 0.3.1 (26 secs).

@mkoval
Member

mkoval commented Apr 6, 2015

There shouldn't be any changes in PrPy between 0.3.1 and master that can account for that type of performance difference. The difference must be in which planners or smoothers are being called (or an inadvertent change in one of them).

Looking at your table of times, it looks like the big increase in time was split between PlanToLookingAtFace, PlanToLookingAtPlate, and PlanToEndEffectorPose. I can't find these calls in the demo script. Which planning method are you calling in each of these cases (e.g. PlanToConfiguration, PlanToEndEffectorPose, PlanToTSR, etc)? Also, do those times include execution time?

@mkoval
Member

mkoval commented Apr 6, 2015

@Stefanos19 Re: the PlanToTSR error in CBiRRT, it looks like tsr_chains is None. This shouldn't happen in normal operation - is it possible that you're passing an invalid argument?

@Stefanos19
Contributor Author

I uploaded the file that I used for testing.
https://github.com/personalrobotics/ada_meal_scenario/blob/master/src/bite_serving_FSM_timeTest.py
Note that this file does not have the latest changes (removing the configuration definitions, etc.), but I use it for testing since it does not have any cached trajectories.

In all tests I used only the cbirrt planner for everything, as well as the openravepy smoother, for consistency. To ensure that only the cbirrt planner is called, I changed the Sequence in adarobot.py, as follows:

    actual_planner = Sequence(
        self.birrt_planner,
        MethodMask(
            FirstSupported(
                self.cbirrt_planner,
            ),
            methods=['PlanToIK', 'PlanToTSR', 'PlanToEndEffectorPose', 'PlanToEndEffectorOffset']
        )
    )
    self.planner = FirstSupported(
        actual_planner,
        # Special purpose meta-planner.
    )

For the prpy 0.3.1 test, I also changed the sequence in adarobot.py so that
self.robot.planner = prpy.planning.Sequence(self.robot.cbirrt_planner)

The times are measured from the first PlanToEndEffectorOffset, when the robot starts to grab a bite, until it reaches the same point again after serving the bite to the person.

@Stefanos19
Contributor Author

@mkoval On the PlanToTSR error, I found the bug and deleted the comment, but it was too late :) Thanks!!

@mkoval
Member

mkoval commented Apr 6, 2015

It's hard to make much of those numbers. Could you:

  • Measure the time spent in each planning call with execute=False, e.g.

        with prpy.util.Timer('PlanningToXXX'):
            path = robot.PlanToXXX(..., execute=False)

  • Separately measure execution time:

        with prpy.util.Timer('MovingToXXX'):
            robot.ExecutePath(path)

Also, I would expect CBiRRT to be very slow with PlanToEndEffectorOffset. VectorFieldPlanner, GreedyIkPlanner, or even MKPlanner should be much faster. Are you sure you were exclusively using CBiRRT in 0.3?

@mkoval
Member

mkoval commented Apr 6, 2015

@Stefanos19 Was the PlanToTSR error in your code or in PrPy?

@Stefanos19
Contributor Author

@mkoval It was in my code; I used the "old" sequence formatting, which caused the error.
Yeah, I am positive. VectorFieldPlanner and GreedyIKPlanner are not supported in 0.3.1.
Sure, I will do the tests that you mention.

@mkoval
Member

mkoval commented Apr 6, 2015

@Stefanos19 What is the old sequence formatting? I don't think Sequence has changed in ages.

@mkoval
Member

mkoval commented Apr 6, 2015

In general, we need to figure out which specific function call(s) in PrPy got slower in master. The only slow operations are really planning, smoothing, retiming, and trajectory execution. So it's just a matter of timing all of those calls and figuring out where all the time is going.

@Stefanos19
Contributor Author

Sorry, I did not phrase this well. I meant that in 0.3.1, I did:
self.planner = Sequence(self.cbirrt_planner)
But the way we set the planner in master prpy now is slightly different, and doing the above caused the error.
(That was only for testing, since I want the same planner in both conditions.)

@mkoval
Member

mkoval commented Apr 6, 2015

That should definitely not cause an error. This most likely means that CBiRRT is always failing because of a syntax error and another planner is picking up the slack. Can you file an issue on PrPy?

BTW: There is no reason to use Sequence if there is only one planner. Just set robot.planner = self.cbirrt_planner. See the PrPy documentation for more information.

@Stefanos19
Contributor Author

Ok, I will replicate the error to make sure that this was it. It is possible that it was something else.
I will repeat the tests by setting robot.planner = self.cbirrt_planner.
Still, if the robot plans and the only planner in the sequence is cbirrt, then there should not be any other planner picking up the slack, right?

@mkoval
Member

mkoval commented Apr 6, 2015

Yes, that is correct. But self.cbirrt_planner and Sequence(self.cbirrt_planner) are also definitely identical. So there is something else going on here (maybe robot.planner is being set elsewhere)?

@Stefanos19
Contributor Author

Now that I think about it, it is possible that the TSR error was caused because I did not set up robot.planner at all.
Note that this TSR error has nothing to do with the delay: when that error occurred, the planner failed and the program exited. The discussion of the delays using the latest prpy is from after I fixed the error by setting the Sequence as

    actual_planner = Sequence(
        self.birrt_planner,
        MethodMask(
            FirstSupported(
                self.cbirrt_planner,
            ),
            methods=['PlanToIK', 'PlanToTSR', 'PlanToEndEffectorPose', 'PlanToEndEffectorOffset']
        )
    )
    self.planner = FirstSupported(
        actual_planner,
        # Special purpose meta-planner.
    )

@mkoval
Member

mkoval commented Apr 6, 2015

In any case, it is a bug if CBiRRT is crashing with that error. You should report it on PrPy. I also don't see why that sequence would fix anything - CBiRRT is the only planner in the list that implements PlanToTSR (which seems to be the source of the error).

BTW: That list is equivalent to:

self.planner = FirstSupported(
    self.birrt_planner,
    self.cbirrt_planner
)

@Stefanos19
Contributor Author

Got it. I will replicate the error and file the bug, and redo the comparison using self.planner = self.cbirrt_planner in both and with the time-tests that you have above.

@mkoval
Member

mkoval commented Apr 6, 2015

Make sure you get timing information for the individual planning calls. We really do need to figure out what part of PrPy got slower.

@Stefanos19
Contributor Author

Right.

@mkoval
Member

mkoval commented Apr 6, 2015

I think I fixed the CBiRRT bug here: personalrobotics/prpy@e7b48b8

@Stefanos19
Contributor Author

That's interesting: Replacing

self.manip.PlanToConfiguration(servingConfiguration)

with

path = self.robot.PlanToConfiguration(servingConfiguration, execute=False)
self.robot.ExecutePath(path)

solved the timing issue. Aren't the two identical?

@mkoval
Member

mkoval commented Apr 6, 2015

They should be identical. Passing execute=False just calls ExecutePath (see here). This shortcuts, smooths, and times the path and calls ExecuteTrajectory on the resulting trajectory (see here).
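The control flow described here can be sketched in a few lines (all function names below are illustrative stand-ins, not the actual PrPy implementation): the only intended difference between the two call patterns is who invokes the execute step.

```python
def smooth_and_retime(path):
    # Stand-in for shortcutting, smoothing, and timing the raw path.
    return {"timed": True, "waypoints": path}

def execute_trajectory(trajectory):
    # Stand-in for sending the timed trajectory to the controller.
    return trajectory

def execute_path(path):
    """ExecutePath in this sketch: post-process, then execute."""
    return execute_trajectory(smooth_and_retime(path))

def plan_to_configuration(planner, config, execute=True):
    """PlanToXXX in this sketch: plan, then execute unless told not to."""
    path = planner(config)
    if execute:
        return execute_path(path)  # default: one combined call
    return path  # execute=False: caller runs execute_path(path) later
```

Under this model, PlanToXXX(config) and PlanToXXX(config, execute=False) followed by ExecutePath(path) produce the same trajectory, which is why the observed timing difference is surprising.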

I don't know why there is a difference if you call it manually.

I am a bit surprised that this works at all. There is a bunch of HERB-specific code in the MicoRobot implementation of ExecuteTrajectory. I've been cleaning this up in personalrobotics/prpy#97.

@mkoval
Member

mkoval commented Apr 6, 2015

BTW: Your first snippet calls the planner on self.manip and the second calls it on self.robot. This will only work if the manipulator DOFs happen to be active (and in the correct order!).

@siddhss5

siddhss5 commented Apr 7, 2015

What?! How does manip even have any of those planning methods? Aren't they all inherited from robot base?

@mkoval
Member

mkoval commented Apr 7, 2015

They're magically added to Manipulator using __getattr__ and __dir__. This is the same mechanism we use to add planning methods to Robot. Here is the relevant code in Robot and in Manipulator.

The only difference is that calling a planning method on a manipulator first sets itself as active.
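The forwarding mechanism can be sketched as follows (class and attribute names are hypothetical simplifications, not the PrPy source): __getattr__ is only consulted when normal lookup fails, so unknown method names fall through to the robot, with the manipulator activating itself first.

```python
class Robot(object):
    """Stand-in for the PrPy robot: owns the planning methods."""
    def PlanToConfiguration(self, config):
        return ("planned", config, self.active_manipulator)

class Manipulator(object):
    """Stand-in manipulator that forwards unknown attributes to the
    robot, activating itself first (mirrors the behavior described)."""
    def __init__(self, robot, name):
        self.robot = robot
        self.name = name

    def __getattr__(self, attr):
        # Called only when normal attribute lookup fails on the instance.
        robot_method = getattr(self.robot, attr)
        def wrapper(*args, **kwargs):
            self.robot.active_manipulator = self.name  # set self as active
            return robot_method(*args, **kwargs)
        return wrapper
```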

@siddhss5

siddhss5 commented Apr 7, 2015

Aha. So what is with this slowdown then?

@mkoval
Member

mkoval commented Apr 13, 2015

Is this fixed? What was the problem?

@Stefanos19
Contributor Author

I am not sure what the problem was. I no longer see the slowdown between calling path = PlanToConfiguration(config, execute=False), ExecutePath(path), and PlanToConfiguration(config). It might have been the ancient or_ompl version I used, or the smoothing/shortcutting parameters.
