Command gear torque A3/A2 error when using kuka_rsi_hw_interface #126

Open
RhysMcK opened this issue Apr 17, 2018 · 40 comments

@RhysMcK

RhysMcK commented Apr 17, 2018

KUKA System:

Controller: KRC4 compact
Robot: Kr3-540 Agilus

Environment

ROS Distro: Kinetic
MoveIt

Problem

Hi all,
I have been struggling with this issue for a while now. While controlling the KUKA robot through the RSI hardware interface, every now and again I receive a command gear torque error, typically for axis A3 or A2. This issue has been reported before (#89), but none of the comments there have solved the problem for me personally. I don't believe this to be a communication issue, as it happens even when I have not lost any communication packets (according to the KUKA RSI monitor). I have played with the HOLD_ON parameter and that does not seem to make any difference.

A possible suggestion that was brought up in #89 was the tool load data not being set correctly. It is possible that this could be the cause, as I must admit I have not set the payload 100% accurately (due to not having any CAD program currently available to me). I have, however, entered the mass and estimated the location of the CoM and the moments of inertia. The reason I don't believe this to be at fault is that I have tried both big and small values for the mass/inertias and have seen no difference in the behaviour. I would have thought that setting values significantly far from the true values would cause this error to happen instantly or more frequently. To me it doesn't seem as if this data has any effect, but I could be very wrong.

A more likely possibility, in my opinion, is that the motion plans from MoveIt are not abiding by the acceleration limits, as previously mentioned in moveit/moveit#416. I will try to get some trajectory plots in the next day or so, but in the meantime I just wanted to get this discussion going again and see if people have any other insights into this issue.

Cheers!

@RhysMcK
Author

RhysMcK commented Apr 18, 2018

Update: this issue persists after I have physically removed the tool from the robot arm, which surely must indicate that it is not an issue with inaccurate tool load data.

@RhysMcK
Author

RhysMcK commented Apr 19, 2018

I have had a chance to do some further testing on this issue. It seems the problem is most likely to do with the trajectories that are being sent to the KUKA robot. I recorded all of the position commands that were being set from within the KukaHardwareInterface::write function. From this data I was then able to estimate the commanded velocities by taking the difference between the last two commanded positions and assuming a cycle time of 12 ms. From the figures below, it can be seen that just before I receive this error on A2 there is a noticeable velocity spike (a sketch of the calculation is given after the plots).
Trial 1: [plot trail1_a2]
Trial 2: [plot trail2_a2]
Trial 3: [plot trail3_a2]
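For reference, here is a minimal sketch of that finite-difference calculation, assuming a fixed 12 ms interpolation period and a logged series of commanded positions for one axis (the numbers below are made up):

```cpp
// Finite-difference estimate of the commanded velocity, as described above.
// Assumes the position commands were logged at the fixed 12 ms RSI cycle.
#include <cstdio>
#include <vector>

int main()
{
  const double dt = 0.012;  // RSI interpolation period in seconds (12 ms)
  const std::vector<double> cmd_pos = {0.000, 0.001, 0.003, 0.006, 0.010};  // example log

  for (std::size_t i = 1; i < cmd_pos.size(); ++i)
  {
    // Backward difference of consecutive position commands.
    const double vel = (cmd_pos[i] - cmd_pos[i - 1]) / dt;
    std::printf("sample %zu: commanded velocity = %.4f rad/s\n", i, vel);
  }
  return 0;
}
```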

The MoveIt! motion planners definitely seem to be generating non-ideal trajectories, which leads to this issue. There has been previous discussion about the OMPL library (which I am using) potentially causing this issue on Kinetic (moveit/moveit#416), but I'm not sure whether it has been discussed further or fixed.

A simple first attempt at a fix might be to adjust the commanded position whenever the change in velocity (i.e. the acceleration) is too great, in other words to set a threshold on the acceleration. A very basic moving average filter could be used to determine the commanded position if this limit is exceeded. I will try this and see how it goes. Any further ideas on this issue would be greatly appreciated!
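As an illustration of that idea (purely a sketch; the class name, threshold and window size are all hypothetical and not part of the driver), clamping the implied acceleration and falling back to a moving average could look like this:

```cpp
// Sketch: if the acceleration implied by a new position command exceeds a
// threshold, replace the command with the moving average of recent commands.
#include <cmath>
#include <deque>
#include <numeric>

class CommandSmoother
{
public:
  CommandSmoother(double dt, double max_accel, std::size_t window)
    : dt_(dt), max_accel_(max_accel), window_(window) {}

  double filter(double cmd)
  {
    if (history_.size() >= 2)
    {
      const std::size_t n = history_.size();
      const double v_prev = (history_[n - 1] - history_[n - 2]) / dt_;
      const double v_new = (cmd - history_[n - 1]) / dt_;
      const double accel = (v_new - v_prev) / dt_;
      if (std::fabs(accel) > max_accel_)
      {
        // Acceleration limit exceeded: fall back to the moving average.
        cmd = std::accumulate(history_.begin(), history_.end(), 0.0) / n;
      }
    }
    history_.push_back(cmd);
    if (history_.size() > window_)
      history_.pop_front();
    return cmd;
  }

private:
  double dt_;
  double max_accel_;
  std::size_t window_;
  std::deque<double> history_;
};
```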

Note that adjusting your payload data on the KUKA controller will affect when you receive this error, as the KUKA motion planner looks at the commanded motion relative to the current inertial conditions and predicts whether the commanded motion will exceed torque limits; $LOAD is taken into account when making these predictions [1]. However, this does not really address the root of the problem.

Cheers!

[1] https://www.robot-forum.com/robotforum/kuka-robot-forum/command-gear-torque-a23-with-rsi-control/

@gavanderhoorn
Member

Hi. Thanks for the detailed description.

I believe your analysis could be correct: the OMPL version used in Kinetic MoveIt is known to (sometimes) generate discontinuous paths, leading to time parameterisation artefacts such as the one you've plotted.

Unfortunately there isn't much we can do about that, other than implementing something like the filter you mention. I'm not entirely sure I'd be willing to do that in the driver though: perhaps an intermediate node should do the filtering before the trajectory gets passed to the driver (the driver (actually: RSI) just requires smooth trajectories, it is the producer's responsibility to provide those).

moveit/moveit#416 has not been resolved, and in fact, the linked issue on the OMPL bitbucket repository (ompl/ompl#309) is also still open. Perhaps commenting on that might help get this some more attention.

In the meantime MoveIt has also seen some changes to the time parameterisation components. moveit/moveit#441 got merged into Kinetic. The interplay with the OMPL issue is not clear to me, but perhaps one is related to the other.

An interesting PR that has not been merged yet could be moveit/moveit#809: this adds a parameterisation plugin based on TOTG. In my experience this produces rather impressively smooth trajectories, but I'm again not sure how the OMPL issue influences this.

From this data I was then able to estimate the commanded velocities

You're probably aware, but the driver only uses position control. Any velocities are only the distance travelled in one interpolation period.

@gavanderhoorn
Member

@rtonnaer

@gavanderhoorn
Member

Unfortunately there isn't much we can do about that, other than implementing something like the filter you mention. I'm not entirely sure I'd be willing to do that in the driver though: perhaps an intermediate node should do the filtering before the trajectory gets passed to the driver (the driver (actually: RSI) just requires smooth trajectories, it is the producer's responsibility to provide those).

I think I'll change my opinion on this: I would like to add this to the driver, but not to filter (i.e. change) the trajectory; rather, to check it before accepting it for execution.

That way, we can at least avoid executing trajectories that will (probably) lead to problems some time later after we've started executing them.

A complicating factor is that we only provide a ros_control hardware_interface, and do not control any of the infrastructure that accepts and processes the trajectories. That would be joint_trajectory_controller.

Note: the JTC actually does the interpolation of the received trajectory. MoveIt et al influence the initial parameterisation, but the JTC is ultimately the one that interpolates the trajectories and sends the values to the hardware_interface.
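As an illustration of the kind of pre-acceptance check meant here (a hedged sketch only, not existing driver code; the acceleration limit would have to come from the robot's specs or configuration):

```cpp
// Reject a received trajectory whose point-to-point accelerations exceed a
// given limit, before passing it on for execution.
#include <cmath>
#include <trajectory_msgs/JointTrajectory.h>

bool trajectoryWithinAccelLimit(const trajectory_msgs::JointTrajectory& traj,
                                double max_accel)
{
  for (std::size_t i = 2; i < traj.points.size(); ++i)
  {
    const auto& p0 = traj.points[i - 2];
    const auto& p1 = traj.points[i - 1];
    const auto& p2 = traj.points[i];
    const double dt1 = (p1.time_from_start - p0.time_from_start).toSec();
    const double dt2 = (p2.time_from_start - p1.time_from_start).toSec();
    if (dt1 <= 0.0 || dt2 <= 0.0)
      return false;  // non-monotonic timing is also a reason to reject

    for (std::size_t j = 0; j < p2.positions.size(); ++j)
    {
      // Finite-difference velocities and acceleration per joint.
      const double v1 = (p1.positions[j] - p0.positions[j]) / dt1;
      const double v2 = (p2.positions[j] - p1.positions[j]) / dt2;
      if (std::fabs((v2 - v1) / dt2) > max_accel)
        return false;
    }
  }
  return true;
}
```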

@RhysMcK
Author

RhysMcK commented Apr 22, 2018

Possibly a more robust solution would be to also add an RSI FILTER object on the KUKA side. I have not experimented with this before and the documentation is a bit scarce, but it is possible to implement a number of standard filters (Butterworth, low-pass, high-pass, etc.). A simple low-pass filter should help remove any random jerks and inconsistencies output by the JTC.
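For reference (and without claiming anything about the actual RSI FILTER configuration syntax), the underlying behaviour of such a first-order low-pass filter, whether applied on the controller or in an intermediate host-side node, is roughly this:

```cpp
// First-order low-pass filter: y[k] = y[k-1] + alpha * (x[k] - y[k-1]).
// Illustrative only; cutoff and sample period are example values.
#include <cmath>

class LowPassFilter
{
public:
  // cutoff_hz: cutoff frequency, dt: sample period (e.g. 0.012 s for RSI)
  LowPassFilter(double cutoff_hz, double dt)
  {
    const double rc = 1.0 / (2.0 * M_PI * cutoff_hz);
    alpha_ = dt / (rc + dt);
  }

  double update(double x)
  {
    if (!initialized_)
    {
      y_ = x;  // start from the first sample to avoid an initial jump
      initialized_ = true;
    }
    y_ += alpha_ * (x - y_);
    return y_;
  }

private:
  double alpha_ = 1.0;
  double y_ = 0.0;
  bool initialized_ = false;
};
```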

@cschindlbeck
Contributor

So apparently this issue has been solved; see moveit/moveit#416 and moveit/moveit#869.

Can you confirm that this solves your problem?

@gavanderhoorn
Member

@cschindlbeck: I'm not sure it has really been solved. The MoveIt changes are related, but do not address the cause afaict. Changes in OMPL are needed as well, and until those are released into Kinetic the situation remains as it is.

See also the bitbucket issue.

@gavanderhoorn
Member

See moveit/moveit#416 (comment) if you can't wait :)

@RhysMcK
Author

RhysMcK commented May 3, 2018

Thanks @gavanderhoorn, I've been keeping a close eye on that thread! I'll try this out and see if it makes any improvements, although I now believe the weird velocity spikes seen in the plots above are more likely due to the current default method of time parameterization used in MoveIt. I'm eager to try TOPP and/or TOTG when it's ready for Kinetic :) moveit/moveit#809

@gavanderhoorn
Member

Getting moveit/moveit#809 to work on Kinetic should not be too difficult. It's fairly stand-alone. You would just need to add the plugin and update your MoveIt config to load that parameterisation adapter instead of the default one (it's configured in the OMPL planning adapter section).

@gavanderhoorn
Member

@RhysMcK: have you had any opportunity to test the fixes in OMPL+MoveIt?

@RhysMcK
Author

RhysMcK commented May 15, 2018

Haven't had a chance yet, @gavanderhoorn. This is at the top of my to-do list though; hopefully I'll get around to it in the next couple of days.

@RhysMcK
Author

RhysMcK commented May 18, 2018

Had a chance to test today. Unfortunately the problem still persists after using the latest OMPL + MoveIt updates. As I stated above, I expect this problem is most likely due to the current method of time parameterization, which I will confirm once I get a spare moment!

[plot: plot_after_update]

Despite the velocity spikes between consecutive points, the latest updates definitely remove some of the jerkiness from the planned paths. In general, the trajectories are a lot smoother from a "macro" view, i.e. fewer zero-crossings.

@gavanderhoorn
Member

gavanderhoorn commented May 18, 2018

@RhysMcK wrote:

Unfortunately the problem still persists after using the latest OMPL + MoveIt updates.

Just making sure: you compiled everything from source, correct? There hasn't been an updated release yet.

As I stated above, I expect this problem is most likely due to the current method of time parameterization, which I will confirm once I get a spare moment!

I would suggest trying moveit/moveit#809 to see whether that improves things.

One thing we keep ignoring in this context (or at least, I believe we've not given it much attention) is that ros_control's joint_trajectory_controller also interpolates. The data you're plotting is what comes out of that, not out of MoveIt itself. It could be that the two agree with each other, but it would still be good to check.

@RhysMcK
Author

RhysMcK commented May 18, 2018

Not everything: I have MoveIt! installed from source, and I am using the Debian binary for OMPL, which as far as I can see is up to date and includes the above-mentioned fixes... or am I wrong?

Yep, moveit/moveit#809 is my next step :)

And yes, good point about the joint_trajectory_controller interpolation... I will investigate this.

@gavanderhoorn
Member

ros/rosdistro#17747 was merged two weeks ago. The Kinetic sync was yesterday, so you'll only have the updated OMPL if you updated your ROS pkgs just now (or at least: after the sync).

@RhysMcK
Author

RhysMcK commented May 18, 2018

I did update my ROS packages, so I am using the latest OMPL! Thanks for clarifying.

@RhysMcK
Author

RhysMcK commented May 18, 2018

sigh.....

[screenshot from 2018-05-18 20-35-37]

This is TOTG (moveit/moveit#809) implemented in the kinetic/devel branch. Admittedly, the profiles in general look smoother... but still, the bug persists.

@gavanderhoorn
Member

gavanderhoorn commented May 18, 2018

But this is still taken from the hardware_interface, right?

If so, we're looking at JTC output, which is influenced by TOTG/IPTP, but does not have a 1-to-1 relationship with it.

I would be really interested in a plot of velocities coming out of MoveIt (that is hard(er) to get hold of, but would help diagnosing whether this is still a MoveIt/OMPL/IPTP/TOTG issue, or is in JTC).

@gavanderhoorn
Member

Also: I'm assuming you're using a real-time kernel and that the driver/hardware_interface has real-time priority. If not, then scheduling could be an issue here.

@RhysMcK
Author

RhysMcK commented May 18, 2018

But this is still taken from the hardware_interface, right?

Yep, that's correct.

I would be really interested in a plot of velocities coming out of MoveIt (that is hard(er) to get hold of, but would help diagnosing whether this is still a MoveIt/OMPL/IPTP/TOTG issue, or is in JTC).

Yes, I agree. I will attempt to do this ASAP.

Also: I'm assuming you're using a real-time kernel and that the driver/hardware_interface has real-time priority. If not, then scheduling could be an issue here.

I am not at the moment; however, I have previously, and this problem still persisted. I believe the need for an RT / low-latency kernel would only really be an issue if I were experiencing communication issues, i.e. packet losses. According to the diagnostic monitor I always have a connection quality of 100, with no more than about 1 or 2 consecutive packets lost.

@cschindlbeck
Contributor

cschindlbeck commented May 23, 2018

I switched to MoveIt from source and upgraded the ROS packages via apt-get.
I am a little bit confused: I still see libompl.so.1.2.1 in /opt/ros/kinetic/lib/x86_64-linux-gnu. Shouldn't that be libompl.so.1.2.3?

@RhysMcK Can you tell me how you obtained the velocity profiles? I might be interested in recording these myself. Is this a numerically differentiated ROS topic?

@cschindlbeck
Contributor

cschindlbeck commented May 24, 2018

So today I tested my setup again as in #123, where I got the same error with MoveIt from source and (hopefully) the new OMPL (I used the default time parameterization).

So regarding your comment, @RhysMcK:

i fear you might once again encounter this error when you speed things back up to "normal" speed.

Now I am able to execute motions at normal speed without getting an error. However, I still hear "knocking" during the motions, which I believe is likely caused by "jerky" motion, as you already showed in the velocity profiles... so I can confirm that there is still an issue.

@RhysMcK
Author

RhysMcK commented May 24, 2018

@cschindlbeck I just did a quick-and-dirty printout of the position commands I was sending from within kuka_hardware_interface.cpp, and then used a Python script to collect the data and plot it. The velocity profile is just the difference between the last two sent position commands.

@gavanderhoorn
Member

gavanderhoorn commented May 25, 2018

@RhysMcK wrote:

I just did a quick-and-dirty printout of the position commands

Just a note: both printf(..) and std stream output are two of the worst cases when it comes to predictive scheduling of tasks. In a real-time system they are absolutely off-limits; on a soft real-time system they'll probably influence scheduling in a negative way.

If you have the option, and want to implement this nicely, use an internal buffer that stores the last n samples, and use a ROS service (or topic) to retrieve the contents of that buffer after the trajectory has been executed.

That way we avoid any kind of timing influence that may be caused by console IO.
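A minimal sketch of that buffered-logging approach (class and topic names are hypothetical, not part of the driver):

```cpp
// Keep the last N commanded positions in a pre-allocated ring buffer (no
// console IO in the control loop) and publish the whole buffer on a latched
// topic once the trajectory has finished executing.
#include <ros/ros.h>
#include <std_msgs/Float64MultiArray.h>
#include <vector>

class CommandLogger
{
public:
  CommandLogger(ros::NodeHandle& nh, std::size_t capacity)
    : buffer_(capacity, 0.0), capacity_(capacity)
  {
    pub_ = nh.advertise<std_msgs::Float64MultiArray>("command_log", 1, true);
  }

  // Called from the real-time write() loop: just store, never print.
  void record(double cmd)
  {
    buffer_[head_] = cmd;
    head_ = (head_ + 1) % capacity_;
    if (count_ < capacity_)
      ++count_;
  }

  // Called from a non-real-time context after the trajectory has executed.
  void publish()
  {
    std_msgs::Float64MultiArray msg;
    msg.data.reserve(count_);
    for (std::size_t i = 0; i < count_; ++i)
      msg.data.push_back(buffer_[(head_ + capacity_ - count_ + i) % capacity_]);
    pub_.publish(msg);
  }

private:
  ros::Publisher pub_;
  std::vector<double> buffer_;
  std::size_t capacity_;
  std::size_t head_ = 0;
  std::size_t count_ = 0;
};
```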

@hartmanndennis
Contributor

The velocity profile is just the difference between the last two sent position commands.

Without dividing by the duration? The spikes in the TOTG graph always come in pairs: a point that is too fast, followed by a point that is too slow. This would happen if the first call to the JTC is late and the second therefore has a smaller duration.

@gavanderhoorn Related: Should the call to the controller manager's update() be changed to use a fixed duration and increment the time by a fixed 4 ms/12 ms, since the robot expects points interpolated at this rate? Currently, if scheduling is a few ms late, the JTC will return a point too far ahead, which could introduce micro-stutter even if the packet reaches the robot in time.

@gavanderhoorn
Member

gavanderhoorn commented May 29, 2018

@hartmanndennis wrote:

Related: Should the call to the controller manager's update() be changed to use a fixed duration and increment the time by a fixed 4 ms/12 ms, since the robot expects points interpolated at this rate? Currently, if scheduling is a few ms late, the JTC will return a point too far ahead, which could introduce micro-stutter even if the packet reaches the robot in time.

This could be something to try. However, I think I can imagine a situation where it would be possible for a 'desync' to build up between real time and trajectory execution.

Implementation-wise I would still use the pkts from the controller to dictate control flow in the node, but then instead of using real time assume a fixed delta.

Is this something you could test?


Edit: it would be nice if we could use IPOC-deltas for the dt, instead of having to configure the dt somewhere.
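A rough sketch of what that could look like, under the assumption that IPOC is a millisecond counter so that the delta between two consecutive packets equals the interpolation period (hypothetical code, not part of the driver):

```cpp
// Advance the controllers by a period derived from the controller's own IPOC
// timestamps instead of the host's wall clock, so JTC interpolation stays in
// sync with the robot even when host-side scheduling jitters.
#include <cstdint>
#include <ros/ros.h>
#include <controller_manager/controller_manager.h>

void controlStep(controller_manager::ControllerManager& cm,
                 uint64_t ipoc, uint64_t last_ipoc)
{
  // Assumed: IPOC is in milliseconds, so the delta is 4 or 12 ms per cycle.
  const ros::Duration period(static_cast<double>(ipoc - last_ipoc) * 1e-3);

  // ros::Time::now() is still used for anything published with a Header;
  // only the integration period comes from the controller's clock.
  cm.update(ros::Time::now(), period);
}
```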

@hartmanndennis
Contributor

I'll make a PR soon. I already tested a bit and updated #95 with velocity graphs.
I'm still unsure whether only the duration parameter should be set constant, or whether the time parameter should also be increased by the constant IPOC. I don't know about possible consequences for the controllers if the clocks do diverge.

@gavanderhoorn
Member

If possible, I would vote for using IPOC_(t_n+1) - IPOC_(t_n). That would seem to keep things in sync with the controller.

@hartmanndennis
Contributor

That should work fine, but doesn't address the time parameter in the update() call, which expects absolute time. I don't know if ros_control has a defined standard usage for absolute time and duration. Does time_now - time_last need to be equal to duration? If so, why have duration at all...

@gavanderhoorn
Member

gavanderhoorn commented May 30, 2018

In general it would be a problem, but in the case of kuka_rsi_hw_interface we're not using time: the period (not duration) is calculated using std::chrono::steady_clock, which doesn't necessarily sync with wall time anyway.

The only really important thing is that any msgs broadcast with Headers in them use something that is globally comparable (i.e. to other ros::Time instances, for instance), and not some logical clock. So we probably can't use IPOC for everything, but we don't have to: we're only interested in making sure position control is synced with the controller's internal clock.

Controller manager passes the time on to any loaded controllers (such as joint_state_controller) to publish msgs with.

If I understand everything correctly.

@cschindlbeck
Contributor

Would it make sense to append a low-pass filter node to the JTC as a temporary workaround until the real culprit has been found?
I imagine that these jerky movements cannot be too healthy for the robotic system.

@gavanderhoorn
Member

The JTC is not under our control, as it's a component in ros_controllers. You could add something there, but that would require a source build. If you're comfortable with that, then that could be an option.

Note that I believe that what @hartmanndennis suggests in his comments would probably be more straightforward.

@gavanderhoorn
Member

Additionally: I'd be surprised if it's actually a problem in the JTC. I suspect that it's actually jitter on the scheduling of the node, leading to different dts being used.

@gavanderhoorn
Member

We've recently run into this (again) here at TUD as well (@rtonnaer).

A different axis though (A1), but that is probably inconsequential.

Right now I'm plotting the values sent to the controller from a Wireshark capture. Initial results don't look too bad (not very nice either, but nothing like the spikes that @RhysMcK shows in his plots). Unfortunately, the way I process the data right now is synced with IPOC, which makes detecting late packets difficult or impossible. I'll have to process the raw pcap itself to check wall-time progress and match it up with IPOC.

This is using #132 btw.

@gavanderhoorn
Member

Have any of you (@RhysMcK, @cschindlbeck, @destogl) ever used the RSI filtering components?

@RhysMcK
Author

RhysMcK commented Jun 13, 2018

No I haven't, @gavanderhoorn. There doesn't seem to be much information out there about how to configure them either. I was, however, recommended to try implementing an RSI low-pass filter when I first came across this issue, by the folks over at robot-forum, as I previously mentioned (https://www.robot-forum.com/robotforum/kuka-robot-forum/command-gear-torque-a23-with-rsi-control/).

Unfortunately, my time is being drained in other areas at work at the moment, so I haven't had a chance to continue this investigation.

@BrettHemes
Member

Out of curiosity, do these problems also exist in Melodic?

@jbeck28

jbeck28 commented May 15, 2024

I know this is very old, but I thought I'd share. The issue persists in the ROS 2 version; what you're seeing is likely the result of latency spikes. Since the robot is position controlled, this results in sudden zero-velocity segments and tons of jerk. See this issue.

I haven't tested the fix as extensively as I'd like to, but I certainly saw better performance after merging the pugixml branch into my fork.
