[BUG]: DDS Multi Client Limited Bandwidth #20

Open
jondave opened this issue Feb 21, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@jondave

jondave commented Feb 21, 2024

Description of the bug

When two or more clients run RViz over the Zenoh bridge at the same time, topic messages are dropped.

Proposed fix: change <ParticipantIndex>auto</ParticipantIndex> to <ParticipantIndex>120</ParticipantIndex> in the router setup script:
https://github.com/LCAS/limo_ros2/blob/humble/.devcontainer/setup-router.sh#L28C27-L28C31

@cooperj, @GPrathap

Steps To Reproduce

Have multiple students connect to the same robot.

Additional Information

No response

@jondave jondave added the bug Something isn't working label Feb 21, 2024
@marc-hanheide
Member

I'm not sure this is the core of the problem.

We run Zenoh in peer to peer mode. To have multiple clients we'd have to run it with a Zenoh server, I think.
Good to know but not a bug.

But please feel free to open a PR with the suggested change if it helped.

@marc-hanheide
Member

I need to ask what 120 achieves here. The documentation at https://github.com/eclipse-cyclonedds/cyclonedds/blob/master/docs%2Fmanual%2Foptions.md#cycloneddsdomaindiscoverymaxautoparticipantindex suggests this is not a valid option. How did you come across this solution?

@jondave
Author

jondave commented Feb 21, 2024

@GPrathap said he uses this solution in another ROS2 project.

@marc-hanheide
Member

interesting 🤔 @GPrathap want to comment?

@GPrathap

Hi @marc-hanheide, I am not entirely sure this resolves the issue, but it is worth trying. When I ran RViz, the frequencies of some messages were dropping. After setting this value, the problem went away.

Also, the DDS tuning guide at https://docs.ros.org/en/galactic/How-To-Guides/DDS-tuning.html recommends reducing net.ipv4.ipfrag_time. If you reduce it, however, some of the messages are no longer visible in RViz. I keep it at the default value, e.g., 30.

@marc-hanheide
Member

Yes, setting a fixed participant ID can theoretically help with performance, but 120 seems to be the first illegal value, so I'm not sure why it was chosen.
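
For reference, the rule in question can be written down as a small check. This is only a sketch based on a reading of the CycloneDDS options documentation linked earlier in this thread, which appears to allow auto, none, or a non-negative integer less than 120 for ParticipantIndex. The function name is_valid_participant_index is made up for illustration and is not part of CycloneDDS; exact-case matching of the keywords is also an assumption.

```python
def is_valid_participant_index(value: str) -> bool:
    """Return True if value looks like a legal ParticipantIndex setting.

    Assumption (from the CycloneDDS options documentation): legal values are
    "auto", "none", or a non-negative integer less than 120.
    """
    text = value.strip()
    # Keyword forms; matched case-sensitively here as a conservative assumption.
    if text in ("auto", "none"):
        return True
    # Only plain ASCII digits count as an integer (no sign, no decimal point).
    if not (text.isascii() and text.isdigit()):
        return False
    return int(text) < 120


if __name__ == "__main__":
    for candidate in ("auto", "119", "120"):
        print(candidate, is_valid_participant_index(candidate))
```

Under that reading, 119 would be the largest fixed index that passes, and the 120 from the proposed change would be rejected.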

@marc-hanheide
Member

I don't think we need to tune the kernel network parameters here, as we are not really using DDS over the network. DDS runs only on the local loopback device, since only Zenoh is used externally, right?

@GPrathap

Even on a single machine, you have to set sudo sysctl -w net.core.rmem_max=2147483647; have a look at SteveMacenski/spatio_temporal_voxel_layer#257.
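
Before changing anything, it may help to see what the kernel currently reports for the two parameters mentioned in this thread. The sketch below only reads values and changes nothing. It assumes a Linux system that exposes sysctl keys under /proc/sys, and the helper names sysctl_path and read_sysctl are hypothetical, chosen for this example.

```python
from pathlib import Path
from typing import Optional


def sysctl_path(name: str) -> Path:
    """Map a dotted sysctl key to its file under /proc/sys.

    Simple mapping: every dot becomes a path separator, so keys containing
    a dotted interface name are not handled.
    """
    return Path("/proc/sys") / name.replace(".", "/")


def read_sysctl(name: str) -> Optional[str]:
    """Return the current value as text, or None if it cannot be read."""
    try:
        return sysctl_path(name).read_text().strip()
    except OSError:
        # Missing key, non-Linux system, or insufficient permissions.
        return None


if __name__ == "__main__":
    for key in ("net.core.rmem_max", "net.ipv4.ipfrag_time"):
        print(key, "=", read_sysctl(key))
```

Running this inside the container and on the host would show whether the two environments actually see the same values.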

@marc-hanheide
Member

I toyed around with the simplest CycloneDDS settings in LCAS/teaching#44. They appear to make it much more performant for me, but I have only just started investigating.

@marc-hanheide
Member

I think LCAS/teaching#49 is actually the correct way to address this. We actually want multicast inside the container. It will be confined to the container anyway.

3 participants