Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use statically-typed views for better performance #856

Merged
merged 30 commits into from
Aug 13, 2021

Conversation

adlarkin
Copy link
Contributor

@adlarkin adlarkin commented Jun 10, 2021

Signed-off-by: Ashton Larkin ashton@openrobotics.org

🎉 New feature

Closes #711

Summary

In an ongoing attempt to improve runtime performance, #711 points out that the runtime performance of EntityComponentManager::Each gets worse as more components are used. I've changed how views are implemented to improve the runtime of EntityComponentManager::Each. The main change here is that we now have a statically-typed view, which is defined by the list of component types that are required by the view. By doing this, a view can store an entity and pointers to the required component types in a std::tuple, and then make use of std::apply when we want to apply an entity and all of its relevant components to a callback in EntityComponentManager::Each.

Using std::apply allows for better runtime performance because we can apply the entity and associated components to a callback "all at once". This is quicker than the previous view approach, which involved individually looking up each component for an entity (see #711 (comment)).

The main drawback with the approach of a statically-typed view is the added software complexity/maintenance to the system. For example, during testing, I found that tuple creation can be costly - so, in order to avoid re-creating tuples whenever a component was being added/removed, I added an internal ignore flag to components to eliminate tuple re-creation. This ignore component flag is set to true whenever a user requests to remove an entity that was previously created, and is set to false whenever a user requests to add an entity that was previously removed. The ignore flag is hidden from the user and is used by the ECM and its views to determine which entities and components should be used in a particular Each call.

Another thing to note is that the old ComponentStorage approach is no longer used. This new view approach does not require components of the same type to be packed sequentially in memory, and through various testing that I performed, I found that runtime performance with statically-typed views without components stored sequentially in memory is still noticeably better than the runtime performance of the old views with components stored sequentially in memory. So, hopefully this makes at least the ComponentStorage part of our system simpler to maintain 🙂

A side effect of removing the old ComponentStorage approach is that we also no longer need ComponentId, which means that we also no longer need ComponentKey. I tried to deprecate methods using ComponentKey wherever possible in order to follow the tick-tock model, but there were several methods that had to be deleted completely and replaced with something else (i.e., methods that returned ComponentKey or had ComponentKey as the only method parameter, which meant that there was no longer enough information provided to achieve the desired functionality of the method).

Test it

If you'd like to compare the performance of running Each with the approach proposed in this PR to the existing view approach, you can run the each benchmark. I have already done this on my system, and have included the results below (I ran a comparison against 8f5103b):

Each benchmark test

As seen from the results above, all of the EachCache tests are significantly faster with the statically-typed view approach - in fact, if we look at the very last test in that file (Each10ComponentCache/1000 - so, 1000 entities with 10 components each), we go from 150ms to 2ms 🤯 It's also worth noting that we don't see a decrease in EachNoCache performance either, which makes sense since views are not used for EachNoCache.


Another way to test the changes here is to run a world with 3k shapes. Here are the RTF numbers I'm seeing from testing 3k simple shapes, using both GUI and headless simulation. I used TPE for the physics engine. I also echoed the /world/shapes/dynamic_pose/info topic to see how RTF was impacted using these changes (see #743).

original ECM (main at ca10c77):

  • static, no gui: 42%
    • echoing the /world/shapes/dynamic_pose/info topic: 8.3%
  • static, gui: bounces around between 11-15%
    • echoing the /world/shapes/dynamic_pose/info topic: 4%
  • non-static, no gui: 8.5%
    • echoing the /world/shapes/dynamic_pose/info topic: 3.5%
  • non-static, gui: 4%
    • echoing the /world/shapes/dynamic_pose/info topic: 2.3%

new ECM (this branch):

  • static, no gui: 50%
    • echoing the /world/shapes/dynamic_pose/info topic: 35% (this is a huge increase compared to 8.3% 🤯)
  • static, gui: bounces around 30-35%
    • echoing the /world/shapes/dynamic_pose/info topic: 23% (again, a huge increase compared to 4% 🤯)
  • non-static, no gui: 8.6%
    • echoing the /world/shapes/dynamic_pose/info topic: 5%
  • non-static, gui: 6%
    • echoing the /world/shapes/dynamic_pose/info topic: 3.75%

As the numbers show, there's a performance improvement for almost every single test case (there's not much of an improvement for non-static, no GUI). For use cases where plugins are used that make calls to Each with a large number of components, I suspect that there will be even greater performance improvements.

It was also pointed out in #856 (comment) that GUI startup time for 3k shapes with the changes in this PR is only a few seconds (it used to take up to a few minutes). Perhaps this is because of some each calls that are used when initializing the GUI?

Another thing to note is that when using the changes in this PR to run 3k shape simulations with a GUI, loading time is a lot faster. In the tests I ran, I found that these new changes allowed the GUI to load and run in a few seconds, while the old ECM/View implementation took a few minutes to load (thanks @iche033 for making this discovery!).


One other scenario that could be worth testing is a benchmark of the time it takes to add/remove components frequently. I did some informal testing locally with the new approach and found that removing components with the statically-typed views and an internal component ignore flag is faster than the current ECM/view approach, but perhaps it would be worth adding a benchmark test for this. This PR is already very large, so I decided to hold off on adding a benchmark test for now, but if anyone would be interested in me adding this in, let me know - I can also make another PR that adds this benchmark test if that is more preferable 🙂

Takeaways/Next steps

While it's clear that this PR has improved the runtime performance of EntityComponentManager::Each, there's not much of an improvement in RTF for the headless tests. While #793 should help improve the RTF for running the GUI, we will need to figure out why we still aren't seeing higher RTF for headless simulation (theoretically, headless RTF for 3k static shapes should be 100%).

I would also be interested in figuring out how to generate better testing scenarios/evaluation metrics. Perhaps 3k simple shapes isn't a great way to benchmark performance - we aren't running any plugins other than the physics system, and aren't doing things like adding/removing components. Maybe an external user will try these changes in a context that differs from the 3k simple shapes example and find a nice performance improvement! Considering that most users use various plugins and may or may not add/remove components frequently, I think it'd be worth taking some time to figure out how to benchmark use cases that are frequent among Ignition users. That way, we can get a better feel for whether our changes are being noticed by the community or not.

Also, it would be nice to add some more features/functionality to the EntityComponentManager that make use of the new statically-typed views. For example, perhaps we can use iterators (as done in EnTT) for use cases where a lot of components are used as "filters", and we only need specific components of entities that pass all component filter checks. So, if a user wanted all entities that are static models, but only need the pose of these models, then maybe they could do something like this:

auto entityGroup = ecm.FilterEntities<components::Static, components::Model, components::Pose>();

for (auto entity : entityGroup)
{
  auto pose = entity.Component<components::Pose>();

  // do something with pose here
}

Related to the idea of providing functionality like iterators as described above, we will eventually need to address #628 and #805.

Checklist

  • Signed all commits for DCO
  • Added tests
  • Added example and/or tutorial
  • Updated documentation (as needed)
  • Updated migration guide (as needed)
  • codecheck passed (See contributing)
  • All tests passed (See test coverage)
  • While waiting for a review on your PR, please help review another open pull request to support the maintainers

Note to maintainers: Remember to use Squash-Merge

Signed-off-by: Ashton Larkin <ashton@openrobotics.org>
@github-actions github-actions bot added the 🏯 fortress Ignition Fortress label Jun 10, 2021
Copy link
Contributor Author

@adlarkin adlarkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got a few questions/notes for my reviewers.

test/integration/components.cc Outdated Show resolved Hide resolved
include/ignition/gazebo/Types.hh Outdated Show resolved Hide resolved
include/ignition/gazebo/Types.hh Outdated Show resolved Hide resolved
src/EntityComponentManager_TEST.cc Show resolved Hide resolved
@iche033
Copy link
Contributor

iche033 commented Jun 10, 2021

Do you notice that when launching shapes_population.sdf with GUI, the scene is now loaded faster? It used to take 2 mins on my machine and now it's only just a few seconds.

I am testing with 2 workspaces, one on this branch and the other on main, and both built from source. My RTFs from running the shapes population world with the tpe plugin:

new ECM (this branch):

  • static, no gui: 35%
  • static, gui: 16%
  • non-static, no gui: 10%
  • non-static, gui: 5%

original ECM (main)

  • static, no gui: 26%
  • static, gui: 6%
  • non-static, no gui: 9%
  • non-static, gui: 3.5%

So looks like it made a difference on my machine. Could you try comparing against fortress built from source and see if you also see these improvements?

@adlarkin
Copy link
Contributor Author

Do you notice that when launching shapes_population.sdf with GUI, the scene is now loaded faster? It used to take 2 mins on my machine and now it's only just a few seconds.

I never ran this with the GUI the first time around, but now that you have made the point, I went ahead and tested it myself. Yes, the scene loads a lot faster for me now with these changes. I'm also seeing loading times decrease from a few minutes to a few seconds. Thanks for pointing this out!

Could you try comparing against fortress built from source and see if you also see these improvements?

I went ahead an did a comparison with source builds and have updated the PR description with my results. I do notice some improvements now, especially when echoing the /world/shapes/dynamic_pose/info topic 😎 (see #743)

Copy link
Contributor

@iche033 iche033 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few minor optimization comments in code.

I'm also randomly testing examples worlds. So far I ran into one crash which I'm not able to reproduce on the main branch. To reproduce:

ign gazebo -v 4 -r --levels levels.sdf

Right click and remove the red vehicle -> segfault

include/ignition/gazebo/detail/BaseView.hh Outdated Show resolved Hide resolved
include/ignition/gazebo/detail/BaseView.hh Outdated Show resolved Hide resolved
include/ignition/gazebo/components/Factory.hh Show resolved Hide resolved
include/ignition/gazebo/detail/EntityComponentManager.hh Outdated Show resolved Hide resolved
include/ignition/gazebo/detail/EntityComponentManager.hh Outdated Show resolved Hide resolved
src/EntityComponentManager.cc Outdated Show resolved Hide resolved
src/EntityComponentManager.cc Outdated Show resolved Hide resolved
src/EntityComponentManager.cc Outdated Show resolved Hide resolved
src/EntityComponentManager.cc Outdated Show resolved Hide resolved
src/EntityComponentManager.cc Outdated Show resolved Hide resolved
Signed-off-by: Ashton Larkin <ashton@openrobotics.org>
Signed-off-by: Ashton Larkin <ashton@openrobotics.org>
@adlarkin
Copy link
Contributor Author

adlarkin commented Jun 18, 2021

I've addressed all review feedback, except for updating the migration guide with things that had to be deleted or deprecated (I'll do this once we know no other changes have to be made).

So far I ran into one crash which I'm not able to reproduce on the main branch. To reproduce:
ign gazebo -v 4 -r --levels levels.sdf
Right click and remove the red vehicle -> segfault

I tested this locally as well, and also noticed a segfault. I will debug this and update the PR once I've found a fix.

@adlarkin adlarkin self-assigned this Jun 21, 2021
Copy link
Contributor

@chapulina chapulina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just did a first pass. There's a lot of changes, so please allow me a few passes until I get the hang of it.

I have some suggestions to allow us to tick-tock most of the API and ease migration in #873.

Meanwhile, can you add some unit tests for the new classes / files? i.e. BaseView, ComponentStorage

include/ignition/gazebo/components/Component.hh Outdated Show resolved Hide resolved
include/ignition/gazebo/components/Component.hh Outdated Show resolved Hide resolved
include/ignition/gazebo/Types.hh Outdated Show resolved Hide resolved
src/EntityComponentManager_TEST.cc Outdated Show resolved Hide resolved
src/EntityComponentManager_TEST.cc Show resolved Hide resolved
include/ignition/gazebo/detail/BaseView.hh Show resolved Hide resolved
include/ignition/gazebo/detail/BaseView.hh Outdated Show resolved Hide resolved
include/ignition/gazebo/detail/BaseView.hh Show resolved Hide resolved
include/ignition/gazebo/detail/ComponentStorage.hh Outdated Show resolved Hide resolved
src/EntityComponentManager.cc Outdated Show resolved Hide resolved
adlarkin and others added 6 commits June 28, 2021 15:49
Signed-off-by: Ashton Larkin <ashton@openrobotics.org>
…t test

Signed-off-by: Ashton Larkin <ashton@openrobotics.org>
Signed-off-by: Ashton Larkin <ashton@openrobotics.org>
Signed-off-by: Ashton Larkin <ashton@openrobotics.org>
Signed-off-by: Ashton Larkin <ashton@openrobotics.org>
include/ignition/gazebo/detail/BaseView.hh Show resolved Hide resolved
include/ignition/gazebo/detail/ComponentStorage.hh Outdated Show resolved Hide resolved
src/EntityComponentManager.cc Outdated Show resolved Hide resolved
Signed-off-by: Ashton Larkin <ashton@openrobotics.org>
@adlarkin
Copy link
Contributor Author

Let's run another round of performance tests to sanity check that we didn't introduce any regressions

I ran performance tests again and did not see any noticeable changes in the numbers. You can see the results in the PR description.

figure out what's happening with CI.

I think at this point, there are 2 lingering CI issues:

  1. Test failures on ubuntu CI that cannot be reproduced locally - see Use statically-typed views for better performance #856 (comment)
  2. Compilation failure on windows related to the new templated view class - see Use statically-typed views for better performance #856 (comment)

Signed-off-by: Louise Poubel <louise@openrobotics.org>
Signed-off-by: Louise Poubel <louise@openrobotics.org>
Signed-off-by: Louise Poubel <louise@openrobotics.org>
@chapulina
Copy link
Contributor

Compilation failure on windows

I'm blindly trying @j-rivero's approach from gazebosim/gz-gui@5f7f05c in 66bb2a2

It's worth noting that it fails when compiling the ignition-gazebo6-gui component, not the core library.

Test failures on ubuntu CI

I think it may be due to root permissions on the Jenkins node?

Nope, the same happens on GitHub actions, so it's definitely an issue with the PR. Very strange that it fails to create the server.config file exactly for those tests, but succeeds on other tests. I thought it may be some interference between the tests, but I can't reproduce the failure even running all colcon tests locally.

@adlarkin
Copy link
Contributor Author

adlarkin commented Aug 12, 2021

Nope, the same happens on GitHub actions, so it's definitely an issue with the PR

That's strange - I mentioned in a previous comment that tests started failing in https://build.osrfoundation.org/job/ignition_gazebo-ci-pr_any-ubuntu_auto-amd64/6727/, which was a build for #943. But, I don't see how changes in #943 would cause a break.

Another thing to note is that these tests are not failing on MacOS CI 🤔

…tuff

Signed-off-by: Louise Poubel <louise@openrobotics.org>
@chapulina
Copy link
Contributor

I have a promising fix for Server_TEST in d9a49e6, I was able to reproduce the failure locally when the ~/.ignition/gazebo directory didn't exist. It still doesn't make sense to me why this manifested in this PR though 🤷‍♀️

@adlarkin
Copy link
Contributor Author

I was able to reproduce the failure locally when the ~/.ignition/gazebo directory didn't exist.

Ohhh you know what, this has actually happened to me before. I just forgot about it and always ran ign gazebo once (to populate ~/.ignition/gazebo) before running tests to avoid this error locally. So, if anything, it sounds like this PR exposed a bug instead of creating a new one 😉 Hopefully your fix works! 🤞

@codecov
Copy link

codecov bot commented Aug 12, 2021

Codecov Report

Merging #856 (2dadec6) into main (e3ee0af) will increase coverage by 0.38%.
The diff coverage is 76.83%.

❗ Current head 2dadec6 differs from pull request most recent head abf9d23. Consider uploading reports for the commit abf9d23 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main     #856      +/-   ##
==========================================
+ Coverage   64.66%   65.04%   +0.38%     
==========================================
  Files         242      245       +3     
  Lines       19014    19413     +399     
==========================================
+ Hits        12296    12628     +332     
- Misses       6718     6785      +67     
Impacted Files Coverage Δ
include/ignition/gazebo/EntityComponentManager.hh 100.00% <ø> (ø)
include/ignition/gazebo/Link.hh 100.00% <ø> (ø)
include/ignition/gazebo/Server.hh 100.00% <ø> (ø)
include/ignition/gazebo/rendering/RenderUtil.hh 100.00% <ø> (ø)
src/gui/Gui.cc 65.35% <ø> (+0.23%) ⬆️
src/gui/GuiRunner.cc 85.91% <0.00%> (-0.34%) ⬇️
src/gui/plugins/scene3d/Scene3D.hh 50.00% <ø> (ø)
src/systems/lift_drag/LiftDrag.hh 100.00% <ø> (ø)
src/systems/log/LogRecord.cc 80.56% <ø> (-0.31%) ⬇️
src/systems/velocity_control/VelocityControl.hh 100.00% <ø> (ø)
... and 54 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e040ec2...abf9d23. Read the comment docs.

Signed-off-by: Jose Luis Rivero <jrivero@osrfoundation.org>
Copy link
Contributor

@iche033 iche033 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this branch with various worlds, spawning and deleting objects, distributed sim, custom plugins and it seems to be working quite well.

the latest ubuntu build failed due to network issue. I just queued another one.

@chapulina
Copy link
Contributor

That's a lot of failing tests on Ubuntu. The sensors ones should be fixed by #617 , but that's only 9 tests and there are 23 failing here

Signed-off-by: Louise Poubel <louise@openrobotics.org>
@chapulina
Copy link
Contributor

there are 23 failing here

I'm seeing 23 failing tests on Ubuntu Jenkins for multiple PRs. I believe this is the fallout from gazebo-tooling/release-tools#494.

Let's rely on the other platforms to merge this PR. GitHub actions was happy, as well as homebrew and Windows. I've fixed conflicts and will merge if CI comes back happy for those platforms again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏯 fortress Ignition Fortress performance Runtime performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants