Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Too many calls to object_get_instance_binding() #793

Closed
Ughuuu opened this issue Jul 5, 2024 · 8 comments
Closed

Too many calls to object_get_instance_binding() #793

Ughuuu opened this issue Jul 5, 2024 · 8 comments
Labels
c: ffi Low-level components and interaction with GDExtension API performance Performance problems and optimizations

Comments

@Ughuuu
Copy link
Contributor

Ughuuu commented Jul 5, 2024

In my performance investigation I see a lot of usage of:
PhysicsDirectBodyState2D -> get_transform, get_velocity ...

For these, the times are much higher than in the C++ equivalent. I even make the functions be no-op, but the overhead is still there.

What I see is Godot calls inside PhysicsDirectBodyState2D 5 times, with 5 different functions, and in 3D mode about 6 times. Old version of plugin was able to do 8800 circles, so that would be for 2D 8000 * 5 times per second.

The function right now looks like this:

#[godot_api]
impl IPhysicsDirectBodyState2DExtension for RapierDirectBodyState2D {
    fn get_transform(&self) -> Transform2D {
        Transform2D::IDENTITY
    }
}

I would like to have a more efficient way of doing this, if possible.

@Bromeon Bromeon added the performance Performance problems and optimizations label Jul 8, 2024
@lilizoey
Copy link
Member

lilizoey commented Jul 8, 2024

One way to significantly help this issue is with custom user-data wrappers. This would let the user pick a calling convention here that reduces overhead, potentially even makes it a no-op.

Making it possible to pick what receiver you want here would also help. Using Gd<Self> instead for instance should reduce overhead by avoiding bind/bind_mut. But im not entirely sure of a good way to do that, a couple of ways would be:

  1. wait for arbitrary self types, and be generic over Receiver (this will likely take a very long time to stabilize)
  2. add more traits for different receivers (adds a bunch of more traits)
  3. use proc-macro magic (wont work that well with the traits)
  4. use builder-api

In my opinion the builder api is probably the best solution out of these, but it is a lot of work. It might also add a little bit of overhead during start-up, but that's a one-time cost.

@Ughuuu
Copy link
Contributor Author

Ughuuu commented Jul 17, 2024

Is there one of these I could possibliy do in a hacky way on a fork of this repo? That you know of. I'm not sure many people would care that much about performance in this case, that's why I ask(in case this never gets to see the light of day)

@Bromeon
Copy link
Member

Bromeon commented Jul 17, 2024

Did you evaluate in this specific problem what causes the majority of calls in this particular problem? In your link you have lots of ranked lists, but it's not clear which exactly applies here.

Is it GdRef/GdMut guards? If yes, which operations specifically?

@Ughuuu
Copy link
Contributor Author

Ughuuu commented Jul 17, 2024

I'm not 100% sure I understand the question or know what GdRef/GdMut usage has to do with my problem, but basically godot is calling into the dll in PhysicsDirectBodyState2D::get_transform, PhysicsDirectBodyState2D::get_linear_velocity, PhysicsDirectBodyState2D::get_angular_velocity, PhysicsDirectBodyState2D::is_sleeping, PhysicsDirectBodyState2D::get_contacts_debug.

I implement these functions like this:

#[godot_api]
impl IPhysicsDirectBodyState2DExtension for RapierDirectBodyState2D {
    fn get_transform(&self) -> Transform2D {
        Transform2D::IDENTITY
    }
    fn get_linear_velocity(&self) -> Vector2 {
        Vector2::default()
    }
...
}

So, as a test, I am doing basically no-op to see what the problem is from. And there is still a very big performance hit. These functions are called for every active object. So in the case I have 8000 objects, the lib gets called from godot 8000 * 5 times (for 3d its more as for 3d there is also inertia_tensor and principal_axis functions that are called).

So the problem seems to be the linking of the &self and &mut_self` of this struct. I still would need a way to match this struct to something though, as internally I would get the object from a hashmap and then get the transform/velocity/other_states.

Even more, if I am looking at the percentage of the function using rust and the one using c++, the time it takes to call this function for the rust one is very big compared to the c++ one. Eg. without this and the bug about Array cloning(tested by disabling this flow completely, not calling some callbacks into godot) I am getting in 3D use case of rapier 8k-9k objects, for 3D getting 10k-11k.

@Bromeon
Copy link
Member

Bromeon commented Jul 17, 2024

I'm not 100% sure I understand the question or know what GdRef/GdMut usage has to do with my problem

The &self in your virtual method needs to come from somewhere. Behind the scenes, there is a Gd<RapierDirectBodyState2D>, on which bind() is called -- bind() returns a guard of type GdRef. Doing this performs a runtime borrow check to ensure there's no concurrent write active.

In your case, you wouldn't need this since you don't access self but instead return constants.


So if you run the code that calls these virtual functions through a profiler, what are the top contenders for the CPU time spent? To me it wasn't clear if this is listed in your analysis linked in the initial post, and if so, where exactly.

@Ughuuu
Copy link
Contributor Author

Ughuuu commented Jul 17, 2024

  1. I would still need to get the data from somewhere, but at least I would not do that bind thing. I am hoping to get the data in a faster way.
  2. The bottlenecks happen from:
  • The flush_queries function. This function is a function of physics_server_2d and is called from Godot. From this function I call callable.callv and then Godot calls back into those 5 functions I listed above 8000 times.

From investigation, at Main -> Flow1 -> Callable::callv (1.43s) is where the first callable happens, that then calls into godot and gets called back 5 times:
image

Then, at Main -> Flow1 -> Callable::callv (1.43s) --> 1. you can see how one of these callbacks looks like:
image

Both from this and other one, the most time spent is:

  • RawGd::bind: 26%
  • ClassName::from_ascii_cstr: 1%
  • single_threaded::InstanceStorage::get: 0.8% + 0.8%
  • HashMap::get: 0.8%

Note:

  • You will also see in the cpu usage the: core::fmt::write That is from the debug logging issue.

As can be seen, a big chunk of the code execution comes from RawGd::bind.

@Ughuuu
Copy link
Contributor Author

Ughuuu commented Jul 21, 2024

After a lot of investigations, it appears the RawGd::bind method calls internally get_instance_binding and that uses a lock. Instead we can cache that value.

Investigation of other cases (godot-cpp with c++, godot-rust with rust). Both are release builds.

5 seconds timeframe, flush_queries

Box2D: 1.12s (22%). - 7000 balls 1-3 fps

  • Callable::callv 14%
  • body_state_changed 10%
  • ..
  • get_linear_velocity: 0.6%
  • is_sleep: 0.3%
    ...

Interesting to see here that out of total of 22% from flush_queries, 10% is from body_state_changed, and the get_linear_velocity is at 0.6%.

Godot-Rust without this change: 2.66s (47%). - 7000 balls 4 fps

  • Callable:callv 28%
  • body_state_changed 23%
  • ..
  • get_linear_velocity: 4.4%
  • is_sleep: 1.4%
    ...

Godot-Rust with this change: 2.61s (47%). - 7000 balls 8-10 fps

  • Callable:callv 23.3%
  • body_state_changed 16%
  • ..
  • get_linear_velocity: 1.5%
  • is_sleep: 0.9%
    ...

Notice how the total time of the flush_queries function didn't really go down. However this might be because the fps affects this too. Since fps is down, the callv function itself isn't called that many times.

Will update with times from less balls so they all have 60fps, enough time to call the Callable::callv same amount of times.

3000 Balls with godot in release mode

Physics|Callable::callv|body_state_changed
-|-|-|-
Box2d|6.3%|5%
Rapier without change|11.4%|8.5%
Rapier with change|11.3%|8.1%

@Bromeon
Copy link
Member

Bromeon commented Aug 2, 2024

Closed by #831.

@Bromeon Bromeon closed this as completed Aug 2, 2024
@Bromeon Bromeon changed the title [Performance Issue] The godot ffi layer is a bottleneck for my use cases Too many calls to object_get_instance_binding() Aug 2, 2024
@Bromeon Bromeon added the c: ffi Low-level components and interaction with GDExtension API label Aug 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: ffi Low-level components and interaction with GDExtension API performance Performance problems and optimizations
Projects
None yet
Development

No branches or pull requests

3 participants