Make RID_Owner lock-free for fetching. #86333

Closed
wants to merge 1 commit

Conversation

@reduz (Member) commented Dec 19, 2023

This PR makes RID_Owner lock-free for fetching values, which should give a very significant performance boost where used; a rough sketch of the fetch path follows the list of considerations below.

Some considerations:

  • A maximum number of elements to allocate must be given (200k by default).
  • The validators and the data are merged into a single struct, to make access more cache-friendly.
  • A mutex is used for thread safety on addition/removal, since spinlocks are discouraged nowadays.
  • Access to the RID structure is still safe, given that fetches are independent of additions/removals.
  • RID access was never really thread-safe in the sense that the contents of the data are not protected anyway; each server needs to implement locking as it sees fit.
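A minimal sketch of that fetch path, assuming a chunk-pointer table sized up-front for the maximum element count; LockFreeOwnerSketch, ELEMENTS_IN_CHUNK and get_or_null are illustrative names, not the PR's actual code:

// Minimal sketch of the fetch path, not the PR's actual code.
// Assumes a chunk-pointer table sized up-front for max_elements, so no
// memrealloc() can ever move data a reader might be touching.
#include <cstdint>
#include <mutex>

template <typename T>
class LockFreeOwnerSketch {
	struct Chunk {
		uint32_t validator; // 0 means "free slot"
		T data;             // validator and data merged for cache friendliness
	};

	static const uint32_t ELEMENTS_IN_CHUNK = 1024; // illustrative value
	Chunk **chunks = nullptr; // fixed-size table of chunk pointers
	uint32_t max_elements = 0;
	std::mutex alloc_mutex; // guards additions/removals only, never fetches

public:
	// Lock-free fetch: the chunk table never moves, so reading an
	// already-allocated chunk is safe while other threads allocate.
	T *get_or_null(uint64_t rid_id) {
		uint32_t index = uint32_t(rid_id & 0xFFFFFFFF);
		uint32_t validator = uint32_t(rid_id >> 32);
		if (index >= max_elements) {
			return nullptr;
		}
		Chunk &c = chunks[index / ELEMENTS_IN_CHUNK][index % ELEMENTS_IN_CHUNK];
		if (c.validator != validator) {
			return nullptr; // freed or stale RID
		}
		return &c.data;
	}
};

Because the chunk table is sized once and chunks never move, a reader only dereferences memory that was fully set up before the RID was handed out; the mutex only serializes the writers.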

@reduz reduz requested a review from a team as a code owner December 19, 2023 16:37
@Calinou Calinou added this to the 4.3 milestone Dec 19, 2023
@reduz reduz force-pushed the lock-free-rid branch 2 times, most recently from 6b35284 to 89db4ba on December 19, 2023 16:51

@RandomShaper (Member) left a comment

If I understand correctly, the thread-safe version has to pre-allocate because otherwise memrealloc() can change the base pointer of the data. I'm wondering if that could be solved by using a PagedAllocator (UPDATE: or PagedArray, which may be what I mean). The downside is probably that element access needs a bit of extra arithmetic and, more prominently, an indirection to reach the item on the right page. Can you confirm?
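A small sketch of the trade-off in question, assuming a paged layout similar in spirit to PagedAllocator/PagedArray; PagedLookupSketch and PAGE_SIZE are made-up names, not Godot's API:

// Paged storage keeps element addresses stable without pre-allocating
// everything, at the cost of one extra indirection per access.
#include <cstdint>

template <typename T, uint32_t PAGE_SIZE = 4096> // PAGE_SIZE is an assumption
struct PagedLookupSketch {
	T **pages = nullptr; // each page is allocated once and never moves

	// Flat array:  base[index]                        -> one load
	// Paged array: pages[index / PAGE][index % PAGE]  -> two loads + arithmetic
	T &get(uint32_t index) {
		return pages[index / PAGE_SIZE][index % PAGE_SIZE];
	}
};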

		}

		if (alloc_count == max_alloc) {
			//allocate a new chunk
			uint32_t chunk_count = alloc_count == 0 ? 0 : (max_alloc / elements_in_chunk);
			if (THREAD_SAFE && chunk_count == chunk_limit) {

Suggested change
-			if (THREAD_SAFE && chunk_count == chunk_limit) {
+			if (unlikely(THREAD_SAFE && chunk_count == chunk_limit)) {

@RandomShaper (Member) commented Dec 21, 2023

As an aside, given there's no need for thread safety on the data itself (it requires external sync anyway), I'm wondering if we could go fully lock-free, this way (sketched below):

  • If thread-safe, the validator becomes an std::atomic<uint32_t>, with load-acquire and store-release used on it.
  • If not thread-safe, it stays a raw uint32_t.

PS: For the record, there's an idea I don't want to forget, so I'm writing it here: in case we still want locking, the SpinLock may still be advantageous, as its sync variable could be used even for non-locking operations, e.g. to acquire-load on them. To avoid some of its downsides, we'd want to polish #85167. But, again, this is not relevant at the current stage of the discussion.
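A minimal sketch of that atomic-validator idea, illustrative only and not code from the PR; SlotSketch, publish() and is_valid() are assumed names:

#include <atomic>
#include <cstdint>

struct SlotSketch {
	std::atomic<uint32_t> validator{0}; // 0 = free slot
	// ... element data lives right next to the validator ...
};

// Writer (still under the allocation mutex): write the data first, then publish.
inline void publish(SlotSketch &slot, uint32_t new_validator) {
	// All data writes made before this store become visible to any reader
	// whose acquire-load observes new_validator.
	slot.validator.store(new_validator, std::memory_order_release);
}

// Reader (lock-free fetch): the acquire pairs with the writer's release.
inline bool is_valid(const SlotSketch &slot, uint32_t expected_validator) {
	return slot.validator.load(std::memory_order_acquire) == expected_validator;
}

The release store pairs with the acquire load, so a reader that sees the new validator value is also guaranteed to see the data written before it was published.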

@reduz (Member, Author) commented Dec 21, 2023

@RandomShaper I am wondering if making the validator an atomic really changes anything; my fear is that load-acquire still forces the other CPUs to finalize other memory writes.

@RandomShaper (Member)

Acquire-release atomics would at least be superior to either Mutex or SpinLock. The problem is thus with the functions that this PR makes lock-less even in the thread-safe version. If we are fine with readers potentially seeing old data, those reads could be relaxed, but they have to be atomic in any case (on most or all relevant architectures that will boil down to plain reads, but the code would be more future-proof that way).
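A sketch of that relaxed variant, under the assumption that stale validator values are acceptable to readers; is_valid_relaxed is an illustrative name, not the PR's code:

// On most architectures this compiles to a plain load, but unlike a raw
// uint32_t read it is not a data race in the C++ memory model.
#include <atomic>
#include <cstdint>

inline bool is_valid_relaxed(const std::atomic<uint32_t> &validator, uint32_t expected) {
	return validator.load(std::memory_order_relaxed) == expected;
}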

@reduz (Member, Author) commented Dec 21, 2023

@RandomShaper I think in theory it should be fine if they see old data in this case: the data will not be accessible anyway after the mutex releases the lock, and it will be updated on removal, again after the mutex releases the lock. In the meantime, nothing should touch the data, so I think it should be OK. The data itself is not thread-safe anyway, even now, so it's not related to the actual RID access per se.

@Calinou (Member) left a comment

Tested locally (rebased on top of master 10e1114 + #86333), it works as expected on Linux + NVIDIA.

On https://github.com/godotengine/tps-demo, it loads a full second faster than on master with neither PR applied.

		}
	}

	void set_description(const char *p_descrption) {
		description = p_descrption;
	}

-	RID_Alloc(uint32_t p_target_chunk_byte_size = 65536) {
+	RID_Alloc(uint32_t p_target_chunk_byte_size = 65536, uint32_t p_maximum_amount_of_elements = 200000) {

Out of curiosity, shouldn't the default value for p_maximum_amount_of_elements be a multiple of 65536?

@reduz (Member, Author) replied

Yeah, that's a good point.
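For illustration, this is the kind of rounding involved; the element size below is an assumption, and only the two defaults (65536 and 200000) come from the diff above:

#include <cstdint>
#include <cstdio>

int main() {
	const uint32_t chunk_byte_size = 65536; // default p_target_chunk_byte_size
	const uint32_t element_size = 64;       // hypothetical sizeof(validator + data)
	const uint32_t max_elements = 200000;   // default p_maximum_amount_of_elements

	uint32_t elements_in_chunk = chunk_byte_size / element_size; // 1024 here
	// Rounding up means the last chunk is only partially used whenever
	// max_elements is not a multiple of elements_in_chunk.
	uint32_t chunk_limit = (max_elements + elements_in_chunk - 1) / elements_in_chunk;

	printf("elements per chunk: %u, chunk limit: %u, capacity: %u\n",
			elements_in_chunk, chunk_limit, chunk_limit * elements_in_chunk);
	return 0;
}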

@akien-mga akien-mga changed the title Make RID_Owner lock-free for fetching. Make RID_Owner lock-free for fetching. Feb 3, 2024
@clayjohn (Member) commented Feb 16, 2024

For context, we need to merge this before merging #90400, but we are in no rush to merge #90400 for now, so no pressure

@akien-mga (Member)

Superseded by #97465.

@akien-mga akien-mga closed this Sep 25, 2024
@akien-mga akien-mga removed this from the 4.4 milestone Sep 25, 2024