
Optimization and cleanup for bit arrays. #111

Merged
7 commits merged on Aug 9, 2023

Conversation

tommyettinger
Contributor

This is organized logically by commit, and each commit has a (sometimes long-winded) comment. Long story short, addRemove() in the benchmarks goes from 1327 ops/second to 2161 ops/second. The numbers seem large because I tested with 1000 entities (instead of 10,000) and 5 world updates (instead of 1000), so that I could get meaningful results with short-ish iteration times. That mattered because the benchmark seems to have problems with running out of heap if it spends too long on one iteration, though I don't think I encountered this with Fleks -- it was an issue with the Artemis benchmark.

There's lots of bitwise magic here. I'm more than happy to explain any part that doesn't make sense. Bitwise operations are usually quite fast compared to anything that deals with objects, so I'm always happy to see them put to productive use. The main thing I tried to do was strip out the places where indices 0-63 were looped over in each Long, since that takes more time than necessary on a sparse Long (one with few bits set). There's a new clearEnsuringCapacity() method in MutableEntityBag; it is functionally just like:

```
clear()
ensureCapacity(capacity)
```

but it avoids allocating and copying a new backing array when the existing one is already large enough -- the old contents are about to be discarded anyway. Some of libGDX's collections have the same method.
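A minimal Java sketch of the idea (field names here are hypothetical, not Fleks's actual implementation): clearing and ensuring capacity in one step means the old contents never need to be copied.

```java
import java.util.Arrays;

public class BagDemo {
    private Object[] values = new Object[16];
    private int size = 0;

    // If the current array is already large enough, just null it out and
    // reuse it; only allocate when the requested capacity exceeds the
    // current one. Either way, the old contents are never copied.
    public void clearEnsuringCapacity(int capacity) {
        if (capacity > values.length) {
            values = new Object[capacity]; // a fresh array is already all-null
        } else {
            Arrays.fill(values, null);     // reuse: no allocation, no copy
        }
        size = 0;
    }

    public int capacity() { return values.length; }

    public static void main(String[] args) {
        BagDemo bag = new BagDemo();
        bag.clearEnsuringCapacity(8);       // 8 <= 16: keeps the existing array
        System.out.println(bag.capacity()); // 16
        bag.clearEnsuringCapacity(32);      // 32 > 16: allocates, no copy
        System.out.println(bag.capacity()); // 32
    }
}
```

Contrast with calling `clear()` followed by `ensureCapacity(capacity)`, where a growing bag would copy an array whose contents were just discarded.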

Let me know what I can do to help!

Before, if you called `bag.ensureCapacity(bag.capacity())`, it would resize and allocate a new backing `values` array every time, even though the current capacity had seemingly already been ensured. One of the tests has been expanded to catch this behavior; it detects the problem in the old code but not in this PR's code.
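The fix amounts to an early-out guard. A sketch in Java, with hypothetical names rather than Fleks's exact code:

```java
import java.util.Arrays;

public class EnsureCapacityDemo {
    private long[] values = new long[8];

    // The old behavior resized unconditionally; the fixed version returns
    // early when the backing array already satisfies the request.
    public void ensureCapacity(int capacity) {
        if (capacity <= values.length) return; // capacity already ensured
        values = Arrays.copyOf(values, capacity);
    }

    public int capacity() { return values.length; }

    public static void main(String[] args) {
        EnsureCapacityDemo bag = new EnsureCapacityDemo();
        bag.ensureCapacity(bag.capacity()); // previously allocated a new array
        System.out.println(bag.capacity()); // still 8, same backing array
        bag.ensureCapacity(20);
        System.out.println(bag.capacity()); // 20
    }
}
```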
These range from things that might not matter at all (because the compiler already does them behind the scenes) to taking advantage of non-obvious but specified behavior of the bitwise operators. The former includes using `ushr 6` instead of `/ 64`, which does have one real advantage: negative bit indices are treated as unsigned int indices, which still fit inside a large-ish `Array` (though storing 2^26 `Long` items takes about 512MB of RAM). The latter includes removing the manual modulus on shift amounts: according to the Kotlin docs for shl and ushr, among others, shift amounts applied to `Long`s are implicitly masked to use only the bottom 6 bits (when applied to `Int`s, they use the bottom 5).
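A quick sketch of that behavior in Java, where the JLS specifies the same implicit masking the Kotlin operators compile down to (shift distances on a `long` use only their low 6 bits, on an `int` the low 5):

```java
public class ShiftMaskDemo {
    public static void main(String[] args) {
        int bitIndex = 70; // deliberately outside the 0..63 range

        // Word index: unsigned divide by 64. For non-negative indices this
        // equals bitIndex / 64, and >>> also treats negative ints as unsigned.
        int word = bitIndex >>> 6;             // 70 / 64 == 1

        // The shift distance on a long is implicitly masked to its low
        // 6 bits, so no manual "% 64" is needed: 70 & 63 == 6.
        long mask = 1L << bitIndex;            // same as 1L << 6 == 64

        System.out.println(word + " " + mask); // prints "1 64"
    }
}
```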
The `countLeadingZeroBits()` and `countOneBits()` functions are each marked `@IntrinsicCandidate` on the JVM, because they compile down to a much faster-than-normal piece of native code (usually a single instruction, with a name like CLZ or POPCNT). Using intrinsics wherever possible in bit-twiddling code can yield really serious performance gains when code that used many instructions can be replaced by one. I believe `countOneBits()` had its performance improve in a very recent Java version; before that, some code needed to avoid it because it didn't play nicely with some loop optimizations (though in that case, there are better workarounds than looping over all 64 bits to count the 1s).
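On the JVM, those Kotlin functions delegate to `Long.numberOfLeadingZeros` and `Long.bitCount`, which HotSpot intrinsifies. A small Java sketch of the kind of loop-free bit math they enable:

```java
public class IntrinsicDemo {
    // Index (0-63) of the highest set bit, computed without a 64-step loop.
    // Intrinsified to a single CLZ/LZCNT-style instruction on most CPUs.
    static int highestSetBit(long word) {
        return 63 - Long.numberOfLeadingZeros(word);
    }

    public static void main(String[] args) {
        long word = 0b1010_0000L;                // bits 5 and 7 set
        System.out.println(highestSetBit(word)); // 7
        System.out.println(Long.bitCount(word)); // 2 one-bits, no loop needed
    }
}
```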
This resize code could be wrong for this application, but I'm just used to seeing sets and arrays double in capacity when they run out, to avoid repeatedly allocating a new backing array that's only a hair larger.
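A minimal sketch of that geometric-growth pattern (names hypothetical, not Fleks's actual fields): doubling makes n insertions cost O(n) amortized copying instead of O(n^2).

```java
import java.util.Arrays;

public class GrowDemo {
    // Grow to at least wordIndex + 1 items, doubling rather than adding one
    // word at a time, so repeated growth rarely reallocates.
    static long[] ensureWordCapacity(long[] words, int wordIndex) {
        if (wordIndex < words.length) return words;
        int newLength = Math.max(words.length * 2, wordIndex + 1);
        return Arrays.copyOf(words, newLength);
    }

    public static void main(String[] args) {
        long[] words = new long[4];
        words = ensureWordCapacity(words, 9);  // 9 >= 4: grow
        System.out.println(words.length);      // max(8, 10) == 10
        words = ensureWordCapacity(words, 5);  // 5 < 10: unchanged
        System.out.println(words.length);      // 10
    }
}
```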
These are... pretty crazy pieces of code. They make heavy use of a pattern where they copy an item out of values into a local (it's just a Long, so this is by value), find its highest set bit index, do something with that index, and then clear that highest bit in the local copy. Doing this means that if only one bit is set in the Long we look at, we only iterate once through that Long; two bits set means two iterations, and so on.
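The pattern described above, sketched in Java (the PR's Kotlin would use `countLeadingZeroBits()` the same way): the loop runs once per set bit, not 64 times per word.

```java
public class SparseBitsDemo {
    public static void main(String[] args) {
        long bits = (1L << 3) | (1L << 40) | (1L << 63); // three bits set

        // Iterate only as many times as there are set bits: find the highest
        // set bit, use its index, then clear it from the local copy.
        while (bits != 0L) {
            int idx = 63 - Long.numberOfLeadingZeros(bits);
            System.out.println(idx);   // 63, then 40, then 3
            bits &= ~(1L << idx);      // clear the bit we just handled
        }
    }
}
```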
@Quillraven Quillraven added this to the 2.5 milestone Aug 9, 2023
@Quillraven
Owner

Very cool stuff, thank you! I thought I had removed the redundant % 64 operations in BitArray, but it seems I forgot to push at some point and the change got lost. You fixed that now as well :)

Just fyi, I also get the heap memory issue in the Artemis benchmark from time to time. Normally a simple restart of the benchmark solves it.

"Twice as many Long items as before" wasn't referring to earlier code, but to the changing state of bits -- it's being resized here.
This seems to make the tests pass again, but we'll see.
@Quillraven Quillraven merged commit c73d351 into Quillraven:master Aug 9, 2023
4 checks passed
@Quillraven Quillraven mentioned this pull request Oct 17, 2023