
Optimization and cleanup for bit arrays. #111

Merged
7 commits merged on Aug 9, 2023

Conversation

tommyettinger
Contributor

This is organized logically by commit, and each commit has a (sometimes long-winded) comment. Long story short, addRemove() in the benchmarks goes from 1327 ops/second to 2161 ops/second. The numbers seem large because I tested with 1000 entities (instead of 10,000) and 5 world updates (instead of 1000), so that I could get meaningful results with short-ish iteration times. That mattered because the benchmark seems to have problems with running out of heap if it spends too long on one iteration, though I don't think I encountered this with Fleks -- it was an issue with the Artemis benchmark.

There's lots of bitwise magic here. I'm more than happy to explain any part that doesn't make sense. Bitwise operations are usually quite fast compared to anything that deals with objects, so I'm always happy to see them put to productive use. The main thing I tried to do was strip out the places where indices 0-63 were looped over in each Long, since that takes more time than necessary on a sparse Long (one with few bits set). There's a new clearEnsuringCapacity() method in MutableEntityBag; it is functionally just like:

```
clear()
ensureCapacity(capacity)
```

but it avoids allocating and copying a new backing array when the existing one is already large enough -- the old contents are about to be discarded anyway. Some of libGDX's collections have the same method.
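A minimal Java sketch of the idea (field names here are hypothetical, not Fleks's actual implementation): clearing and ensuring capacity in one step means the old contents never need to be copied.

```java
import java.util.Arrays;

public class BagDemo {
    private Object[] values = new Object[16];
    private int size = 0;

    // If the current array is already large enough, just null it out and
    // reuse it; only allocate when the requested capacity exceeds the
    // current one. Either way, the old contents are never copied.
    public void clearEnsuringCapacity(int capacity) {
        if (capacity > values.length) {
            values = new Object[capacity]; // a fresh array is already all-null
        } else {
            Arrays.fill(values, null);     // reuse: no allocation, no copy
        }
        size = 0;
    }

    public int capacity() { return values.length; }

    public static void main(String[] args) {
        BagDemo bag = new BagDemo();
        bag.clearEnsuringCapacity(8);       // 8 <= 16: keeps the existing array
        System.out.println(bag.capacity()); // 16
        bag.clearEnsuringCapacity(32);      // 32 > 16: allocates, no copy
        System.out.println(bag.capacity()); // 32
    }
}
```

Contrast with calling `clear()` followed by `ensureCapacity(capacity)`, where a growing bag would copy an array whose contents were just discarded.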

Let me know what I can do to help!

Before, if you called `bag.ensureCapacity(bag.capacity())`, it would resize and allocate a new backing `values` array every time, even though the current capacity had seemingly already been ensured. One of the tests has been expanded to catch this behavior; it detects the problem in the old code but not in this PR's code.
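The fix amounts to an early-out guard. A sketch in Java, with hypothetical names rather than Fleks's exact code:

```java
import java.util.Arrays;

public class EnsureCapacityDemo {
    private long[] values = new long[8];

    // The old behavior resized unconditionally; the fixed version returns
    // early when the backing array already satisfies the request.
    public void ensureCapacity(int capacity) {
        if (capacity <= values.length) return; // capacity already ensured
        values = Arrays.copyOf(values, capacity);
    }

    public int capacity() { return values.length; }

    public static void main(String[] args) {
        EnsureCapacityDemo bag = new EnsureCapacityDemo();
        bag.ensureCapacity(bag.capacity()); // previously allocated a new array
        System.out.println(bag.capacity()); // still 8, same backing array
        bag.ensureCapacity(20);
        System.out.println(bag.capacity()); // 20
    }
}
```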
These range from things that might not matter at all (because the compiler already does them behind the scenes) to taking advantage of non-obvious but specified behavior of the bitwise operators. The former includes using `ushr 6` instead of `/ 64`, which does have one real advantage: negative bit indices are treated as unsigned int indices, which still fit inside a large-ish `Array` (though storing 2^26 `Long` items takes about 512MB of RAM). The latter includes removing the manual modulus on shift amounts: according to the Kotlin docs for shl and ushr, among others, shift amounts applied to `Long`s are implicitly masked to use only the bottom 6 bits (when applied to `Int`s, they use the bottom 5).
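A quick sketch of that behavior in Java, where the JLS specifies the same implicit masking the Kotlin operators compile down to (shift distances on a `long` use only their low 6 bits, on an `int` the low 5):

```java
public class ShiftMaskDemo {
    public static void main(String[] args) {
        int bitIndex = 70; // deliberately outside the 0..63 range

        // Word index: unsigned divide by 64. For non-negative indices this
        // equals bitIndex / 64, and >>> also treats negative ints as unsigned.
        int word = bitIndex >>> 6;             // 70 / 64 == 1

        // The shift distance on a long is implicitly masked to its low
        // 6 bits, so no manual "% 64" is needed: 70 & 63 == 6.
        long mask = 1L << bitIndex;            // same as 1L << 6 == 64

        System.out.println(word + " " + mask); // prints "1 64"
    }
}
```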
The `countLeadingZeroBits()` and `countOneBits()` functions are each marked `@IntrinsicCandidate` on the JVM, because they compile down to a much faster-than-normal piece of native code (usually a single instruction, with a name like CLZ or POPCNT). Using intrinsics wherever possible in bit-twiddling code can yield really serious performance gains when code that used many instructions can be replaced by one. I believe `countOneBits()` had its performance improve in a very recent Java version; before that, some code needed to avoid it because it didn't play nicely with some loop optimizations (though in that case, there are better workarounds than looping over all 64 bits to count the 1s).
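On the JVM, those Kotlin functions delegate to `Long.numberOfLeadingZeros` and `Long.bitCount`, which HotSpot intrinsifies. A small Java sketch of the kind of loop-free bit math they enable:

```java
public class IntrinsicDemo {
    // Index (0-63) of the highest set bit, computed without a 64-step loop.
    // Intrinsified to a single CLZ/LZCNT-style instruction on most CPUs.
    static int highestSetBit(long word) {
        return 63 - Long.numberOfLeadingZeros(word);
    }

    public static void main(String[] args) {
        long word = 0b1010_0000L;                // bits 5 and 7 set
        System.out.println(highestSetBit(word)); // 7
        System.out.println(Long.bitCount(word)); // 2 one-bits, no loop needed
    }
}
```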
This resize code could be wrong for this application, but I'm just used to seeing sets and arrays double in capacity when they run out, to avoid repeatedly allocating a new backing array that's only a hair larger.
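A minimal sketch of that geometric-growth pattern (names hypothetical, not Fleks's actual fields): doubling makes n insertions cost O(n) amortized copying instead of O(n^2).

```java
import java.util.Arrays;

public class GrowDemo {
    // Grow to at least wordIndex + 1 items, doubling rather than adding one
    // word at a time, so repeated growth rarely reallocates.
    static long[] ensureWordCapacity(long[] words, int wordIndex) {
        if (wordIndex < words.length) return words;
        int newLength = Math.max(words.length * 2, wordIndex + 1);
        return Arrays.copyOf(words, newLength);
    }

    public static void main(String[] args) {
        long[] words = new long[4];
        words = ensureWordCapacity(words, 9);  // 9 >= 4: grow
        System.out.println(words.length);      // max(8, 10) == 10
        words = ensureWordCapacity(words, 5);  // 5 < 10: unchanged
        System.out.println(words.length);      // 10
    }
}
```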
These are... pretty crazy pieces of code. They make heavy use of a pattern where they copy an item out of values into a local (it's just a Long, so this is by value), find its highest set bit index, do something with that index, and then clear that highest bit in the local copy. Doing this means that if only one bit is set in the Long we look at, we only iterate once through that Long; two bits set means two iterations, and so on.
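The pattern described above, sketched in Java (the PR's Kotlin would use `countLeadingZeroBits()` the same way): the loop runs once per set bit, not 64 times per word.

```java
public class SparseBitsDemo {
    public static void main(String[] args) {
        long bits = (1L << 3) | (1L << 40) | (1L << 63); // three bits set

        // Iterate only as many times as there are set bits: find the highest
        // set bit, use its index, then clear it from the local copy.
        while (bits != 0L) {
            int idx = 63 - Long.numberOfLeadingZeros(bits);
            System.out.println(idx);   // 63, then 40, then 3
            bits &= ~(1L << idx);      // clear the bit we just handled
        }
    }
}
```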
@Quillraven Quillraven added this to the 2.5 milestone Aug 9, 2023
@Quillraven
Owner

Very cool stuff, thank you! I thought I had removed the redundant % 64 operations in BitArray, but it seems I forgot to push at some point and the change got lost. You fixed that now as well :)

Just fyi, I also get the heap memory issue in the Artemis benchmark from time to time. Normally a simple restart of the benchmark solves it.

"Twice as many Long items as before" wasn't referring to earlier code, but to the changing state of bits -- it's being resized here.
This seems to make the tests pass again, but we'll see.
@Quillraven Quillraven merged commit c73d351 into Quillraven:master Aug 9, 2023
4 checks passed
@Quillraven Quillraven mentioned this pull request Oct 17, 2023