Optimize the memory behaviour of Vector2 #115
I compared to caa4ef0, and we seem to be a solid 30-50% faster on average, which is a surprisingly large improvement. The outliers are likely glitches, as
This is a cool bit of benchmarking! I like the idea of using slots. What happens if/when we make 3D vectors? (Primarily, there's some math I don't fully understand myself that is apparently easier with the single extra degree? You may know the state of that better than I do.)
Looks like it isn't possible to make the GC not track
@pathunstrom As far as I understand (I'm no Python expert, though I checked the docs and checked the behaviour in the REPL), inheritance works fine with That said, I don't think a potential
I think we should make a separate issue if we want to keep discussing 3D vectors :)
OK, this should be good to go (though sadly without GC optimisations).
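For reference, here is a minimal sketch of how GC tracking can be inspected from Python, using the standard `gc.is_tracked` function. `SlotVec` is a hypothetical stand-in and not the actual `Vector2` implementation; the result for it is printed rather than asserted, since it may depend on the interpreter.

```python
import gc


class SlotVec:
    """Hypothetical stand-in for a slotted 2D vector (not ppb-vector's Vector2)."""

    __slots__ = ("x", "y")

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


v = SlotVec(1.0, 2.0)

# Atomic objects such as floats cannot hold references, so the
# collector never tracks them.
print("float tracked:", gc.is_tracked(1.0))

# Report whether this interpreter tracks instances of the slotted class.
print("SlotVec tracked:", gc.is_tracked(v))
```

Running this against the real class would show whether a given change actually affects tracking.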
The performance metrics are a lot more dramatic than I was expecting. This will also need a matching change in
When used with a version of ppb-vector that defines `__slots__`, this dramatically reduces memory usage. When used with an earlier version of ppb-vector, `__dict__` is accessible, making the change a no-op. See ppb/ppb-vector#115
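A minimal sketch of why the declaration is a no-op on older versions: a subclass that declares empty `__slots__` only avoids a per-instance `__dict__` if every base class does too. All class names below are hypothetical stand-ins; the source does not say which downstream class carries the declaration.

```python
class OldBase:
    """Hypothetical base without __slots__ (stands in for an older ppb-vector)."""

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


class NewBase:
    """Hypothetical base with __slots__ (stands in for a newer ppb-vector)."""

    __slots__ = ("x", "y")

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


class OnOld(OldBase):
    __slots__ = ()  # no effect: instances inherit __dict__ from OldBase


class OnNew(NewBase):
    __slots__ = ()  # keeps instances free of a per-instance __dict__


print(hasattr(OnOld(1.0, 2.0), "__dict__"))  # True
print(hasattr(OnNew(1.0, 2.0), "__dict__"))  # False
```

Either way the subclass behaves the same for attribute access, so the declaration is safe to ship ahead of the upstream change.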
This makes instances more than 3× smaller (from 328 to 104 bytes, as reported by pympler), and faster to access.

Comparison with master as of caa4ef0:

$ python3 -m perf compare_to benchmark_*.json --table
+-----------+------------------+------------------------------+
| Benchmark | benchmark_master | benchmark_slots              |
+===========+==================+==============================+
| __add__   | 4.83 us          | 2.49 us: 1.94x faster (-48%) |
+-----------+------------------+------------------------------+
| __eq__    | 914 ns           | 495 ns: 1.85x faster (-46%)  |
+-----------+------------------+------------------------------+
| convert   | 468 ns           | 326 ns: 1.44x faster (-30%)  |
+-----------+------------------+------------------------------+
| normalize | 9.45 us          | 5.81 us: 1.63x faster (-39%) |
+-----------+------------------+------------------------------+
| length    | 451 ns           | 662 ns: 1.47x slower (+47%)  |
+-----------+------------------+------------------------------+
| rotate    | 4.20 us          | 6.56 us: 1.56x slower (+56%) |
+-----------+------------------+------------------------------+
| scale_by  | 2.11 us          | 1.58 us: 1.33x faster (-25%) |
+-----------+------------------+------------------------------+
| scale_to  | 5.10 us          | 7.08 us: 1.39x slower (+39%) |
+-----------+------------------+------------------------------+

Not significant (7): __sub__; reflect; angle; dot; isclose; __neg__; truncate
tests/memory: Updated as Vector2 instances are now a tiny bit larger
@astronouth7303 Rebased to handle the merge conflict.
- Use `__slots__` rather than an implicit `__dict__`.
- Inform the GC that `Vector2` instances need not be tracked.

Rationale
We can expect users of `ppb-vector` to use very many instances, in part due to the immutable design (though Python's refcounting implementation should immediately collect unreachable vectors, this still creates memory churn), and in part because vectors are an essential data structure in games development.

Moreover, memory churn and GC interactions can be the source of pauses and jitter, which are a problem for games.

As such, it makes sense to try and minimize the per-instance cost of using `Vector2`, especially if it comes with no cost in terms of ergonomics, and negligible maintenance burden.

Expected effects
The use of `__slots__` made each vector use over 3 times less memory. This was tested by hand (against the implementation in `master`) and there is also a test that compares `Vector2` against a naive representation of vectors (not based on `dataclasses`).

Using `__slots__` should result in marginally better performance. This can be tested with `tests/benchmark.py` but I cannot run it right now (my laptop is heavily loaded).

Removing the need for GC tracking should result in shorter pauses. I plan on testing that with the `hugs.py` example game.
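To illustrate the memory effect, here is a rough sketch that compares a class with an implicit `__dict__` against one that declares `__slots__`. `DictVec`, `SlotVec` and `shallow_size` are hypothetical names, and the measurement uses `sys.getsizeof` from the standard library rather than pympler, so the byte counts will not match the figures quoted in this PR.

```python
import sys


class DictVec:
    """Hypothetical vector whose attributes live in a per-instance __dict__."""

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


class SlotVec:
    """Hypothetical vector whose attributes live in fixed slots."""

    __slots__ = ("x", "y")

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


def shallow_size(obj: object) -> int:
    """Size of the instance itself plus its attribute dict, if it has one."""
    size = sys.getsizeof(obj)
    if hasattr(obj, "__dict__"):
        size += sys.getsizeof(obj.__dict__)
    return size


d, s = DictVec(1.0, 2.0), SlotVec(1.0, 2.0)
print(hasattr(d, "__dict__"), hasattr(s, "__dict__"))  # True False
print(shallow_size(d), shallow_size(s))  # exact numbers vary by Python version
```

The sizes of the `x` and `y` floats themselves are left out, since they are the same for both classes.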