Optimize the memory behaviour of Vector2 #115

nbraud · 2019-03-06T12:07:35Z

Use pre-allocated __slots__ rather than an implicit __dict__.
~~Inform the GC that Vector2 instances need not be tracked~~

Rationale

We can expect users of ppb-vector to use very many instances, in part due to the immutable design (though Python's refcounting implementation should immediately collect unreachable vectors, this still creates memory churn), and in part because vectors are an essential datastructure in games development.

Moreover, memory churn and GC interactions can be the source of pauses and jitter, which are a problem for games.

As such, it makes sense to try and minimize the per-instance cost of using Vector2, especially if it comes with no cost in terms of ergonomics, and negligible maintenance burden.

Expected effects

The use of __slots__ made each vector use over 3 times less memory.
This was tested by hand (against the implementation in master) and there is also a test that compares Vector2 against a naive representation of vectors (not based on dataclasses).
Using __slots__ should result in marginally-better performance.
This can be tested with tests/benchmark.py but I cannot run it right now (my laptop is heavily loaded).
~~Removing the need for GC tracking should result in shorter pauses.~~
~~I plan on testing that with the hugs.py example game.~~

nbraud · 2019-03-06T21:08:13Z

I compared to caa4ef0: and we seem to be getting a solid 30-50% faster on average, which is a surprisingly-large improvement.

The outliers are likely glitches, as perf complained a lot about results not being statistically significant (standard deviation larger than the mean...).
That's likely due to my running those benchmarks on my overheating laptop, so It would be super helpful if one of you could reproduce the results on a more reasonable box.

$ python3 -m perf compare_to benchmark_*.json --table
+-----------+------------------+------------------------------+
| Benchmark | benchmark_master | benchmark_slots              |
+===========+==================+==============================+
| __add__   | 4.83 us          | 2.49 us: 1.94x faster (-48%) |
+-----------+------------------+------------------------------+
| __eq__    | 914 ns           | 495 ns: 1.85x faster (-46%)  |
+-----------+------------------+------------------------------+
| convert   | 468 ns           | 326 ns: 1.44x faster (-30%)  |
+-----------+------------------+------------------------------+
| normalize | 9.45 us          | 5.81 us: 1.63x faster (-39%) |
+-----------+------------------+------------------------------+
| length    | 451 ns           | 662 ns: 1.47x slower (+47%)  |
+-----------+------------------+------------------------------+
| rotate    | 4.20 us          | 6.56 us: 1.56x slower (+56%) |
+-----------+------------------+------------------------------+
| scale_by  | 2.11 us          | 1.58 us: 1.33x faster (-25%) |
+-----------+------------------+------------------------------+
| scale_to  | 5.10 us          | 7.08 us: 1.39x slower (+39%) |
+-----------+------------------+------------------------------+

Not significant (7): __sub__; reflect; angle; dot; isclose; __neg__; truncate

pathunstrom · 2019-03-06T21:18:24Z

This is a cool bit of benchmarking!

I like the idea of using slots. What happens if/when we make 3d vectors? (Primarily, there's some math I don't fully understand myself that is apparently easier with the single extra degree? You may know the state of that better than I do.)

nbraud · 2019-03-07T19:30:17Z

Looks like it isn't possible to make the GC not track Vector2s. as the optimisation in CPython seems specific to dicts and doesn't apply to classes (even though it could be generalised).

nbraud · 2019-03-07T19:42:14Z

@pathunstrom As far as I understand (I'm no Python expert, though I checked the docs and checked the behaviour in the REPL), inheritance works fine with __slots__, as long as a single parent class uses it, and subclasses may themselves use __slots__ (and only need to specify their own attributes there)

That said, I don't think a potential Vector3 should inherit from Vector2, since:

it cannot be API-compatible with Vector2; for instance, rotate(self: Vector3, angle: float) would make no sense (in 3D you need to specify a rotation axis)
most methods in Vector2 make strong assumptions about being able to instantiate new vectors with 2 parameters; that might also be an issue if I wanted to make a ColoredVector2 class (for instance)

I think we should make a separate issue if we want to keep discussing 3D vectors :)

nbraud · 2019-03-07T19:50:00Z

OK, this should be good to go (though sadly without GC optimisations)

ppb_vector/vector2.py

AstraLuma · 2019-03-09T05:12:01Z

The performance metrics are a lot more dramatic than I was expecting.

This will also need a matching change in ppb (after this has been released to PyPI), otherwise __dict__ comes back and we've lost a lot of the benefit.

When used with a version of ppb-vector that defines __slots__, this should dramatically reduce memory usage. When used with an earlier version of ppb-vector __dict__ is accessible, making the __slots__ declaration a no-op. See ppb/ppb-vector#115

When used with a version of ppb-vector that defines `__slots__`, this dramatically reduces memory usage. When used with an earlier version of ppb-vector, `__dict__` is accessible, making the change a no-op. See ppb/ppb-vector#115

tests/test_memory.py

This makes instances more than 3× smaller (from 328 to 104 bytes, as reported by pympler), and faster to access. Comparison with master as of caa4ef0: $ python3 -m perf compare_to benchmark_*.json --table +-----------+------------------+------------------------------+ | Benchmark | benchmark_master | benchmark_slots | +===========+==================+==============================+ | __add__ | 4.83 us | 2.49 us: 1.94x faster (-48%) | +-----------+------------------+------------------------------+ | __eq__ | 914 ns | 495 ns: 1.85x faster (-46%) | +-----------+------------------+------------------------------+ | convert | 468 ns | 326 ns: 1.44x faster (-30%) | +-----------+------------------+------------------------------+ | normalize | 9.45 us | 5.81 us: 1.63x faster (-39%) | +-----------+------------------+------------------------------+ | length | 451 ns | 662 ns: 1.47x slower (+47%) | +-----------+------------------+------------------------------+ | rotate | 4.20 us | 6.56 us: 1.56x slower (+56%) | +-----------+------------------+------------------------------+ | scale_by | 2.11 us | 1.58 us: 1.33x faster (-25%) | +-----------+------------------+------------------------------+ | scale_to | 5.10 us | 7.08 us: 1.39x slower (+39%) | +-----------+------------------+------------------------------+ Not significant (7): __sub__; reflect; angle; dot; isclose; __neg__; truncate

tests/memory: Updated as Vector2 instances are now a tiny bit larger

nbraud · 2019-03-15T18:48:31Z

@astronouth7303 Rebased to handle the merge conflict.

nbraud force-pushed the slots branch from 7da30b2 to 593e5d3 Compare March 7, 2019 19:27

nbraud changed the title ~~WiP: Optimize the memory behaviour of Vector2~~ Optimize the memory behaviour of Vector2 Mar 7, 2019

nbraud force-pushed the slots branch from 593e5d3 to 7a6e510 Compare March 7, 2019 19:43

AstraLuma requested changes Mar 9, 2019

View reviewed changes

ppb_vector/vector2.py Outdated Show resolved Hide resolved

nbraud mentioned this pull request Mar 15, 2019

ppb/vector: Define __slots__ ppb/pursuedpybear#213

Merged

AstraLuma requested changes Mar 15, 2019

View reviewed changes

tests/test_memory.py Outdated Show resolved Hide resolved

nbraud force-pushed the slots branch from 5d23998 to 114b06e Compare March 15, 2019 18:45

nbraud added 4 commits March 15, 2019 19:46

Test that the use of __slots__ indeed reduces memory usage

5872c74

tests/memory: Assert that weak reference to Vector2s can be made

c7a946b

Vector2: Support use of weak references

525534f

tests/memory: Updated as Vector2 instances are now a tiny bit larger

nbraud force-pushed the slots branch from 114b06e to 525534f Compare March 15, 2019 18:47

AstraLuma approved these changes Mar 15, 2019

View reviewed changes

AstraLuma merged commit 8d0b663 into ppb:master Mar 15, 2019

nbraud deleted the slots branch March 15, 2019 19:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize the memory behaviour of Vector2 #115

Optimize the memory behaviour of Vector2 #115

nbraud commented Mar 6, 2019 •

edited

Loading

nbraud commented Mar 6, 2019

pathunstrom commented Mar 6, 2019

nbraud commented Mar 7, 2019

nbraud commented Mar 7, 2019

nbraud commented Mar 7, 2019

AstraLuma commented Mar 9, 2019

nbraud commented Mar 15, 2019

Optimize the memory behaviour of Vector2 #115

Optimize the memory behaviour of Vector2 #115

Conversation

nbraud commented Mar 6, 2019 • edited Loading

Rationale

Expected effects

nbraud commented Mar 6, 2019

pathunstrom commented Mar 6, 2019

nbraud commented Mar 7, 2019

nbraud commented Mar 7, 2019

nbraud commented Mar 7, 2019

AstraLuma commented Mar 9, 2019

nbraud commented Mar 15, 2019

nbraud commented Mar 6, 2019 •

edited

Loading