Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize the memory behaviour of Vector2 #115

Merged
merged 4 commits into from
Mar 15, 2019
Merged

Conversation

nbraud
Copy link
Collaborator

@nbraud nbraud commented Mar 6, 2019

  • Use pre-allocated __slots__ rather than an implicit __dict__.
  • Inform the GC that Vector2 instances need not be tracked

Rationale

We can expect users of ppb-vector to use very many instances, in part due to the immutable design (though Python's refcounting implementation should immediately collect unreachable vectors, this still creates memory churn), and in part because vectors are an essential datastructure in games development.

Moreover, memory churn and GC interactions can be the source of pauses and jitter, which are a problem for games.

As such, it makes sense to try and minimize the per-instance cost of using Vector2, especially if it comes with no cost in terms of ergonomics, and negligible maintenance burden.

Expected effects

  • The use of __slots__ made each vector use over 3 times less memory.
    This was tested by hand (against the implementation in master) and there is also a test that compares Vector2 against a naive representation of vectors (not based on dataclasses).

  • Using __slots__ should result in marginally-better performance.
    This can be tested with tests/benchmark.py but I cannot run it right now (my laptop is heavily loaded).

  • Removing the need for GC tracking should result in shorter pauses.
    I plan on testing that with the hugs.py example game.

@nbraud
Copy link
Collaborator Author

nbraud commented Mar 6, 2019

I compared to caa4ef0: and we seem to be getting a solid 30-50% faster on average, which is a surprisingly-large improvement.

The outliers are likely glitches, as perf complained a lot about results not being statistically significant (standard deviation larger than the mean...).
That's likely due to my running those benchmarks on my overheating laptop, so It would be super helpful if one of you could reproduce the results on a more reasonable box.

$ python3 -m perf compare_to benchmark_*.json --table
+-----------+------------------+------------------------------+
| Benchmark | benchmark_master | benchmark_slots              |
+===========+==================+==============================+
| __add__   | 4.83 us          | 2.49 us: 1.94x faster (-48%) |
+-----------+------------------+------------------------------+
| __eq__    | 914 ns           | 495 ns: 1.85x faster (-46%)  |
+-----------+------------------+------------------------------+
| convert   | 468 ns           | 326 ns: 1.44x faster (-30%)  |
+-----------+------------------+------------------------------+
| normalize | 9.45 us          | 5.81 us: 1.63x faster (-39%) |
+-----------+------------------+------------------------------+
| length    | 451 ns           | 662 ns: 1.47x slower (+47%)  |
+-----------+------------------+------------------------------+
| rotate    | 4.20 us          | 6.56 us: 1.56x slower (+56%) |
+-----------+------------------+------------------------------+
| scale_by  | 2.11 us          | 1.58 us: 1.33x faster (-25%) |
+-----------+------------------+------------------------------+
| scale_to  | 5.10 us          | 7.08 us: 1.39x slower (+39%) |
+-----------+------------------+------------------------------+

Not significant (7): __sub__; reflect; angle; dot; isclose; __neg__; truncate

@pathunstrom
Copy link
Collaborator

This is a cool bit of benchmarking!

I like the idea of using slots. What happens if/when we make 3d vectors? (Primarily, there's some math I don't fully understand myself that is apparently easier with the single extra degree? You may know the state of that better than I do.)

@nbraud nbraud changed the title WiP: Optimize the memory behaviour of Vector2 Optimize the memory behaviour of Vector2 Mar 7, 2019
@nbraud
Copy link
Collaborator Author

nbraud commented Mar 7, 2019

Looks like it isn't possible to make the GC not track Vector2s. as the optimisation in CPython seems specific to dicts and doesn't apply to classes (even though it could be generalised).

@nbraud
Copy link
Collaborator Author

nbraud commented Mar 7, 2019

@pathunstrom As far as I understand (I'm no Python expert, though I checked the docs and checked the behaviour in the REPL), inheritance works fine with __slots__, as long as a single parent class uses it, and subclasses may themselves use __slots__ (and only need to specify their own attributes there)

That said, I don't think a potential Vector3 should inherit from Vector2, since:

  • it cannot be API-compatible with Vector2; for instance, rotate(self: Vector3, angle: float) would make no sense (in 3D you need to specify a rotation axis)
  • most methods in Vector2 make strong assumptions about being able to instantiate new vectors with 2 parameters; that might also be an issue if I wanted to make a ColoredVector2 class (for instance)

I think we should make a separate issue if we want to keep discussing 3D vectors :)

@nbraud
Copy link
Collaborator Author

nbraud commented Mar 7, 2019

OK, this should be good to go (though sadly without GC optimisations)

ppb_vector/vector2.py Outdated Show resolved Hide resolved
@AstraLuma
Copy link
Member

The performance metrics are a lot more dramatic than I was expecting.

This will also need a matching change in ppb (after this has been released to PyPI), otherwise __dict__ comes back and we've lost a lot of the benefit.

nbraud added a commit to nbraud/pursuedpybear that referenced this pull request Mar 15, 2019
When used with a version of ppb-vector that defines __slots__,
this should dramatically reduce memory usage.

When used with an earlier version of ppb-vector __dict__ is accessible,
making the __slots__ declaration a no-op.

See ppb/ppb-vector#115
nbraud added a commit to nbraud/pursuedpybear that referenced this pull request Mar 15, 2019
When used with a version of ppb-vector that defines `__slots__`,
this dramatically reduces memory usage.

When used with an earlier version of ppb-vector, `__dict__` is accessible,
making the change a no-op.

See ppb/ppb-vector#115
tests/test_memory.py Outdated Show resolved Hide resolved
nbraud added 4 commits March 15, 2019 19:46
This makes instances more than 3× smaller (from 328 to 104 bytes, as reported by
pympler), and faster to access.

Comparison with master as of caa4ef0:

  $ python3 -m perf compare_to benchmark_*.json --table
  +-----------+------------------+------------------------------+
  | Benchmark | benchmark_master | benchmark_slots              |
  +===========+==================+==============================+
  | __add__   | 4.83 us          | 2.49 us: 1.94x faster (-48%) |
  +-----------+------------------+------------------------------+
  | __eq__    | 914 ns           | 495 ns: 1.85x faster (-46%)  |
  +-----------+------------------+------------------------------+
  | convert   | 468 ns           | 326 ns: 1.44x faster (-30%)  |
  +-----------+------------------+------------------------------+
  | normalize | 9.45 us          | 5.81 us: 1.63x faster (-39%) |
  +-----------+------------------+------------------------------+
  | length    | 451 ns           | 662 ns: 1.47x slower (+47%)  |
  +-----------+------------------+------------------------------+
  | rotate    | 4.20 us          | 6.56 us: 1.56x slower (+56%) |
  +-----------+------------------+------------------------------+
  | scale_by  | 2.11 us          | 1.58 us: 1.33x faster (-25%) |
  +-----------+------------------+------------------------------+
  | scale_to  | 5.10 us          | 7.08 us: 1.39x slower (+39%) |
  +-----------+------------------+------------------------------+

  Not significant (7): __sub__; reflect; angle; dot; isclose; __neg__; truncate
tests/memory: Updated as Vector2 instances are now a tiny bit larger
@nbraud
Copy link
Collaborator Author

nbraud commented Mar 15, 2019

@astronouth7303 Rebased to handle the merge conflict.

@AstraLuma AstraLuma merged commit 8d0b663 into ppb:master Mar 15, 2019
@nbraud nbraud deleted the slots branch March 15, 2019 19:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants