Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-45565: Specialize LOAD_ATTR_CLASS #29146

Closed
wants to merge 6 commits into from

Conversation

Fidget-Spinner
Copy link
Member

@Fidget-Spinner Fidget-Spinner commented Oct 22, 2021

Almost same as LOAD_METHOD_CLASS.

https://bugs.python.org/issue45565

@Fidget-Spinner
Copy link
Member Author

Stats for python -m test_typing test_re test_dis test_zlib.

Before:
    load_attr.specialization_success : 594
    load_attr.specialization_failure : 8409
    load_attr.hit : 1058930
    load_attr.deferred : 496229
    load_attr.miss : 8925
    load_attr.deopt : 124
    load_attr.unquickened : 11868

After adding LOAD_METHOD_CLASS:
    load_attr.specialization_success : 718
    load_attr.specialization_failure : 8121
    load_attr.hit : 1068789
    load_attr.deferred : 484066
    load_attr.miss : 11229
    load_attr.deopt : 149
    load_attr.unquickened : 11868

@markshannon
Copy link
Member

Do you have stats for the standard benchmark suite (or something of similar scale), and have you benchmarked this?

A couple of things that stand out:

  • For the stats you give, the hits increase by ~10k and the misses increase by ~2k. The cost of a miss can be higher than the benefit of a hit so this might not be a win. Is this an artifact of the test scripts? Would you expect better numbers on "real" programs?
  • The cache use for LOAD_ATTR has increased from 2 to 3. Only 4 bytes of each of the first two cache entries are being used, so this could be reduced back to 2 entries.

@Fidget-Spinner
Copy link
Member Author

Fidget-Spinner commented Oct 22, 2021

Do you have stats for the standard benchmark suite (or something of similar scale), and have you benchmarked this?

No, I'll get some soon (pyperformance is a pain on Windows :().

  • For the stats you give, the hits increase by ~10k and the misses increase by ~2k. The cost of a miss can be higher than the benefit of a hit so this might not be a win. Is this an artifact of the test scripts? Would you expect better numbers on "real" programs?

The only possible way for deopt is for tp_version_tag to change, and that requires the class variable to be written to. So things like:

class X:
 x = 1

X.x = 2

Unfortunately, I have no clue how common something like this is in the real world. An alternative approach (with far fewer invalidations):

  1. Store owner.tp_mro tuple ID.
  2. Store the index of the real type we need to look into and where it belongs in owner.tp_mro.
  3. Store dict hint of real type.__dict__ and dk version..

At runtime, look into owner.tp_mro[mro_index].__dict__ + hint to get our attribute.

The benefit is that tp_version_tag invalidates every time a write occurs, but we don't care about that since the actual index in the dict doesn't change.

  • The cache use for LOAD_ATTR has increased from 2 to 3. Only 4 bytes of each of the first two cache entries are being used, so this could be reduced back to 2 entries.

Indeed, I failed to see that _PyAdaptiveEntry still had 4 bytes of unused space, so we can pack tp_version_tag into there if we go with the old approach.

@Fidget-Spinner
Copy link
Member Author

Wow, somehow the stats for the alternative approach (mentioned above) is almost exactly the same as the old one using tp_version_tag 😮 . I clearly don't know enough about how types work in Python, or the test suite is doing some weird stuff.

Python/ceval.c Outdated Show resolved Hide resolved
@Fidget-Spinner
Copy link
Member Author

Fidget-Spinner commented Oct 28, 2021

Some comments:
the method cache/tp_version_tag people were very smart. After 8 tries in the following loop, the method cache gives up and tp_version_tag remains 0 to prevent cache thrashing.

class X:
 x = 1

for _ in range(20):
 X.x
 print(_testcapi.type_get_version(X))
 X.x = None

However, approach 2 using MRO index works even with the test case above, some stats for the code:

class X:
 x = 1

for _ in range(100000):
 X.x
 X.x = None

# Approach 1, using tp_version_tag
    load_attr.specialization_success : 903
    load_attr.specialization_failure : 72
    load_attr.hit : 2342
    load_attr.deferred : 54713
    load_attr.miss : 45117
    load_attr.deopt : 848
    load_attr.unquickened : 912

# Approach 2, using MRO index, no tp_version_tag
    load_attr.specialization_success : 56  (this stat is wonky)
    load_attr.specialization_failure : 72
    load_attr.hit : 101487
    load_attr.deferred : 505
    load_attr.miss : 180
    load_attr.deopt : 1
    load_attr.unquickened : 912

@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Nov 28, 2021
@iritkatriel
Copy link
Member

@Fidget-Spinner Is this abandoned, or are you planning to continue working on it?

@Fidget-Spinner
Copy link
Member Author

@iritkatriel this was completed in #93430). Thanks for the reminder!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting core review stale Stale PR or inactive for long period of time.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants