
CastTable perf tweaks. #34427

Merged
merged 2 commits into dotnet:master from castPerf
Apr 20, 2020

Conversation

VSadov
Member

@VSadov VSadov commented Apr 1, 2020

A few tweaks to the cast table implementation.

Nothing changed algorithmically. These are small tweaks to help compilers emit better code.

  • use a small sentinel table that never contains elements, both for the initial table and for flushing (eliminates the null check in Get)
  • reduce indirections and register pressure by operating on a ref to the tableData (the first element) instead of the whole table when iterating.
  • the above also ensures that hashShift is stored at offset 0 from the tableData (simpler address math in the most common path).
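The first two tweaks can be sketched together. The snippet below is an illustrative simplification, not the actual runtime code: the names `TableData`, `TryGet`, and the inline entry storage are assumptions, and it shows only why a sentinel table removes the null check from the lookup path.

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>

// Illustrative sketch (not the actual runtime code) of the sentinel-table
// trick: the table pointer always refers to a valid table, so the lookup
// path needs no null check. The initial table (and the one installed on
// flush) is a sentinel that can never contain elements.
struct TableData
{
    uint32_t hashShift;     // at offset 0: simplest address math on the hot path
    uint32_t size;          // number of buckets; 0 for the sentinel
    intptr_t entries[4];    // inline key storage (heavily simplified)
};

static TableData  s_sentinel = {};          // zero buckets: every probe misses
static TableData* s_table    = &s_sentinel;

static bool TryGet(intptr_t key)
{
    // Operate on a ref to the table data rather than re-reading s_table:
    // fewer indirections and less register pressure while probing.
    TableData& data = *s_table;
    if (data.size == 0)
        return false;       // sentinel: guaranteed miss, no null check needed
    size_t index = ((size_t)key >> data.hashShift) & (data.size - 1);
    return data.entries[index] == key;
}
```

Flushing the cache then reduces to storing `&s_sentinel` into `s_table`; readers never see a null pointer.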

I see better codegen in both managed and C++ code. This results in a 12% improvement on directed microbenchmarks, such as invoking a method that casts List<string> to IReadOnlyCollection<object> in a loop.

(Ex: 200,000,000 iterations go from 894ms to 794ms.)

This is not enough to switch ordinary interface and class casts to cache lookup. A linear scan of the interfaces is still faster, at least for common cases involving fewer than 4-6 interfaces. The same goes for looking through base classes.
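As a hedged illustration of the alternative that still wins: for a type implementing only a few interfaces, scanning its interface map is branch-predictable and typically touches a single cache line, which tends to beat even a fast hash probe. The names below are made up for the sketch and do not reflect the runtime's real type layout.

```cpp
#include <cassert>

// Illustrative sketch: an interface cast answered by a linear scan of the
// type's interface map. For the common case (< 4-6 interfaces) this short
// loop is cheaper than a cache lookup.
struct TypeSketch
{
    int         numInterfaces;
    const void* interfaceMap[8];    // opaque pointers identifying each interface
};

static bool ImplementsInterface(const TypeSketch& t, const void* iface)
{
    for (int i = 0; i < t.numInterfaces; i++)   // typically just a few iterations
        if (t.interfaceMap[i] == iface)
            return true;
    return false;
}
```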

This is still an improvement.

Other things tried (unsuccessfully):

  • using simpler hash functions
    None provided a noticeable gain, while collisions generally increased. The current hash seems to be very good for the data in use.
  • using a fixed-size table
    Provided some gains (5% or so, depending on scenario). But a fixed size means committing to the maximum size right from the start. The gains are not big enough, IMO, to justify that.
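For context on the hash-function experiments, the sketch below shows a Fibonacci-style multiplicative hash of the kind a cast cache might use; this is an assumed stand-in, not the runtime's actual hash function. It illustrates why "simpler" hashes tend to collide more on this workload: the keys are pointer-like values that share bit patterns, and multiplying by a large odd constant mixes all input bits into the top bits, which `hashShift` then selects.

```cpp
#include <cassert>
#include <cstdint>

// Assumed stand-in for a cast-cache hash (not the runtime's real function):
// a Fibonacci-style multiplicative hash where hashShift picks the top bits
// of the product as the bucket index.
static uint32_t HashPointers(uint64_t source, uint64_t target, uint32_t hashShift)
{
    uint64_t key = source ^ (target << 1);          // combine the two handles
    uint64_t h   = key * 11400714819323198485ull;   // ~2^64 / golden ratio (odd)
    return (uint32_t)(h >> hashShift);              // top bits index the table
}
```

A larger `hashShift` yields fewer index bits, so the same function serves every table size without rehashing logic.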

Considered:

  • using "colored" pointers for the source and destination handles (the version is embedded in the upper bits of the source/destination values).
    This would make the table 1.5x denser, since there is no need for an extra version field, and it would eliminate any need for synchronization. It is, however, only feasible on 64-bit; 32-bit would need a separate implementation. I doubt the gains would be big enough to justify the trouble of dual implementations.
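The colored-pointer idea can be sketched as follows. The bit widths and names are assumptions for illustration only; the point is that on 64-bit, the upper bits of a handle are unused, so a version can ride along in the value itself, making a separate version field (and synchronization around it) unnecessary.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the "colored pointer" variant considered (and rejected) above.
// kVersionBits is an assumed width, chosen only for illustration.
constexpr int      kVersionBits = 16;
constexpr int      kValueBits   = 64 - kVersionBits;
constexpr uint64_t kValueMask   = (uint64_t(1) << kValueBits) - 1;

static uint64_t Color(uint64_t handle, uint64_t version)
{
    // Pack the version into the otherwise-unused upper bits of the handle.
    return (handle & kValueMask) | (version << kValueBits);
}

static uint64_t HandleOf(uint64_t colored)  { return colored & kValueMask; }
static uint64_t VersionOf(uint64_t colored) { return colored >> kValueBits; }
```

On 32-bit there are no spare upper bits, which is why this scheme would force a second, separate implementation.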

@VSadov VSadov added NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) NO-REVIEW Experimental/testing PR, do NOT review it area-VM-coreclr labels Apr 1, 2020
@dotnet dotnet deleted a comment from Dotnet-GitSync-Bot Apr 3, 2020
@VSadov VSadov changed the title [WIP] CastTable perf tweaks. CastTable perf tweaks. Apr 20, 2020
@VSadov VSadov removed NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) NO-REVIEW Experimental/testing PR, do NOT review it labels Apr 20, 2020
@VSadov VSadov marked this pull request as ready for review April 20, 2020 02:11
@VSadov VSadov requested a review from jkotas April 20, 2020 02:13
@VSadov
Member Author

VSadov commented Apr 20, 2020

Thanks!!

@VSadov VSadov merged commit aa5b204 into dotnet:master Apr 20, 2020
@VSadov VSadov deleted the castPerf branch April 20, 2020 02:49
@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020