A few tweaks to the cast table implementation.
Nothing changed algorithmically. These are small tweaks to help compilers emit better code.
- Use `tableData` (the first element) instead of the whole table when iterating.
- `hashShift` is stored at offset 0 from `tableData` (simpler address math in the most common path).

I see better codegen in both managed and C++ code. This results in a 12% improvement on directed microbenchmarks, such as invoking a method that casts `List<string>` to `IReadOnlyCollection<object>` in a loop (e.g. 200000000 iterations go from 894 ms to 794 ms).
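A simplified C++ sketch of the layout described in the list above, assuming a table whose element 0 is a header slot holding `hashShift` and whose real entries start at index 1. `CastTable`, `Entry`, the linear probing, and the rotate-and-multiply (Fibonacci) hash are illustrative assumptions, not the runtime's actual code. The point is that the hot path works from a single `tableData` base pointer: `hashShift` is read at offset 0, and every entry is addressed relative to the same pointer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical cache entry: cached cast result for a (source, target) pair of
// type handles. A source of 0 marks an empty slot, so 0 is not a valid handle.
struct Entry {
    std::uintptr_t source;
    std::uintptr_t target;
    std::uintptr_t result;  // 1 = cast succeeds, 0 = cast fails
};

class CastTable {
public:
    // Holds 2^log2Size entries; log2Size must be at least 1 (and small, e.g. <= 20).
    explicit CastTable(unsigned log2Size)
        : entries_((std::size_t{1} << log2Size) + 1) {
        // Element 0 is a header slot; its first field stores hashShift.
        entries_[0].source = 64 - log2Size;
    }

    // Returns 1 or 0 on a hit, -1 on a miss.
    int TryGet(std::uintptr_t source, std::uintptr_t target) const {
        const Entry* tableData = entries_.data();
        std::size_t bucket = Bucket(tableData, source, target);
        for (unsigned i = 0; i < kMaxProbes; ++i) {
            const Entry& e = tableData[1 + ((bucket + i) & Mask())];
            if (e.source == 0) return -1;  // empty slot ends the probe sequence
            if (e.source == source && e.target == target)
                return static_cast<int>(e.result);
        }
        return -1;
    }

    // Returns false if no usable slot was found within kMaxProbes steps.
    bool Set(std::uintptr_t source, std::uintptr_t target, bool result) {
        Entry* tableData = entries_.data();
        std::size_t bucket = Bucket(tableData, source, target);
        for (unsigned i = 0; i < kMaxProbes; ++i) {
            Entry& e = tableData[1 + ((bucket + i) & Mask())];
            if (e.source == 0 || (e.source == source && e.target == target)) {
                e = Entry{source, target, static_cast<std::uintptr_t>(result)};
                return true;
            }
        }
        return false;
    }

private:
    static constexpr unsigned kMaxProbes = 8;

    // entries_ holds the header slot plus 2^log2Size entries, so this is count - 1.
    std::size_t Mask() const { return entries_.size() - 2; }

    static std::size_t Bucket(const Entry* tableData, std::uintptr_t source,
                              std::uintptr_t target) {
        // hashShift lives at offset 0 from tableData: no separate field to load.
        unsigned hashShift = static_cast<unsigned>(tableData[0].source);
        std::uint64_t s = source;
        // Rotate source by half a word so slowly varying upper address bits do not
        // cancel against target's, then reduce with multiplicative (Fibonacci) hashing.
        std::uint64_t hash = ((s << 32) | (s >> 32)) ^ static_cast<std::uint64_t>(target);
        return static_cast<std::size_t>((hash * 11400714819323198485ull) >> hashShift);
    }

    std::vector<Entry> entries_;
};
```

Because the header occupies a full entry slot, real entry `i` sits at `tableData[1 + i]`, so no separate header pointer or offset is needed on the lookup path.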
The improvement is not enough to switch ordinary interface and class casts to cache lookups. A linear scan of interfaces is still faster, at least in the common cases involving fewer than 4-6 interfaces. The same goes for walking the base types.
This is still an improvement.
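For comparison, the linear scan mentioned above amounts to a short loop of pointer compares over contiguous memory, with no hashing or probing, which is why it wins when a type implements only a handful of interfaces. A minimal sketch, assuming a hypothetical `TypeInfo` layout with a flat interface list and a parent pointer. It handles only exact matches, so variant conversions such as `List<string>` to `IReadOnlyCollection<object>` would still need a slower path.

```cpp
#include <cstddef>

// Hypothetical type descriptor: a flat array of implemented interfaces and a
// pointer to the base type (nullptr at the root).
struct TypeInfo {
    const TypeInfo* const* interfaces;
    std::size_t interfaceCount;
    const TypeInfo* parent;
};

// Exact-match interface check: one pointer compare per implemented interface.
bool ImplementsInterface(const TypeInfo* type, const TypeInfo* iface) {
    for (std::size_t i = 0; i < type->interfaceCount; ++i) {
        if (type->interfaces[i] == iface) return true;
    }
    return false;
}

// Class check: walk the chain of base types until a match or the root.
bool DerivesFrom(const TypeInfo* type, const TypeInfo* base) {
    for (const TypeInfo* t = type; t != nullptr; t = t->parent) {
        if (t == base) return true;
    }
    return false;
}
```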
Other things tried (unsuccessfully):
- Different hash functions: none provided a noticeable gain, while collisions generally increased. The current hash seems to be very good for the data in use.
- A fixed-size table: this provided some gains (5% or so, depending on the scenario), but a fixed size means committing to the maximum size right from the start. The gains are not big enough, IMO, to justify that.
Considered:
This would make the table 1.5x denser, since there would be no need for an extra version field, and it would eliminate any need for synchronization. However, this is only feasible on 64-bit; 32-bit would need a separate implementation. I doubt the gains here would be big, and certainly not big enough to justify the trouble of maintaining two implementations.
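The 1.5x figure is entry-size arithmetic: dropping the version field removes one of three pointer-sized words. A sketch with hypothetical entry layouts (the field names are assumptions; the description does not spell out the actual layout):

```cpp
#include <cstdint>

// Hypothetical versioned entry: the version word is used for synchronization
// (e.g. detecting torn reads while another thread updates the entry).
struct VersionedEntry {
    std::uintptr_t version;
    std::uintptr_t source;
    std::uintptr_t targetAndResult;
};

// Hypothetical entry without a version word, assumed to be read and written as one unit.
struct UnversionedEntry {
    std::uintptr_t source;
    std::uintptr_t targetAndResult;
};

// 3 words vs 2 words per entry (24 vs 16 bytes on 64-bit), so 1.5x as many entries fit.
static_assert(2 * sizeof(VersionedEntry) == 3 * sizeof(UnversionedEntry),
              "dropping the version field makes entries 1.5x denser");
```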