Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add generic overloads to TensorPrimitives #94555

Merged
merged 3 commits into from
Jan 3, 2024

Conversation

stephentoub
Copy link
Member

@stephentoub stephentoub commented Nov 9, 2023

This overhauls the implementation and tests to have a generic overload for each existing float-based overload. I've avoided touching the core logic, but have augmented the structure in a few ways, e.g. only taking vectorized code paths when the type supports vectorization. To keep the shared definitions of the float-based APIs, on .NET 9 they delegate to shims that are implemented on top of the generic variants.

The tests have all been made instance members, with an abstract base class containing most of the tests, and calling into abstract methods for the core operations and validation routines. Derived types then fill in this logic, letting us use all the tests for both the non-generic and generic overloads. Generic tests are validating most of the primitive types that implement the required interfaces.

This does not yet handle the following (and I plan to leave that for us to address subsequently):

  • Provide generic overloads for the IndexOfMin/Max{Magnitude} methods
  • Vectorize the trig-related functions for Ts other than floats

@tannergooding, these don't currently work for byte/sbyte/ushort/short because the "_Small" code for handling inputs smaller than 256/512 bit vectors was written assuming the element type was 4 bytes. What would you recommend we do? I don't think we'll want to extend the current scheme to add all the cases necessary to handle all the possibilities here. Also, for alignment, I'm not sure if what's here still works correctly... do we have tests that exercise unaligned inputs at various alignment positions?

Contributes to #94553

Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@ghost ghost assigned stephentoub Nov 9, 2023
@ghost
Copy link

ghost commented Nov 9, 2023

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

This overhauls the implementation and tests to have a generic overload for each existing float-based overload. I've avoided touching the core logic, but have augmented the structure in a few ways, e.g. only taking vectorized code paths when the type supports vectorization. To keep the shared definitions of the float-based APIs, on .NET 9 they delegate to shims that are implemented on top of the generic variants.

The tests have all been made instance members, with an abstract base class containing most of the tests, and calling into abstract methods for the core operations and validation routines. Derived types then fill in this logic, letting us use all the tests for both the non-generic and generic overloads. Generic tests are validating most of the primitive types that implement the required interfaces.

This does not yet handle the following (and I plan to lave that for us to address subsequently):

  • Provide generic overloads for the IndexOfMin/Max{Magnitude} methods
  • Vectorize the trig-related functions for Ts other than floats

@tannergooding, these don't currently work for byte/sbyte/ushort/short because the "_Small" code for handling inputs smaller than 256/512 bit vectors was written assuming the element type was 4 bytes. What would you recommend we do? The alignment code is also predicated on 4 byte values, and I don't think we'll want to extend the current scheme to add all the cases necessary to handle all the possibilities here. Do we have tests that exercise unaligned inputs at various alignment positions?

Fixes #94553

Author: stephentoub
Assignees: -
Labels:

area-System.Numerics, new-api-needs-documentation

Milestone: -

@tannergooding
Copy link
Member

do we have tests that exercise unaligned inputs at various alignment positions?

I believe we're testing misalignment, but not all possible (or at least all interesting) misalignments.

The logic to become aligned should already be general purpose and correct. The logic to compute the number of remaining elements should be general-purpose/correct. The switch that handles the remaining elements on the vectorized path should likewise be fine already.

The switch that handles Small (less than a vectors worth) is not general purpose and would need to be expanded. Many modern branch predictors assume a loop will execute at least 6-7 times, so it'd probably be fine to isolate the Small handling to sizeof(T) is 4 or 8.

@stephentoub
Copy link
Member Author

The switch that handles Small (less than a vectors worth) is not general purpose and would need to be expanded. Many modern branch predictors assume a loop will execute at least 6-7 times, so it'd probably be fine to isolate the Small handling to sizeof(T) is 4 or 8.

So to make sure I understand correctly, you mean basically:

if (remaining >= Vector256<T>.Count)
{
    ... // large handling that exists today
}
else if (sizeof(T) is 4 or 8)
{
    ... // small handling that exists today
}
else
{
   ... // new scalar loop
}

?

@tannergooding
Copy link
Member

So to make sure I understand correctly, you mean basically:

Yeah, basically.

@stephentoub stephentoub force-pushed the generictensorprimitives branch from f9d2fb5 to 37455de Compare November 10, 2023 03:43
@stephentoub stephentoub marked this pull request as ready for review November 10, 2023 03:52
@stephentoub
Copy link
Member Author

So to make sure I understand correctly, you mean basically:

Yeah, basically.

For now I've just skipped the vectorization for those sizes. This affects not just the "Small" path but also the remainder handling in the long paths.

@stephentoub
Copy link
Member Author

@radekdoulik, can you help me figure out why this is failing on mono? I suspect the mono vectorization support has some subtle differences. Is it possible mono's implementation of Vector128.Abs is throwing an exception when using signed integers and encountering the min value, e.g. -128 for an sbyte?

@stephentoub
Copy link
Member Author

@tannergooding, could you help review this?

@stephentoub
Copy link
Member Author

@tannergooding, can you review this? Thanks.

Copy link
Member

@jeffhandley jeffhandley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing this, Stephen. I recommend merging without the follow-up review from @tannergooding. We can conduct a post-merge review in January and address any new feedback/input.

@MichalPetryka
Copy link
Contributor

Thanks for implementing this, Stephen. I recommend merging without the follow-up review from @tannergooding. We can conduct a post-merge review in January and address any new feedback/input.

Aren't the three failures here related to the changes?

@stephentoub
Copy link
Member Author

Thanks for implementing this, Stephen. I recommend merging without the follow-up review from @tannergooding. We can conduct a post-merge review in January and address any new feedback/input.

Aren't the three failures here related to the changes?

Yes, I still need to address them; that's why I haven't merged it yet.

@tannergooding tannergooding reopened this Jan 2, 2024
@stephentoub stephentoub force-pushed the generictensorprimitives branch from 69716c9 to a3e36a2 Compare January 3, 2024 02:00
This overhauls the implementation and tests to have a generic overload for each existing float-based overload.  I've avoided touching the core logic, but have augmented the structure in a few ways, e.g. only taking vectorized code paths when the type supports vectorization.  To keep the shared definitions of the float-based APIs, on .NET 9 they delegate to shims that are implemented on top of the generic variants.

The tests have all been made instance members, with an abstract base class containing most of the tests, and calling into abstract methods for the core operations and validation routines. Derived types then fill in this logic, letting us use all the tests for both the non-generic and generic overloads.  Generic tests are validating most of the primitive types that implement the required interfaces.

This does not yet:
- Provide generic overloads for the IndexOfMin/Max{Magnitude} methods
- Vectorize the trig-related functions for Ts other than floats
@stephentoub stephentoub force-pushed the generictensorprimitives branch from a3e36a2 to d179655 Compare January 3, 2024 13:59
@stephentoub
Copy link
Member Author

Thanks for implementing this, Stephen. I recommend merging without the follow-up review from @tannergooding. We can conduct a post-merge review in January and address any new feedback/input.

Aren't the three failures here related to the changes?

Yes, I still need to address them; that's why I haven't merged it yet.

The failures are #96443. I've disabled the offending tests on mono.

@stephentoub
Copy link
Member Author

@tannergooding, can you take another look? Thanks.

@stephentoub stephentoub merged commit 7e51126 into dotnet:main Jan 3, 2024
111 checks passed
@stephentoub stephentoub deleted the generictensorprimitives branch January 3, 2024 17:50
@asmirnov82
Copy link

Hi @stephentoub,

Do you have any plans to make some parts of TensorPrimitives underlying implementation public? It can be interfaces (like IUnaryOperator<T>, IBinaryOperator<T> and etc), concreate operator implementations and methods for applying operators on span of different types.

Current generic overloads of existing TensorPrimitives API are great, however they don’t cover all cases. Quite often it’s needed to do math calculation on spans of different data type. Or calculations on spans and scalar value. This functionality for example is required inside the Microsoft.Data.Analysis.DataFrame.

Current TensorPrimitives API for operations on Span and Span forces to convert Span to Span first and it’s extra memory allocation and coping.

In case if underlying IBinaryOperator and invoke functions are public vector widening can be used for this purpose.

Do you think this functionality can be beneficial for .Net 9.0?

@tannergooding
Copy link
Member

@asmirnov82 that's covered by #93217 and the investigation into providing an ISimdVector type.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants