Changed TensorSpan and ReadOnlyTensorSpan layout for better performance. #103244

Merged (7 commits), Jun 20, 2024
Changes to the project file (.csproj):

@@ -16,6 +16,7 @@
   </ItemGroup>

   <ItemGroup Condition="'$(TargetFrameworkIdentifier)' == '.NETCoreApp'">
+    <Compile Include="System\Numerics\Tensors\netcore\TensorShape.cs" />
     <Compile Include="System\Numerics\Tensors\netcore\TensorHelpers.cs" />
     <Compile Include="System\Numerics\Tensors\netcore\TensorExtensions.cs" />
     <Compile Include="System\Numerics\Tensors\netcore\Tensor.Factory.cs" />

[Large diff not rendered by default.]

@@ -153,16 +153,16 @@ public static Tensor<T> CreateUninitialized<T>(scoped ReadOnlySpan<nint> lengths

         public static ref readonly TensorSpan<T> FillGaussianNormalDistribution<T>(in TensorSpan<T> destination) where T : IFloatingPoint<T>
         {
-            Span<T> span = MemoryMarshal.CreateSpan<T>(ref destination._reference, (int)destination._flattenedLength);
+            Span<T> span = MemoryMarshal.CreateSpan<T>(ref destination._reference, (int)destination._shape._memoryLength);

-            GaussianDistribution<T>(span, destination._flattenedLength);
+            GaussianDistribution<T>(span, destination._shape._memoryLength);

             return ref destination;
         }

         public static ref readonly TensorSpan<T> FillUniformDistribution<T>(in TensorSpan<T> destination) where T : IFloatingPoint<T>
         {
-            Span<T> span = MemoryMarshal.CreateSpan<T>(ref destination._reference, (int)destination._flattenedLength);
+            Span<T> span = MemoryMarshal.CreateSpan<T>(ref destination._reference, (int)destination._shape._memoryLength);

             for (int i = 0; i < span.Length; i++)
                 span[i] = T.CreateChecked(Random.Shared.NextDouble());
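For context on the rename from _flattenedLength to _shape._memoryLength in the fill routines above: the flattened length is the product of the dimension lengths (see the FlattenedLength property of the new TensorShape further down), while the memory length is the size of the backing buffer the span is created over, and the new shape type tracks the two separately. A minimal sketch with hypothetical numbers, not code from this PR:

    using System;

    // Hypothetical numbers for illustration; not taken from the PR.
    nint[] lengths = { 2, 3 };                 // a 2x3 view

    nint flattenedLength = 1;
    foreach (nint d in lengths)
        flattenedLength *= d;                  // 6: elements addressed by the view

    nint memoryLength = 8;                     // length of the backing buffer (>= flattenedLength here)

    Console.WriteLine($"{flattenedLength} elements over {memoryLength} backing slots");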

[Large diff not rendered by default.]

@@ -17,7 +17,7 @@ internal static class TensorHelpers
         /// <returns>How many boolean values are true.</returns>
         public static nint CountTrueElements(scoped in ReadOnlyTensorSpan<bool> filter)
         {
-            Span<bool> filterSpan = MemoryMarshal.CreateSpan(ref filter._reference, (int)filter._flattenedLength);
+            Span<bool> filterSpan = MemoryMarshal.CreateSpan(ref filter._reference, (int)filter._shape._memoryLength);
             nint count = 0;
             for (int i = 0; i < filterSpan.Length; i++)
             {
@@ -83,11 +83,14 @@ internal static nint[] GetIntermediateShape(ReadOnlySpan<nint> shape1, int shape
             return newShape;
         }

+        internal static bool IsUnderlyingStorageSameSize<T>(scoped in ReadOnlyTensorSpan<T> tensor1, scoped in ReadOnlyTensorSpan<T> tensor2)
+            => tensor1._shape._memoryLength == tensor2._shape._memoryLength;
+
         internal static bool IsUnderlyingStorageSameSize<T>(Tensor<T> tensor1, Tensor<T> tensor2)
-            => tensor1.Lengths.Length == tensor2.Lengths.Length;
+            => tensor1._values.Length == tensor2._values.Length;

-        internal static bool AreLengthsTheSame<T>(ReadOnlyTensorSpan<T> tensor1, ReadOnlyTensorSpan<T> tensor2)
-            => tensor1._lengths.SequenceEqual(tensor2._lengths);
+        internal static bool AreLengthsTheSame<T>(scoped in ReadOnlyTensorSpan<T> tensor1, scoped in ReadOnlyTensorSpan<T> tensor2)
+            => tensor1.Lengths.SequenceEqual(tensor2.Lengths);

         internal static bool AreLengthsTheSame(ReadOnlySpan<nint> lengths1, ReadOnlySpan<nint> lengths2)
             => lengths1.SequenceEqual(lengths2);
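One note on the Tensor<T> overload of IsUnderlyingStorageSameSize above: Lengths.Length is the rank (the number of dimensions), so the old check compared ranks rather than storage sizes, while the new check compares the lengths of the backing _values arrays. A small illustration with made-up shapes, not taken from the PR:

    using System;

    nint[] lengthsA = { 2, 3 };   // rank 2, 2*3 = 6 elements
    nint[] lengthsB = { 4, 5 };   // rank 2, 4*5 = 20 elements

    bool oldCheck = lengthsA.Length == lengthsB.Length;   // true: both ranks are 2
    bool newCheck = (2 * 3) == (4 * 5);                   // false: backing storage sizes differ
    Console.WriteLine($"old: {oldCheck}, new: {newCheck}");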
New file: TensorShape.cs

@@ -0,0 +1,64 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System.Diagnostics;
using System.Diagnostics.CodeAnalysis;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

namespace System.Numerics.Tensors
{
    internal readonly struct TensorShape
[Review thread on the TensorShape declaration]

Member: Could this also be a ref struct?

Member: We want to be able to use the same struct as part of Tensor<T> so we can avoid additional allocations for common cases there as well.
    {
        // Used to determine when we need to allocate a metadata array
        public const int MaxInlineArraySize = 5;

        // Used to determine when we can stack alloc for indexing vs when we need to allocate
        public const int MaxInlineRank = 8;

        internal readonly nint[]? _metadata; // 8 bytes

        internal readonly nint _memoryLength; // 8 bytes
[Review thread on the _memoryLength field]

@eiriktsarpalis (Member), Jun 19, 2024: Given the layout and size concerns of this change, does the new type require an explicit StructLayoutAttribute annotation? Are we ok with using the default layout? Is there an expectation that the fields will be consumed by unmanaged code?

Member: This struct contains a managed field (nint[]? _metadata) and as such is not blittable and will never have StructLayout respected; the runtime will treat it as Auto regardless. (You can technically mark it as LayoutKind.Explicit and that would be respected, but that is a de-optimization and would cause many other issues.)

Contributor: Does Mono behave the same? I think it might respect Sequential for managed types.

Member: That doesn't really matter. RyuJIT doesn't, so code cannot depend on the managed side being sequential; that is part of why we must use InlineArray (rather than declaring n sequential fields). This is likewise an internal struct, so it can't be used by developers in interop scenarios. Because it is a struct containing a managed field, it also couldn't be used in interop without marshalling, so there is zero benefit to changing the layout from its default.

Contributor: Yeah, but I meant that Auto could maybe still be beneficial for Mono.

Member: Mono should likely consider adjusting their layout algorithm to match the RyuJIT algorithm in that scenario. There are thousands of structs across the BCL, and we are not going to annotate every single one of them as explicitly Auto because they contain some managed field.
        internal readonly int _rank; // 4 bytes

        private readonly NintBuffer _lengths;
        private readonly NintBuffer _strides;

        internal TensorShape(nint memoryLength, ReadOnlySpan<nint> lengths, ReadOnlySpan<nint> strides)
        {
            _memoryLength = memoryLength;
            _rank = lengths.Length;
            if (lengths.Length > MaxInlineArraySize)
            {
                _metadata = new nint[lengths.Length + strides.Length];
                lengths.CopyTo(MemoryMarshal.CreateSpan(ref _metadata[0], lengths.Length));
                strides.CopyTo(MemoryMarshal.CreateSpan(ref _metadata[lengths.Length], strides.Length));
            }
            else
            {
                lengths.CopyTo(_lengths);
                strides.CopyTo(_strides);
            }
        }

        [InlineArray(MaxInlineArraySize)] // 5x8 bytes (40)
        private struct NintBuffer
        {
            public nint e0;
        }

        [UnscopedRef]
        public ReadOnlySpan<nint> Lengths => (_metadata is null)
            ? ((ReadOnlySpan<nint>)_lengths).Slice(0, _rank)
            : MemoryMarshal.CreateReadOnlySpan(ref MemoryMarshal.GetArrayDataReference(_metadata), _rank);

        [UnscopedRef]
        public ReadOnlySpan<nint> Strides => (_metadata is null)
            ? ((ReadOnlySpan<nint>)_strides).Slice(0, _rank)
            : MemoryMarshal.CreateReadOnlySpan(ref MemoryMarshal.GetArrayDataReference(_metadata), _rank * 2).Slice(_rank);

        public nint FlattenedLength => TensorSpanHelpers.CalculateTotalLength(Lengths);

        public bool IsEmpty => FlattenedLength == 0;
    }
}
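To make the inline-buffer layout above easier to follow, here is a minimal standalone sketch of the same pattern (hypothetical names, .NET 8 / C# 12, not part of the PR): an [InlineArray] struct stores a fixed number of nint slots directly inside the containing struct, so any shape with rank up to the inline capacity needs no heap allocation, while the slots can still be handed out as spans.

    using System;
    using System.Diagnostics.CodeAnalysis;
    using System.Runtime.CompilerServices;

    [InlineArray(5)] // five nint slots stored inline (5 x 8 bytes on 64-bit)
    internal struct InlineNintBuffer
    {
        public nint e0;
    }

    internal readonly struct ShapeSketch
    {
        private readonly InlineNintBuffer _lengths;
        private readonly int _rank;

        public ShapeSketch(ReadOnlySpan<nint> lengths)
        {
            // The real TensorShape falls back to a heap-allocated metadata array when the
            // rank exceeds the inline capacity; that fallback is omitted in this sketch.
            _rank = lengths.Length;
            lengths.CopyTo(_lengths); // inline arrays convert implicitly to Span<nint>
        }

        // [UnscopedRef] allows the returned span to reference storage inside 'this'.
        [UnscopedRef]
        public ReadOnlySpan<nint> Lengths => ((ReadOnlySpan<nint>)_lengths).Slice(0, _rank);
    }

Constructed from, say, three lengths, a ShapeSketch keeps all of them inline and Lengths returns a three-element span without touching the heap, which is the allocation win for common low-rank tensors that the first review thread above refers to.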