This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Implement struct marshalling via IL Stubs instead of via FieldMarshalers #26340

Merged
merged 6 commits into from
Oct 24, 2019

Conversation

jkoritzinsky
Member

@jkoritzinsky jkoritzinsky commented Aug 23, 2019

Currently, our system for marshalling fields of structures between managed and native code is completely separate from our system for marshalling parameters and return values, even though most of the code in the two systems can be shared. This PR unifies the two systems by removing the field marshalers in favor of IL stubs and a new NativeFieldDescriptor concept, which is 9 bytes smaller than the old FieldMarshalers.

Perf numbers:

CoreCLR.dll's size is reduced by ~30kB on Windows x64.

I wrote some microbenchmarks to benchmark marshalling with various types of structs:

Struct Definitions
using System.Runtime.InteropServices;

public class Common
{
    public const int NumArrElements = 2;
}
//////////////////////////////struct definition///////////////////////////
[StructLayout(LayoutKind.Sequential)]
public struct InnerSequential
{
    public int f1;
    public float f2;
    public string f3;
}

[StructLayout(LayoutKind.Sequential)]//struct containing one field of array type
public struct InnerArraySequential
{
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = Common.NumArrElements)]
    public InnerSequential[] arr;
}


public struct HFA
{
    public float f1;
    public float f2;
    public float f3;
    public float f4;
}

[StructLayout(LayoutKind.Sequential)]
public unsafe struct FixedBufferClassificationTest
{
    public fixed int arr[3];
    public NonBlittableFloat f;
}

// A non-blittable wrapper for a float value.
// Used to force a type with a float field to be non-blittable
// and take a different code path.
[StructLayout(LayoutKind.Sequential)]
public struct NonBlittableFloat
{
    public NonBlittableFloat(float f)
    {
        arr = new []{f};
    }

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 1)]
    private float[] arr;
    
    public float F => arr[0];
}

[StructLayout(LayoutKind.Sequential)]
public struct S8
{
    public string name;
    public bool gender;
    [MarshalAs(UnmanagedType.Error)]
    public int i32;
    [MarshalAs(UnmanagedType.Error)]
    public uint ui32;
    [MarshalAs(UnmanagedType.U2)]
    public ushort jobNum;
    [MarshalAs(UnmanagedType.I1)]
    public sbyte mySByte;
}

The following are the perf numbers I got on Windows x64: (CoreRun Master is a local Release build of CoreCLR on commit 402af7b.)

BenchmarkDotNet=v0.11.5, OS=Windows 10.0.18970
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-preview9-014004
  [Host]     : .NET Core 3.0.0-preview9-19423-09 (CoreCLR 4.700.19.42102, CoreFX 4.700.19.42104), 64bit RyuJIT
  Job-EFUBVD : .NET Core ? (CoreCLR 5.0.19.42001, CoreFX 4.700.19.35605), 64bit RyuJIT
  Job-WIAUZG : .NET Core ? (CoreCLR 5.0.19.45401, CoreFX 5.0.19.42613), 64bit RyuJIT

| Method | Toolchain | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| S8ByValue | CoreRun Master | 13,962.77 ns | 212.9081 ns | 199.1543 ns | 1.00 | 0.00 | - | - | - | - |
| S8ByValue | CoreRun Struct IL Stubs | 12,518.12 ns | 151.9199 ns | 142.1060 ns | 0.90 | 0.02 | - | - | - | - |
| InnerArraySequentialByValue | CoreRun Master | 27,112.46 ns | 372.1051 ns | 348.0673 ns | 1.00 | 0.00 | - | - | - | - |
| InnerArraySequentialByValue | CoreRun Struct IL Stubs | 25,386.27 ns | 503.3039 ns | 446.1658 ns | 0.94 | 0.02 | - | - | - | - |
| FixedBufferClassificationTestByValue | CoreRun Master | 144.83 ns | 2.4056 ns | 2.2502 ns | 1.00 | 0.00 | - | - | - | - |
| FixedBufferClassificationTestByValue | CoreRun Struct IL Stubs | 77.08 ns | 0.7635 ns | 0.6768 ns | 0.53 | 0.01 | - | - | - | - |

The allocation in InnerArraySequentialByValue is a System.RuntimeMethodInfoStub allocated by the JIT_GetRuntimeMethodStub helper for implementing the ldtoken instruction that loads the token for the nested InnerSequential struct IL stub onto the stack in the InnerArraySequential struct IL stub.

In addition to normal CoreCLR testing, I've also run the WinForms integration test suite with this local build of CoreCLR to validate that it doesn't break upstream. I've also run the struct marshalling tests with GCStress modes 3 and C.

@stephentoub
Member

The following are the perf numbers I got on Windows x64

What is the perf test actually doing?

The allocation in InnerArraySequentialByValue is a System.RuntimeMethodInfoStub allocated by the JIT_GetRuntimeMethodStub helper

Why is that an acceptable regression?

@jkoritzinsky
Member Author

The perf test calls a P/Invoke, passing the struct by value. I'll post the benchmark on Monday.

Re the allocation: I'm planning to figure out how to cache the allocated object so that we allocate at most once per stub instead of once per stub call. Additionally, the allocation only happens when marshalling an array of non-blittable structures, a case that already allocates, so I felt it was still worth putting this up as a draft to get some full CI runs on it before figuring out how to remove the single allocation.

@stephentoub
Member

so I felt it was still worth putting this up as a draft to get some full CI runs on it before figuring out how to remove the single allocation.

Ok. Thanks.

@jkoritzinsky
Member Author

Here's the benchmark, using the structs from the original post. The native library is the native component of the StructMarshaling/PInvoke test.

Benchmark
using BenchmarkDotNet.Attributes;
using System.Runtime.InteropServices;

namespace InteropBenchmarks
{
    [MemoryDiagnoser]
    public class StructMarshalling
    {
        private float f;
        private S8 s8;

        private InnerArraySequential ias;
        private FixedBufferClassificationTest fb;

        [GlobalSetup]
        public void Setup()
        {
            f = MarshalStructAsParam.ProductHFA(default);
            s8 = Helper.NewS8("hello", true, 10, 128, 128, 32);
            ias = Helper.NewInnerArraySequential(1, 1.0F, "some string");
            fb = Helper.NewFixedBuffer(42.0f);
        }

        [Benchmark]
        public bool S8ByValue()
        {
            return MarshalStructAsParam.MarshalStructAsParam_AsSeqByVal11(s8);
        }

        [Benchmark]
        public bool InnerArraySequentialByValue()
        {
            return MarshalStructAsParam.MarshalStructAsParam_AsSeqByVal2(ias);
        }
        [Benchmark]
        public bool FixedBufferClassificationTestByValue()
        {
            return MarshalStructAsParam.MarshalStructAsParam_AsSeqByValFixedBufferClassificationTest(fb, fb.f.F);
        }
    }
}

class MarshalStructAsParam
{

    [DllImport(nameof(MarshalStructAsParam))]
    public static extern float ProductHFA(HFA hfa);
    [DllImport(nameof(MarshalStructAsParam))]
    public static extern bool MarshalStructAsParam_AsSeqByVal11(S8 str);
    [DllImport(nameof(MarshalStructAsParam))]
    public static extern bool MarshalStructAsParam_AsSeqByVal2(InnerArraySequential str1);
    [DllImport(nameof(MarshalStructAsParam))]
    public static extern bool MarshalStructAsParam_AsSeqByValFixedBufferClassificationTest(FixedBufferClassificationTest str, float f);
}

@jkoritzinsky
Member Author

/azp run coreclr-ci

@azure-pipelines

Pull request contains merge conflicts.

@jkoritzinsky
Copy link
Member Author

I've updated the perf benchmarks with the current result. I've removed the allocation from each iteration by caching it in the MethodDesc's LoaderAllocator (which has the same maximum lifetime as the allocated object).
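
The caching idea described above can be sketched in C++ as follows. This is only an illustration under stated assumptions: `LoaderAllocatorSketch`, `MethodInfoStub`, and `GetOrCreateStub` are hypothetical names, not the runtime's types, and the integer key stands in for a MethodDesc. The point is that the cached object is owned by the allocator, so it is created at most once per stub and lives exactly as long as the allocator does.

```cpp
#include <memory>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-in for the per-stub object that used to be allocated on
// every call.
struct MethodInfoStub
{
    int methodId;
};

// Hypothetical stand-in for a loader allocator: it owns the cached objects,
// so they are destroyed together with the allocator.
class LoaderAllocatorSketch
{
public:
    // Returns the cached stub for methodId, creating it on first use only.
    const MethodInfoStub* GetOrCreateStub(int methodId)
    {
        std::lock_guard<std::mutex> lock(m_lock);
        std::unique_ptr<MethodInfoStub>& slot = m_cache[methodId];
        if (!slot)
        {
            slot.reset(new MethodInfoStub{methodId});
            ++m_allocations;
        }
        return slot.get();
    }

    // Number of stubs actually allocated (for illustration only).
    int AllocationCount() const { return m_allocations; }

private:
    std::mutex m_lock;
    std::unordered_map<int, std::unique_ptr<MethodInfoStub>> m_cache;
    int m_allocations = 0;
};
```

Calling `GetOrCreateStub` twice with the same key returns the same pointer and performs a single allocation, which is the "once per stub instead of once per stub call" behavior.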

@jkoritzinsky
Member Author

Perf numbers updated for commit 034db68 where I've changed the ldtoken StructStub to ldftn StructStub and removed the RuntimeMethodInfoStub caching.

@jkoritzinsky jkoritzinsky marked this pull request as ready for review September 11, 2019 20:47
@jkoritzinsky
Member Author

@AaronRobinsonMSFT can you make a review pass when you have a chance?

Member

@AaronRobinsonMSFT AaronRobinsonMSFT left a comment

I am halfway through, currently at dllimport.h.

int numChars = strManaged.Length;
if (numChars >= length)
{
    numChars = length - 1;
Member

Is there a doc reference for the logic where we apply a guaranteed null terminator?

Member Author

I don't believe we have a doc reference for the auto-null-terminator logic.
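
Since there is no doc reference, here is a minimal C++ sketch of the truncation rule the quoted snippet implies. `CopyToFixedBuffer` is a hypothetical helper, not the runtime's implementation, and it assumes `length` is the total size of the native buffer in characters: at most `length - 1` characters are copied so that the last slot can always hold the terminator.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper (not CoreCLR code): copies `managed` into a fixed-size
// native buffer of `length` chars, truncating so a null terminator always fits.
// Returns the number of characters copied, excluding the terminator.
std::size_t CopyToFixedBuffer(const std::string& managed, char* buffer, std::size_t length)
{
    if (length == 0)
        return 0; // no room even for the terminator

    std::size_t numChars = managed.size();
    if (numChars >= length)
        numChars = length - 1; // reserve the final slot for '\0'

    std::copy_n(managed.data(), numChars, buffer);
    buffer[numChars] = '\0';
    return numChars;
}
```

With a four-character buffer, a five-character input is cut to three characters plus the terminator, while a shorter input is copied whole.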

@AaronRobinsonMSFT
Member

@jkoritzinsky Can you squash these commits and rebase? Some of the tooling I have only supports 64 commits for review :-/

NewArrayHolder<CompileTimeState> pNewCompileTimeState = NULL;
if (!UseLoaderHeap())
{
NewArrayHolder<BYTE> pNewILCodeBuffer = NULL;
Member

Can you pull these common variables outside the branch? I don't see why they need to be different.

Member Author

The branching here is weird because we want to use the holders for cleanup in the exceptional/failure case. When we don't need the loader heap, we allocate via "new" and delete in the ClearCompileTimeState method. When we do use the loader heap, we want to allocate from it with the appropriate holder and delete only when the loader is unloaded (not in ClearCompileTimeState). If we don't keep the data after ClearCompileTimeState, the JIT will assert when trying to JIT a P/Invoke or struct marshal stub that calls this struct marshal stub.

It's a really awkward case where RAII doesn't quite do what we want unless we create a "multi-holder" type from which we could pick which holder from a set we want to use.
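
One way to picture such a "multi-holder" is a single holder type whose cleanup policy is chosen at runtime. The sketch below is an assumption-laden illustration, not CoreCLR code: `ArenaHeap` is a hypothetical stand-in for a loader heap, `MaybeDelete` and `AllocBuffer` are made-up names, and the arena-backed holder simply does nothing on destruction (the real loader-heap holders behave differently on failure).

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a loader heap: memory handed out here is released
// only when the heap itself is destroyed, never by an individual holder.
class ArenaHeap
{
public:
    std::uint8_t* Alloc(std::size_t size)
    {
        m_blocks.push_back(std::make_unique<std::uint8_t[]>(size));
        return m_blocks.back().get();
    }

private:
    std::vector<std::unique_ptr<std::uint8_t[]>> m_blocks;
};

// Deleter whose policy is picked at runtime: free new[] memory, or leave
// arena memory alone.
struct MaybeDelete
{
    explicit MaybeDelete(bool owns = true) : ownsMemory(owns) {}

    void operator()(std::uint8_t* p) const
    {
        if (ownsMemory)
            delete[] p;
    }

    bool ownsMemory;
};

using BufferHolder = std::unique_ptr<std::uint8_t[], MaybeDelete>;

// Both branches return the same holder type, so the variables no longer need
// to be declared separately inside each branch.
BufferHolder AllocBuffer(std::size_t size, ArenaHeap* heap)
{
    if (heap != nullptr)
        return BufferHolder(heap->Alloc(size), MaybeDelete(false));
    return BufferHolder(new std::uint8_t[size], MaybeDelete(true));
}
```

The trade-off is one extra flag per holder in exchange for a single code path after the allocation.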

Remove dead code for calculating managed field size.

Add field IL marshal infra up to MarshalInfo::GenerateFieldIL.

Add preliminary changes for MarshalInfo ctor to have the logic handle field marshalling logic. Still need to handle WinRT struct field logic correctly.

First shot at handling fields of WinRT structs.

Cleanup

Clean up entrypoints

Fix cleanup marshal emit.

Disable specific paths on struct marshal stubs.

Implement emitting full struct marshalling stub.

Add StructMarshalInteropStub method name.

Add NDirect::CreateMarshalILStub

Get byvalue StructMarshalling/PInvoke tests passing excluding missing ILMarshalers (ByValArray and ByValTStr).

Correctly classify struct marshal stubs as struct marshal stubs instead of PInvoke stubs.

Implement UnmanagedType.ByValArray IL marshaler.

Implement ILMarshaler equivalent for ansi-char fixed arrays.

Fix parameter mismatch in Native->CLR direction for struct marshalling.

Implement ByValTStr marshalling. Support unaligned fields in IL stubs.

Load CleanupWorkList from param list if in a struct marshalling stub

Implement SafeHandle and CriticalHandle field marshalling in IL struct stubs

Fix handle field marshalers. Add error reporting in struct field IL marshalers consistent with old FieldMarshaler error reporting.

Convert Array-of-nonblittable-struct marshalling to use IL stubs.

Convert LayoutClass marshalling to use IL stubs.
Fix marshalling of LayoutClass fields in structs.

Add non-blittable fixed buffer assert in the struct IL stub path.

Implement Marshal APIs via the IL stubs.

Fix default char marshaler selection.

Move hidden-length-array marshalling over to struct marshalling stubs.

Convert struct marshal IL stub users to use helper that will automatically cleanup on failure to marshal.

Match MarshalInfo::MarshalInfo behavior to ParseNativeType for fields.

Remove old FieldMarshaler-based marshalling.

Fix signed/unsigned mismatch.

Fix IsFieldScenario on non-COMINTEROP platforms

Fix off-Windows build.

Handle automatic partial cleanup of struct marshaling entirely within the struct stub.

Remove now-unused ValueClassMarshaler. Move DateMarshaler to managed since it doesn't need to be an FCall.

Error out on recursive struct definitions in the IL stub generation as we did in the field marshalling.

Remove FieldMarshalers and replace with a significantly simpler structure (NativeFieldDescriptor) that stores only the needed info for determining validity in WinRT, blittability, and calling convention classification.

This will save 4/8 bytes for every field in a non-auto-layout structure or class loaded into the runtime.

Add explicit test for recursive native layout.

Allow marshalling as UnmanagedType.Error on all platforms (on non-Windows the behavior matches UnmanagedType.I4).

Collapse common primitive marshalling cases together.

Disable WinRT parameter/retval-only marshalers in field scenarios.

Revert "Collapse common primitive marshalling cases together."

This reverts commit e73b78a.

Fix error marshalling off Windows for uint.

Disable LPStruct marshalling in structs.

Disable copy-constructor marshaler in the field scenario.

Match error messages between MarshalInfo::MarshalInfo and ParseNativeType in the field scenario.

Refactor managed-sequential check out of ParseNativeType.

Remove invalid MARSHAL_TYPE_GENERIC_U8 references.

Add override specifier.

Change ParseNativeType to use MarshalInfo::MarshalInfo to calculate field marshalling info instead of maintaining two native field classification functions.

Clean up native field categories. Remove nsenums.h since it is now unused.

Move CheckIfDisqualifiedFromManagedSequential to class.cpp.

Encapsulate stub flags for struct stubs. Read the BestFitAttribute once for the type instead of per field.

Fix perf regression in by-val arrays of non-blittable structures by caching the MethodDesc* to the struct stub in the "managed marshaler" structure. Now we have a perf improvement!

Fix memory leak in sig creation.

Keep compile-time information for struct stubs around as long as the owning loader allocator (instead of leaking).

Allocate the signature on the same heap as the IL stubs so it shares the same lifetime.

Fix build with fragile NGen support enabled so as to not break partners.

Add missing native field descriptors.

Fix clang build.

Only assert if we're emitting IL (so we don't assert during type-load).

Determine descriptor for pointer-sized fields based on target not host pointer size.

Don't emit IL stubs that call struct stubs into R2R images since there's not a good way for us to emit the token.

Fix tracing test failures.

Force field marshaling to not stackalloc.

Cache System.RuntimeMethodInfoStub instances created in the VM in the MethodDesc's owning LoaderAllocator.

Struct marshal stubs don't have a MethodDesc context arg.

Copy FieldDesc on NFD assignment.

Fix initialization of stubMethodInfoCache lock owner.

Fix alignment calculation of decimal fields and decimal array fields in NFDs.

Fix Crst leveling.

Enable handling decimal-as-currency fields accurately off-Windows.

Fix deadlock where two threads are both trying to generate an IL stub for the same P/Invoke and one of the parameters needs a struct stub generated.

Fix incorrect check for if we need a struct marshal stub in some of the variant/array cases.

We never need to promote a field value to 8 bytes.

Fix issue with recursive pointer fields.

Shortcut blittable types in stubhelpers.

Use LDFTN + PREPARE_NONVIRTUAL_CALLSITE_USING_CODE instead of LDTOKEN + GetInternalToken.

Revert "Fix Crst leveling."

This reverts commit 1d8e56e.

Revert "Fix initialization of stubMethodInfoCache lock owner."

This reverts commit a095390.

Revert "Cache System.RuntimeMethodInfoStub instances created in the VM in the MethodDesc's owning LoaderAllocator."

This reverts commit 7266538.

Fix case where struct marshal stub is unused in native-only marshalling paths.

PR Feedback.

Clean up ternary statement in dispatchinfo.cpp

Cleanup ILStubResolver::AllocGeneratedIL a little bit.
@jkoritzinsky
Member Author

I've rebased this on master. There were a lot of conflicts from a few months ago that I had to re-resolve as part of that, so a mistake may have snuck in that I didn't catch locally.

@@ -1301,7 +1304,7 @@ void OleVariant::MarshalBoolVariantOleToCom(VARIANT *pOleVariant,
 #endif // FEATURE_COMINTEROP

 void OleVariant::MarshalBoolArrayOleToCom(void *oleArray, BASEARRAYREF *pComArray,
-                                          MethodTable *pInterfaceMT)
+                                          MethodTable *pInterfaceMT, PCODE pManagedMarshalerCode)
Member

pManagedMarshalerCode

It seems very few of these functions use this parameter. What is the reason for it?

Member Author

These functions are passed up to the calling code through the Marshaler struct in olevariant.h. The design there is to have each field be a function pointer that points to each needed marshalling method. As a result, they all need to have the same signature, even if one of the parameters (such as this one) is unused. It's the same reason that every marshaler in olevariant.cpp has a MethodTable* parameter when only the interface marshaler uses it.

We can't resolve the struct marshalling stub from the MethodTable* in the marshaler since the lookup is slow enough to slow down non-blittable structure array marshalling by at least 600%.
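
The uniform-signature constraint described above can be sketched in C++ like this. It is a reduced, hypothetical model rather than the code in olevariant.h: `Marshaler`, `ElementMarshalFn`, `StubFn`, and the sample functions are illustrative names, and plain `int` values stand in for the data being marshalled. Every entry shares one pointer type, so marshalers that ignore the pre-resolved stub must still accept it.

```cpp
// Hypothetical model of a pre-resolved struct marshalling stub.
using StubFn = int (*)(int);

// All table entries share this signature because callers invoke them through
// a single function-pointer type.
using ElementMarshalFn = int (*)(int value, const void* interfaceType, StubFn stub);

struct Marshaler
{
    ElementMarshalFn oleToCom;
};

// Uses neither extra parameter, but must still accept both.
int MarshalBool(int value, const void* /*interfaceType*/, StubFn /*stub*/)
{
    return value != 0 ? 1 : 0;
}

// The only marshaler here that consumes the pre-resolved stub; passing it in
// avoids looking the stub up again for every element.
int MarshalRecord(int value, const void* /*interfaceType*/, StubFn stub)
{
    return stub(value);
}

// Picks the table entry for an element kind.
const Marshaler* GetMarshaler(bool isRecord)
{
    static const Marshaler boolMarshaler{&MarshalBool};
    static const Marshaler recordMarshaler{&MarshalRecord};
    return isRecord ? &recordMarshaler : &boolMarshaler;
}

// Example stub used only for demonstration.
int DoubleIt(int value)
{
    return value * 2;
}
```

The caller resolves the stub once and forwards it on every call, whichever marshaler the table returns.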

@AaronRobinsonMSFT
Member

MarshalInfo(Module* pModule,

Please create a new issue to reduce this argument list. It is close to impossible to understand at instantiation time what all the bools and constants mean. We have && (rvalue references) and can now have high-efficiency struct passing that can describe what the constants are meant to define.


Refers to: src/vm/mlinfo.h:456 in 59bb717.

@AaronRobinsonMSFT
Member

BOOL IsWinRTScenario()

Any chance you could keep the function but have it always return FALSE when FEATURE_COMINTEROP isn't defined? Would remove a lot of the various FEATURE_COMINTEROP checks in mlinfo.cpp.


Refers to: src/vm/mlinfo.h:675 in 59bb717.

Member

@AaronRobinsonMSFT AaronRobinsonMSFT left a comment

I am generally okay with this, but do have some concerns I would like to talk about off-line before committing.

@jkoritzinsky
Member Author

BOOL IsWinRTScenario()

Any chance you could keep the function but have it always return FALSE when FEATURE_COMINTEROP isn't defined? Would remove a lot of the various FEATURE_COMINTEROP checks in mlinfo.cpp.

Refers to: src/vm/mlinfo.h:675 in 59bb717.

I've tried to do this and it only really allows us to remove the FEATURE_COMINTEROP defines around the error cases in four places. The majority of usages have to stay covered since we don't define the WinRT marshalers when FEATURE_COMINTEROP is turned off. Since most usages still have to be covered with ifdefs, I think we might as well leave it as is.
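
For reference, the suggested shape looks roughly like the C++ sketch below. `MarshalInfoSketch` is a hypothetical, reduced class and not the real MarshalInfo; only the `FEATURE_COMINTEROP` macro name comes from the discussion. The query compiles on every platform and folds to false when the feature is off, but any caller that goes on to name a WinRT marshaler type would still need its own #ifdef, which is the limitation described above.

```cpp
// Hypothetical, reduced model: keep the query available everywhere and let it
// return false when COM interop support is compiled out.
class MarshalInfoSketch
{
public:
    explicit MarshalInfoSketch(bool winrt) : m_winrt(winrt) {}

    bool IsWinRTScenario() const
    {
#ifdef FEATURE_COMINTEROP
        return m_winrt;
#else
        (void)m_winrt; // unused when the feature is compiled out
        return false;
#endif
    }

private:
    bool m_winrt;
};
```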

@jkoritzinsky
Member Author

MarshalInfo(Module* pModule,

Please create a new issue to reduce this argument list. It is close to impossible to understand at instantiation time what all the bools and constants mean. We have && (rvalue references) and can now have high-efficiency struct passing that can describe what the constants are meant to define.

Refers to: src/vm/mlinfo.h:456 in 59bb717.

Submitted the issue. Link: https://github.com/dotnet/coreclr/issues/27399
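
As a rough idea of what that issue could lead to, the C++ sketch below bundles the flags into a named-field struct. Everything here is hypothetical: `MarshalOptions`, `FieldMarshalPlan`, and the field names are illustrative and do not come from mlinfo.h. The benefit is that each flag is named at the call site instead of appearing as a bare `TRUE` or `FALSE` in a long positional list.

```cpp
// Hypothetical option bundle; the field names are illustrative only.
struct MarshalOptions
{
    bool isField = false;
    bool bestFitMapping = true;
    bool throwOnUnmappableChar = false;
    bool isAnsi = true;
};

// Hypothetical consumer that takes one self-describing argument instead of a
// long list of positional bools and constants.
class FieldMarshalPlan
{
public:
    explicit FieldMarshalPlan(const MarshalOptions& options) : m_options(options) {}

    const MarshalOptions& Options() const { return m_options; }

private:
    MarshalOptions m_options;
};
```

A caller would default-construct `MarshalOptions`, override only the fields that differ (for example `opts.isField = true;`), and pass the struct on.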

@jkoritzinsky
Member Author

@jkotas do you have any feedback on this PR?

Member

@AaronRobinsonMSFT AaronRobinsonMSFT left a comment

LGTM. I want to get this in as early as possible for consumption by as many stakeholders as we can. This can have great benefit to us and teams like WinForms, but is also a concern due to backwards compat. I see the care taken in merging the two methods, so thank you.

Member

@jkotas jkotas left a comment

I have not reviewed this in detail, but it looks good to me. Anything that removes 3000 lines without breaking any tests must be good :-)

@jkoritzinsky
Member Author

Linux_musl x64 failure is #26057


4 participants