Implement struct marshalling via IL Stubs instead of via FieldMarshalers #26340

jkoritzinsky · 2019-08-23T20:22:37Z

Currently, our system for marshalling fields of structures between managed and native code is completely separate from our system for marshalling parameters or return values, even though most of the code in the two systems can be shared. This PR unifies the two systems by removing the field marshalers in favor of using IL stubs and a new NativeFieldDescriptor concept which is 9 bytes smaller then the old FieldMarshalers.

Perf numbers:

CoreCLR.dll's size is reduced by ~30kB on Windows x64.

I wrote some microbenchmarks to benchmark marshalling with various types of structs:

Struct Definitions

public class Common
{
    public const int NumArrElements = 2;
}
//////////////////////////////struct definition///////////////////////////
[StructLayout(LayoutKind.Sequential)]
public struct InnerSequential
{
    public int f1;
    public float f2;
    public string f3;
}

[StructLayout(LayoutKind.Sequential)]//struct containing one field of array type
public struct InnerArraySequential
{
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = Common.NumArrElements)]
    public InnerSequential[] arr;
}


public struct HFA
{
    public float f1;
    public float f2;
    public float f3;
    public float f4;
}

[StructLayout(LayoutKind.Sequential)]
public unsafe struct FixedBufferClassificationTest
{
    public fixed int arr[3];
    public NonBlittableFloat f;
}

// A non-blittable wrapper for a float value.
// Used to force a type with a float field to be non-blittable
// and take a different code path.
[StructLayout(LayoutKind.Sequential)]
public struct NonBlittableFloat
{
    public NonBlittableFloat(float f)
    {
        arr = new []{f};
    }

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 1)]
    private float[] arr;
    
    public float F => arr[0];
}

[StructLayout(LayoutKind.Sequential)]
public struct S8
{
    public string name;
    public bool gender;
    [MarshalAs(UnmanagedType.Error)]
    public int i32;
    [MarshalAs(UnmanagedType.Error)]
    public uint ui32;
    [MarshalAs(UnmanagedType.U2)]
    public ushort jobNum;
    [MarshalAs(UnmanagedType.I1)]
    public sbyte mySByte;
}

The following are the perf numbers I got on Windows x64: (CoreRun Master is a local Release build of CoreCLR on commit 402af7b.)

BenchmarkDotNet=v0.11.5, OS=Windows 10.0.18970
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-preview9-014004
  [Host]     : .NET Core 3.0.0-preview9-19423-09 (CoreCLR 4.700.19.42102, CoreFX 4.700.19.42104), 64bit RyuJIT
  Job-EFUBVD : .NET Core ? (CoreCLR 5.0.19.42001, CoreFX 4.700.19.35605), 64bit RyuJIT
  Job-WIAUZG : .NET Core ? (CoreCLR 5.0.19.45401, CoreFX 5.0.19.42613), 64bit RyuJIT

Method	Toolchain	Mean	Error	StdDev	Ratio	RatioSD	Gen 0	Gen 1	Gen 2	Allocated
S8ByValue	CoreRun Master	13,962.77 ns	212.9081 ns	199.1543 ns	1.00	0.00	-	-	-	-
S8ByValue	CoreRun Struct IL Stubs	12,518.12 ns	151.9199 ns	142.1060 ns	0.90	0.02	-	-	-	-

InnerArraySequentialByValue	CoreRun Master	27,112.46 ns	372.1051 ns	348.0673 ns	1.00	0.00	-	-	-	-
InnerArraySequentialByValue	CoreRun Struct IL Stubs	25,386.27 ns	503.3039 ns	446.1658 ns	0.94	0.02	-	-	-	-

FixedBufferClassificationTestByValue	CoreRun Master	144.83 ns	2.4056 ns	2.2502 ns	1.00	0.00	-	-	-	-
FixedBufferClassificationTestByValue	CoreRun Struct IL Stubs	77.08 ns	0.7635 ns	0.6768 ns	0.53	0.01	-	-	-	-

The allocation in InnerArraySequentialByValue is a System.RuntimeMethodInfoStub allocated by the JIT_GetRuntimeMethodStub helper for implementing the ldtoken instruction that loads the token for the nested InnerSequential struct IL stub onto the stack in the InnerArraySequential struct IL stub.

In addition to normal CoreCLR testing, I've also run the WinForms integration test suite with this local build of CoreCLR to validate that it doesn't break upstream. I've also run the struct marshalling tests with GCStress modes 3 and C.

stephentoub · 2019-08-24T00:56:55Z

The following are the perf numbers I got on Windows x64

What is the perf test actually doing?

The allocation in InnerArraySequentialByValue is a System.RuntimeMethodInfoStub allocated by the JIT_GetRuntimeMethodStub helper

Why is that an acceptable regression?

jkoritzinsky · 2019-08-24T02:03:21Z

The perf test is calling a PInvoke passing the struct by value. I'll post the benchmark on Monday.

Re the allocation: I'm planning on trying to figure out how to cache the allocated object so we only allocate at most once per stub instead of once per stub call. Additionally, the allocation only happens when marshalling an array of non-blittable structures, which in that case already has allocations, so I felt it was still worth putting this up as a draft to get some full CI runs on it before figuring out how to remove the single allocation.

stephentoub · 2019-08-24T14:25:34Z

so I felt it was still worth putting this up as a draft to get some full CI runs on it before figuring out how to remove the single allocation.

Ok. Thanks.

jkoritzinsky · 2019-08-26T16:57:53Z

Here's the benchmark (using the structs mentioned in the original post. The native library is the native component of the StructMarshaling/PInvoke test.

Benchmark

namespace InteropBenchmarks
{
    [MemoryDiagnoser]
    public class StructMarshalling
    {
        private float f;
        private S8 s8;

        private InnerArraySequential ias;
        private FixedBufferClassificationTest fb;

        [GlobalSetup]
        public void Setup()
        {
            f = MarshalStructAsParam.ProductHFA(default);
            s8 = Helper.NewS8("hello", true, 10, 128, 128, 32);
            ias = Helper.NewInnerArraySequential(1, 1.0F, "some string");
            fb = Helper.NewFixedBuffer(42.0f);
        }

        [Benchmark]
        public bool S8ByValue()
        {
            return MarshalStructAsParam.MarshalStructAsParam_AsSeqByVal11(s8);
        }

        [Benchmark]
        public bool InnerArraySequentialByValue()
        {
            return MarshalStructAsParam.MarshalStructAsParam_AsSeqByVal2(ias);
        }
        [Benchmark]
        public bool FixedBufferClassificationTestByValue()
        {
            return MarshalStructAsParam.MarshalStructAsParam_AsSeqByValFixedBufferClassificationTest(fb, fb.f.F);
        }
    }
}

class MarshalStructAsParam
{

    [DllImport(nameof(MarshalStructAsParam))]
    public static extern float ProductHFA(HFA hfa);
    [DllImport(nameof(MarshalStructAsParam))]
    public static extern bool MarshalStructAsParam_AsSeqByVal11(S8 str);
    [DllImport(nameof(MarshalStructAsParam))]
    public static extern bool MarshalStructAsParam_AsSeqByVal2(InnerArraySequential str1);
    [DllImport(nameof(MarshalStructAsParam))]
    public static extern bool MarshalStructAsParam_AsSeqByValFixedBufferClassificationTest(FixedBufferClassificationTest str, float f);
}

jkoritzinsky · 2019-08-27T21:37:51Z

/azp run coreclr-ci

azure-pipelines · 2019-08-27T21:38:10Z

Pull request contains merge conflicts.

jkoritzinsky · 2019-08-28T23:39:36Z

I've updated the perf benchmarks with the current result. I've removed the allocation from each iteration by caching it in the MethodDesc's LoaderAllocator (which has the same maximum lifetime as the allocated object).

jkoritzinsky · 2019-09-05T02:25:02Z

Perf numbers updated for commit 034db68 where I've changed the ldtoken StructStub to ldftn StructStub and removed the RuntimeMethodInfoStub caching.

src/vm/methodtablebuilder.cpp

jkoritzinsky · 2019-10-07T23:03:37Z

@AaronRobinsonMSFT can you make a review pass when you have a chance?

AaronRobinsonMSFT

I am half way through at dllimport.h.

AaronRobinsonMSFT · 2019-10-09T21:47:58Z

src/System.Private.CoreLib/src/System/StubHelpers.cs

+            int numChars = strManaged.Length;
+            if (numChars >= length)
+            {
+                numChars = length - 1;


Is there a doc reference for the logic where we apply a guaranteed null terminator?

I don't believe we have a doc reference for the auto-null-terminator logic.

src/System.Private.CoreLib/src/System/StubHelpers.cs

src/vm/dllimport.cpp

AaronRobinsonMSFT · 2019-10-22T22:23:04Z

@jkoritzinsky Can you squash these commits and rebase? Some of the tooling I have only supports 64 commits for review :-/

AaronRobinsonMSFT · 2019-10-22T22:07:20Z

src/vm/ilstubresolver.cpp

-    NewArrayHolder<CompileTimeState> pNewCompileTimeState    = NULL;
+    if (!UseLoaderHeap())
+    {
+        NewArrayHolder<BYTE>             pNewILCodeBuffer = NULL;


Can you pull these command variables outside the branch. I don't see why they need to be different.

The reason the branching here is weird is because we want to be able to use the holders for cleanup in the exceptional/failure case. However, when we don't need to use the loader heap, we allocate via "new" and delete in the ClearCompileTimeState method. If we use the loader heap, we want to allocate from the loader heap and use the appropriate holder and only delete when the loader is unloaded (not in ClearCompileTimeState). If we don't keep the data after ClearCompileTimeState, the JIT will assert when trying to JIT a P/Invoke or struct marshal stub that calls this struct marshal stub.

It's a really awkward case where RAII doesn't quite do what we want unless we create a "multi-holder" type from which we could pick which holder from a set we want to use.

Remove dead code for calculating managed field size. Add field IL marshal infra up to MarshalInfo::GenerateFieldIL. Add preliminary changes for MarshalInfo ctor to have the logic handle field marshalling logic. Still need to handle WinRT struct field logic correctly. First shot at handling fields of WinRT structs. Cleanup Clean up entrypoints Fix cleanup marshal emit. Disable specific paths on struct marshal stubs. Implement emitting full struct marshalling stub. Add StructMarshalInteropStub method name. Add NDirect::CreateMarshalILStub Get byvalue StructMarshalling/PInvoke tests passing excluding missing ILMarshalers (ByValArray and ByValTStr). Correctly classify struct marshal stubs as struct marshal stubs instead of PInvoke stubs. Implement UnmanagedType.ByValArray IL marshaler. Implement ILMarshaler equivalent for ansi-char fixed arrays. Fix parameter mismatch in Native->CLR direction for struct marshalling. Implement ByValTStr marshalling. Support unaligned fields in IL stubs. Load CleanupWorkList from param list if in a struct marshalling stub Implement SafeHandle and CriticalHandle field marshalling in IL struct stubs Fix handle field marshalers. Add error reporting in struct field IL marshalers consistent with old FieldMarshaler error reporting. Convert Array-of-nonblittable-struct marshalling to use IL stubs. Convert LayoutClass marshalling to use IL stubs. Fix marshalling of LayoutClass fields in structs. Add non-blittable fixed buffer assert in the struct IL stub path. Implement Marshal APIs via the IL stubs. Fix default char marshaler selection. Move hidden-length-array marshalling over to struct marshalling stubs. Convert struct marshal IL stub users to use helper that will automatically cleanup on failure to marshal. Match MarshalInfo::MarshalInfo behavior to ParseNativeType for fields. Remove old FieldMarshaler-based marshalling. Fix signed/unsigned mismatch. Fix IsFieldScenario on non-COMINTEROP plaforms Fix off-Windows build. Handle automatic partial cleanup of struct marshaling entirely within the struct stub. Remove now-unused ValueClassMarshaler. Move DateMarshaler to managed since it doesn't need to be an FCall. Error out on recursive struct definitions in the IL stub generation as we did in the field marshalling. Remove FieldMarshalers and replace with a significantly simpler structure (NativeFieldDescriptor) that stores only the needed info for determining validity in WinRT, blittability, and calling convention classification. This will save 4/8 bytes for every field in a non-auto-layout structure or class loaded into the runtime. Add explicit test for recursive native layout. Allow marshalling as UnmanagedType.Error on all platforms (on non-Windows the behavior matches UnmanagedType.I4). Collapse common primitive marshalling cases together. Disable WinRT parameter/retval-only marshalers in field scenarios. Revert "Collapse common primitive marshalling cases together." This reverts commit e73b78a. Fix error marshalling off Windows for uint. Disable LPStruct marshalling in structs. Disable copy-constructor marshaler in the field scenario. Match error messages between MarshalInfo::MarshalInfo and ParseNativeType in the field scenario. Refactor managed-sequential check out of ParseNativeType. Remove invalid MARSHAL_TYPE_GENERIC_U8 references. Add override specifier. Change ParseNativeType to use MarshalInfo::MarshalInfo to calculate field marshalling info instead of maintaining two native field classification functions. Clean up native field categories. Remove nsenums.h since it is now unused. Move CheckIfDisqualifiedFromManagedSequential to class.cpp. Encapsulate stub flags for struct stubs. Read the BestFitAttribute once for the type instead of per field. Fix perf regression in by-val arrays of non-blittable structures by caching the MethodDesc* to the struct stub in the "managed marshaler" structure. Now we have a perf improvement! Fix memory leak in sig creation. Keep compile-time information for struct stubs around as long as the owning loader allocator (instead of leaking). Allocate the signature on the same heap as the IL stubs so it shares the same lifetime. Fix build with fragile NGen support enabled so as to not break partners. Add missing native field descriptors. Fix clang build. Only assert if we're emitting IL (so we don't assert during type-load). Determine desciptor for pointer-sized fields based on target not host pointer size. Don't emit IL stubs that call struct stubs into R2R images since there's not a good way for us to emit the token. Fix tracing test failures. Force field marshaling to not stackalloc. Cache Sytem.RuntimeMethodInfoStub instances created in the VM in the MethodDesc's owning LoaderAllocator. Struct marshal stubs don't have an MethodDesc context arg. Copy FieldDesc on NFD assignment. Fix initialization of stubMethodInfoCache lock owner. Fix alignment calculation of decimal fields and decimal array fields in NFDs. Fix Crst leveling. Enable handling decimal-as-currency fields accurately off-Windows. Fix deadlock where two threads are both trying to generate an IL stub for the same P/Invoke and one of the parameters needs a struct stub generated. Fix incorrect check for if we need a struct marshal stub in some of the variant/array cases. We never need to promote a field value to 8 bytes. Fix issue with recursive pointer fields. Shortcut blittable types in stubhelpers. Use LDFTN + PREPARE_NONVIRUTAL_CALLSITE_USING_CODE instead of LDTOKEN + GetInternalToken. Revert "Fix Crst leveling." This reverts commit 1d8e56e. Revert "Fix initialization of stubMethodInfoCache lock owner." This reverts commit a095390. Revert "Cache Sytem.RuntimeMethodInfoStub instances created in the VM in the MethodDesc's owning LoaderAllocator." This reverts commit 7266538. Fix case where struct marshal stub is unused in native-only mashalling paths. PR Feedback. Clean up terenary statement in dispatchinfo.cpp Cleanup ILStubResolver::AllocGeneratedIL a little bit.

jkoritzinsky · 2019-10-23T00:34:20Z

I've rebased this on master. There were a lot of conflicts from a few months ago that I had to re-resolve as part of that, so a mistake may have snuck in that I didn't catch locally.

tests/src/Interop/LayoutClass/LayoutClassTest.cs

tests/src/Interop/LayoutClass/LayoutClassNative.cpp

src/vm/dispatchinfo.cpp

src/vm/dispparammarshaler.cpp

AaronRobinsonMSFT · 2019-10-23T18:44:25Z

src/vm/olevariant.cpp

@@ -1301,7 +1304,7 @@ void OleVariant::MarshalBoolVariantOleToCom(VARIANT *pOleVariant,
 #endif // FEATURE_COMINTEROP

 void OleVariant::MarshalBoolArrayOleToCom(void *oleArray, BASEARRAYREF *pComArray,
-                                          MethodTable *pInterfaceMT)
+                                          MethodTable *pInterfaceMT, PCODE pManagedMarshalerCode)


pManagedMarshalerCode [](start = 75, length = 21)

It seems very few of these functions are using this parameter. What is the reason for them?

These functions are passed up to the calling code through the Marshaler struct in olevariant.h. The design there is to have each field be a function pointer that points to each needed marshalling method. As a result, they all need to have the same signature, even if one of the parameters (such as this one) is unused. It's the same reason that every marshaler in olevariant.cpp has a MethodTable* parameter when only the interface marshaler uses it.

We can't resolve the struct marshalling stub from the MethodTable* in the marshaler since the lookup is slow enough to slow down non-blittable structure array marshalling by at least 600%.

src/vm/mtypes.h

AaronRobinsonMSFT · 2019-10-23T18:49:39Z

MarshalInfo(Module* pModule,

Please create a new issue to reduce this argument list. It is close to impossible to understand at instantiation time what all the bools and constants mean. We have && and can now have high efficiency struct passing that can describe what constants are meant to define.

Refers to: src/vm/mlinfo.h:456 in 59bb717. [](commit_id = 59bb717, deletion_comment = False)

AaronRobinsonMSFT · 2019-10-23T18:51:35Z

BOOL IsWinRTScenario()

Any chance you could keep the function but have it always return FALSE when FEATURE_COMINTEROP isn't defined? Would remove a lot of the various FEATURE_COMINTEROP checks in mlinfo.cpp.

Refers to: src/vm/mlinfo.h:675 in 59bb717. [](commit_id = 59bb717, deletion_comment = False)

src/vm/mlinfo.cpp

src/vm/ilmarshalers.h

AaronRobinsonMSFT

I am generally okay with this, but do have some concerns I would like to talk about off-line before committing.

…lling, and VT_RECORD is not used in hidden-length array marshalling.

jkoritzinsky · 2019-10-23T22:11:35Z

BOOL IsWinRTScenario()
Any chance you could keep the function but have it always return FALSE when FEATURE_COMINTEROP isn't defined? Would remove a lot of the various FEATURE_COMINTEROP checks in mlinfo.cpp.

Refers to: src/vm/mlinfo.h:675 in 59bb717. [](commit_id = 59bb717, deletion_comment = False)

I've tried to do this and it only really allows us to remove the FEATURE_COMINTEROP defines around the error cases in four places. The majority of usages have to stay covered since we don't define the WinRT marshalers when FEATURE_COMINTEROP is turned off. Since most usages still have to be covered with ifdefs, I think we might as well leave it as is.

jkoritzinsky · 2019-10-23T22:18:15Z

MarshalInfo(Module* pModule,
Please create a new issue to reduce this argument list. It is close to impossible to understand at instantiation time what all the bools and constants mean. We have && and can now have high efficiency struct passing that can describe what constants are meant to define.

Refers to: src/vm/mlinfo.h:456 in 59bb717. [](commit_id = 59bb717, deletion_comment = False)

Submitted the issue. Link: https://github.com/dotnet/coreclr/issues/27399

jkoritzinsky · 2019-10-23T22:35:49Z

@jkotas do you have any feedback on this PR?

AaronRobinsonMSFT

LGTM. I want to get this in as early as possible for consumption by as many stakeholders as we can. This can have great benefit to us and teams like WinForms, but is also a concern due to backwards compat. I see the care taken in merging the two methods, so thank you.

jkotas

I have not reviewed this in detail, but it looks good to me. Anything that removes 3000 lines without breaking any tests must be good :-)

jkoritzinsky · 2019-10-24T00:25:58Z

Linux_musl x64 failure is #26057

jkoritzinsky force-pushed the struct-marshalling-ilstubs branch from a811d96 to f3ac22e Compare August 23, 2019 22:41

jkoritzinsky added the area-Interop label Aug 29, 2019

This was referenced Sep 3, 2019

Cache Sytem.RuntimeMethodInfoStub instances created in the VM in the MethodDesc's owning LoaderAllocator. #26492

Closed

Manually marshal parameters for EventPipeInternal.Enable. #26496

Merged

jkoritzinsky marked this pull request as ready for review September 11, 2019 20:47

jkoritzinsky requested a review from AaronRobinsonMSFT September 11, 2019 20:47

jkotas reviewed Sep 12, 2019

View reviewed changes

src/vm/methodtablebuilder.cpp Show resolved Hide resolved

AaronRobinsonMSFT reviewed Oct 9, 2019

View reviewed changes

AaronRobinsonMSFT reviewed Oct 22, 2019

View reviewed changes

jkoritzinsky force-pushed the struct-marshalling-ilstubs branch from d493541 to 59bb717 Compare October 23, 2019 00:32