Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Static 'Initialize' factory method on structs with cbSize fields #108

Open
jnm2 opened this issue Feb 17, 2021 · 20 comments
Open

Static 'Initialize' factory method on structs with cbSize fields #108

jnm2 opened this issue Feb 17, 2021 · 20 comments
Labels
metadata gem A feature of the metadata that cswin32 does not yet utilitze

Comments

@jnm2
Copy link
Contributor

jnm2 commented Feb 17, 2021

Instead of:

var cursorInfo = new CURSORINFO { cbSize = (uint)Marshal.SizeOf(typeof(CURSORINFO)) };

What if we could write:

var cursorInfo = CURSORINFO.Initialize();

And Initialize would use either sizeof or a constant calculated at compile time whenever possible rather than calling Marshal.SizeOf?

(Allowing structs to have default constructors is being discussed for future C# versions.)

BITMAPINFOHEADER has biSize rather than cbSize.

@weltkante
Copy link

weltkante commented Feb 25, 2021

Note that Marshal.SizeOf may not return the correct size for libraries using blitting instead of the classic marshaling layer. There are some field types that will cause the classic marshaling layer to jump in even if the type is blittable, most notably bool fields have different size and alignment for marshaling vs. blitting.

Just wanted to throw that in because I suspect this library is going to use blitting where possible?

details on the difference
  • Marshal.SizeOf for classic marshalling size
  • Marshal.OffsetOf for classic marshalling offsets
  • sizeof keyword for blitting size
  • pointer arithmetic for blitting offsets
public struct Sample
{
    public int a;
    public bool b;
    public short c;
}

var sample = new Sample();
var blittingSize = sizeof(Sample);                              // 8
var blittingOffset = (byte*)&sample.c - (byte*)&sample;         // 6
var marshalSize = Marshal.SizeOf<Sample>();                     // 12
var marshalOffset = Marshal.OffsetOf<Sample>(nameof(Sample.c)); // 8

This happened to cause bugs twice in the WinForms interop cleanup (PRINTDLG and MCGRIDINFO). Whether it is an issue for this library depends strongly on how the mappings are generated (i.e. does it use managed bool in structs ever)

@AArnott
Copy link
Member

AArnott commented Feb 25, 2021

That is horrifying. Thanks for enlightening me on this, @weltkante.
Yes, today all our structs are blittable, but that will not always be the case going forward. I wonder what determines whether a blittable struct will be laid out the marshaled way or the blitted way (when blitting isn't required by using a pointer to it in the p/invoke signature.)
@tannergooding Can you please enlighten us here?

@weltkante
Copy link

weltkante commented Feb 25, 2021

As far as I'm aware blitting is used when you use pointers to or Span of structs, classic marshaling is used for ref, out and arrays of structs. In WinForms they switched some p/invoke signatures to explicitly not receive ref arguments but pointers to structs, so it can use blitting and avoid classic marshaling overhead.

@AArnott
Copy link
Member

AArnott commented Feb 25, 2021

You may be right, but it's alarming to think that changing a p/invoke signature from [Out] struct* to out struct would change the layout.

@AArnott AArnott added the discussion needed We need more feedback or responses from ABI experts label Feb 25, 2021
@tannergooding
Copy link
Member

Can you please enlighten us here?

https://docs.microsoft.com/en-us/dotnet/standard/native-interop/best-practices#blittable-types covers what is and is not blittable. In particular:

  • bool is the only value type that is "never" blittable, because it is always normalized to 0 or 1 on return, even if you explicitly annotate it as UnmanagedType.U1.
  • char is "sometimes" blittable and requires you to mark the method, type, or field as CharSet.Unicode
    • For char, the behavior can also differ between targets and scenarios (COM vs P/Invoke or Windows vs Linux) and is one reason why win32metdata (and ClangSharp) explicitly use ushort. Doing this means the interop bindings need to insert a cast from char* to ushort* or char to ushort, but it avoids needing to concern yourself with how char is handled by any given scenario.

Arrays and strings as individual parameters have blittable contents and can be pinned and passed instead (assuming the CharSet matches for char/string). When they exist as a field of a struct, the struct is no longer blittable and requires marshalling.

[Out] and [In] have no impact on blittable parameters or parameters which are only pinned (that is which don't require marshalling). They largely only impact marshalling behavior and will fail for [Out] string as mutating strings is disallowed.
That is, [In] int[] has no impact because the int[] is just pinned and pointer to index 0 is passed. If the native method is not actually [In] and writes to the pointer then the array will observe the mutation. This is also true of string (when CharSet.Unicode for P/Invoke, I don't recall the rules for COM), ref int, out int, and in int. In all cases the contents of the reference are blittable, so the reference is pinned and a pointer to the pinned contents is passed.

While [In] and [Out] do impact cases like struct S { int value; string s; } because the struct is no longer blittable and so the marshaller must create a copy of the struct that is of the correct layout for native.

  • [In] will have the marshaller allocate memory, marshal the contents, call the native method, and then finish
  • [Out] will have the marshaller allocate memory, call the native method, marshal the modifications back to managed, and then finish
  • [In, Out] will do both, that is: the marshaller allocate memory, marshal the contents, call the native method, marshal the modifications back to managed, and then finish

win32metadata should be defining purely blittable structs and P/Invokes with the correct packing information (outside any bugs) and so Unsafe.SizeOf<T> should be the same as sizeof(T) should be the same as Marshal.SizeOf<T>

@tannergooding
Copy link
Member

tannergooding commented Feb 25, 2021

The forced pinning of references (especially ref T, out T, and in T) also means that there is unnecessary overhead for the case where the reference is to a stack value or where you are repeatedly passing the same parameter to native as a ref.

For example, int QueryPerformanceFrequency(long* lpFrequency); (which is BOOL QueryPerformanceFrequency(LARGE_INTEGER *lpFrequency); in native) is most commonly used as something similar to:

long frequency;
var result = QueryPerformanceFrequency(&frequency);

Debug.Assert(result != 0); // QPC can't fail on WinXP or later
return frequency;

While if you instead had a P/Invoke of bool QueryPerformanceFrequency(out long lpFrequency);, then you might write:

var result = QueryPerformanceFrequency(out long frequency);
Debug.Assert(result); // QPC can't fail on WinXP or later
return frequency;

But this will actually generate something similar to (while the blittable signature doesn't require a shim and can have the transition logic inlined in newer versions of .NET):

var result = QueryPerformanceFrequencyShim(out long frequency);
Debug.Assert(result); // QPC can't fail on WinXP or later
return frequency;

bool QueryPerformanceFrequencyShim(out long frequency) // will not be inlined
{
    fixed (long* pFrequency = &frequency)
    {
        return QueryPerformanceFrequency(pFrequency) != 0;
    }
}

While if you manually defined the same "shim" over the blittable P/Invoke, it could be inlined and potentially optimized.

@weltkante
Copy link

bool is the only value type that is "never" blittable, because it is always normalized to 0 or 1 on return, even if you explicitly annotate it as UnmanagedType.U1

Right, sorry, I always forget that blittable terminology no longer matches what it used to mean, bool has been enabled to act as if blittable in these scenarios, even if it isn't named such. That was specifically enabled here dotnet/runtime#13772 (issue) and here dotnet/runtime#1866 (PR).

@tannergooding
Copy link
Member

Right, although that is slightly different. That is allowing pointers to unmanaged types to act as blittable and not incur marshalling (because its just a pointer and therefore the user is assumed to know what they're doing).

In that scenario, the bool is not normalized. Same for say char*.

@AArnott
Copy link
Member

AArnott commented Feb 26, 2021

For char, the behavior can also differ between targets and scenarios (COM vs P/Invoke or Windows vs Linux) and is one reason why win32metdata (and ClangSharp) explicitly use ushort. Doing this means the interop bindings need to insert a cast from char* to ushort* or char to ushort, but it avoids needing to concern yourself with how char is handled by any given scenario.

Ya, but that also means we lost precision in the metadata. We can't tell from ushort[] in the metadata whether we should project that as char[] because it really represents characters or if it really represents an array of numbers. Similar with sbyte[] (which boggles the mind BTW--why would anyone use signed integers to represent ASCII or UTF-8 encoded characters?).
If .NET in particular has issues with non-blittable scenarios for char the metadata shouldn't concern itself with that. It should still use char which is the semantic match for a UTF-16 encoded character, and leave it to the projections to make that be ushort* if that's helpful for blitting, IMO.

But all that's aside from the question I posed originally, which is given a blittable struct that report different sizes from sizeof<struct> vs Marshal.SizeOf<struct>(), which size should I use when the p/invoke method takes that struct as a parameter with no [MarshalAs] attribute on it? Will .NET blit the struct (therefore sizeof will predict its size) or will it marshal the struct, and therefore I should use Marshal.SizeOf<T>?

win32metadata should be defining purely blittable structs and P/Invokes with the correct packing information (outside any bugs) and so Unsafe.SizeOf<T> should be the same as sizeof(T) should be the same as Marshal.SizeOf<T>

That would be great. So in @weltkante's example where they are different, is it because he uses bool for one of the fields, and thus we should always use the BOOL type as described in the metadata?

Regarding your shim and inlining thoughts, I didn't fully follow you. But here is what CsWin32 currently generates for your example method:

internal static unsafe bool QueryPerformanceFrequency(out long lpFrequency)
{
    fixed (long *lpFrequencyLocal = &lpFrequency)
    {
        bool __result = PInvoke.QueryPerformanceFrequency(lpFrequencyLocal);
        return __result;
    }
}

[DllImport("Kernel32", ExactSpelling = true, SetLastError = true)]
internal static extern unsafe bool QueryPerformanceFrequency([Out] long *lpFrequency);

So we're in-between. The return type is bool but we do use a pointer for the parameter, and then we offer an easier overload that exposes out long. Were you saying that our friendly overload would be inlined? Or if not, what exactly were you suggesting to enable inlining?

@tannergooding
Copy link
Member

tannergooding commented Feb 26, 2021

why would anyone use signed integers to represent ASCII or UTF-8 encoded characters?

Because that's how char is treated in most C/C++ compilers by default and because it trivializes the check of "is ascii" to "is positive". Modern .NET opted to use Span<byte> in most of its APIs, but .NET (back to 1.1) has always had the new string(sbyte*) overloads for handling 8-bit character strings.

  • C/C++ do not have a "proper" int8 vs char type, instead they have char which is implementation defined, signed char, and unsigned char. Where MSVC, GCC, and Clang opt to treat char as signed by default.

It should still use char which is the semantic match for a UTF-16 encoded character, and leave it to the projections to make that be ushort* if that's helpful for blitting, IMO.

There are numerous problems here including that the character type used for Windows APIs is a mapping of wchar_t which is not always UTF16. It is 2-bytes on Windows, but 4 bytes on Unix and the backing tool for win32metadata (ClangSharp) needs to support that.

Additionally, the default behavior of char in .NET is incorrect for most scenarios as it is marshaled as ASCII by default, not blitted as UTF16 and so the backing metadata would become even more bloated by marshal attributes where it is only important from the actual marshalling perspective that it is 2-bytes or 1-byte. Because of this, ClangSharp opts to map to exact types that are guaranteed to be 1-to-1 and result in no possible overhead or ambiguities.

It does result in more overhead on the generator, but that is something that it should be able to handle. The SAL annotations and/or docs are what need to tell you if the input is a string, an array, or a pointer to a single value (and if it is in, out, or ref). The generator needing to handle these scenarios is is already true for UTF8 strings and many of the other pointers it encounters and it should be trivial (in my experience) to extend those same checks to string and .NET's char (including the conversion to/from ushort).

But all that's aside from the question I posed originally, which is given a blittable struct that report different sizes from sizeof vs Marshal.SizeOf(), which size should I use when the p/invoke method takes that struct as a parameter with no [MarshalAs] attribute on it?

Marshal.SizeOf is the only correct value with regards to marshaling. It represents the "native" size of the type where sizeof represents the managed size.

When the types are blittable, these two are equal; but just because they are equal does not mean you have blittable data because cases like bool are never blittable.

So in @weltkante's example where they are different, is it because he uses bool for one of the fields, and thus we should always use the BOOL type as described in the metadata?

Yes, bool is a non-blittable as a value type and will result in marshalling even if you explicitly set the marshal size to 1

  • In .NET 5 and newer you can sometimes blit pointers that are or indirectly contain a bool

The return type is bool but we do use a pointer for the parameter, and then we offer an easier overload that exposes out long. Were you saying that our friendly overload would be inlined? Or if not, what exactly were you suggesting to enable inlining?

Returning bool from the P/Invoke means that the runtime has to insert a marshaling stub that handles the GC transition and also normalizes the boolean value. This will likely not get inlined today and you are better off returning int (which is the backing type for the BOOL typedef) and having the generator add a != 0 check (or an alternative appropriate check for the other bool types you can encounter).

@henrygab
Copy link

win32metadata should be defining purely blittable structs and P/Invokes with the correct packing information (outside any bugs) and so Unsafe.SizeOf<T> should be the same as sizeof(T) should be the same as Marshal.SizeOf<T>

@tannergooding -- That's a good, succinct, and (importantly) testable requirement.

Is there currently a unit test / CI validation step that checks that this is true for all win32metadata structs and P/Invokes?

@tannergooding
Copy link
Member

Is there currently a unit test / CI validation step that checks that this is true for all win32metadata structs and P/Invokes?

ClangSharp supports generating such tests. I am not aware if win32metadata is currently using this functionality. I'm also not aware if they have their own equivalent or not.

@henrygab
Copy link

ClangSharp supports generating such tests. I am not aware if win32metadata is currently using this functionality. I'm also not aware if they have their own equivalent or not.

Oops! Sorry, I lost track of which depot I was in. I'll open an issue (suggestion) in the win32metadata depot.

@AArnott
Copy link
Member

AArnott commented Apr 30, 2021

While the win32metadata structs should all be blittable, that isn't always true in CsWin32. The structs we emit are only guaranteed to be blittable if you set "allowMarshaling": false in the json file.

@AArnott AArnott added blocked This issue cannot be fixed without first a change to a dependency and removed discussion needed We need more feedback or responses from ABI experts labels Oct 27, 2022
@mikebattista mikebattista removed the blocked This issue cannot be fixed without first a change to a dependency label Mar 23, 2023
@mikebattista
Copy link
Contributor

The metadata now includes [StructSizeField("cbSize")] on all structs with cbSize fields.

@jnm2
Copy link
Contributor Author

jnm2 commented Mar 23, 2023

Starting in C# 10, parameterless struct constructors may be declared. Now that new SomeStruct() may run arbitrary code anytime you see it in C#, this may be more idiomatic than a static method.

@AArnott AArnott added the metadata gem A feature of the metadata that cswin32 does not yet utilitze label Mar 25, 2023
@AArnott
Copy link
Member

AArnott commented Jul 27, 2023

@mikebattista To adopt this to automated initializing the field, we may want CsWin32 to be able to determine whether a struct is a variable length struct, so that the Initialize method or ctor can take a value for the length of its last field. I'm trying to remember whether something in the metadata identifies the variable length structs. Do you know?

@mikebattista
Copy link
Contributor

Variable length meaning with a flexible array member as the last field?

@AArnott
Copy link
Member

AArnott commented Jul 27, 2023

I suppose that's one manifestation of it. I'm not sure all variable length structs end with such a field. I vaguely recall a case where the docs simply indicate that the struct acts as a header to a larger buffer.

@mikebattista
Copy link
Contributor

mikebattista commented Jul 27, 2023

See https://github.com/microsoft/win32metadata/blob/main/docs/projections.md#:~:text=Flexible%20array%20members%20at%20the%20end%20of%20a%20struct%20are%20decorated%20with%20%5BFlexibleArray%5D%20(%23912) for the flexible array member scenario.

I'm not aware of anything else in the metadata. If it can't be programmatically scraped, then it would need to be manually attributed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
metadata gem A feature of the metadata that cswin32 does not yet utilitze
Projects
None yet
Development

No branches or pull requests

6 participants