-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Add GetLoadLibrary function for PAL & use in TrackErrorCode #15831
Conversation
@wtgodbe I would synthesize scenarios in which load fails -- presumably the simplest one is a pinvoke to a nonexistent library, or to a library that's in an invalid format. Another variation (which I think inspired the original issue) was a missing indirect dependency. Find something we pinvoke to that has some indirect dependency, and temporarily remove that indirect dependency? You should get an exception with whatever "dlerror" gave. If not, put a debugger on it |
@dotnet-bot help please |
Welcome to the dotnet/coreclr Repository The following is a list of valid commands on this PR. To invoke a command, comment the indicated phrase on the PR The following commands are valid for all PRs and repositories. Click to expand
The following jobs are launched by default for each PR against dotnet/coreclr:master. Click to expand
The following optional jobs are available in PRs against dotnet/coreclr:master. Click to expand
Have a nice day! |
@RussKeldorph is it still possible to run corefx tests in the coreclr CI? It is super useful from time to time but I don't see it in the help output above. |
Welcome to the dotnet/coreclr Repository The following is a list of valid commands on this PR. To invoke a command, comment the indicated phrase on the PR The following commands are valid for all PRs and repositories. Click to expand
The following jobs are launched by default for each PR against dotnet/coreclr:master. Click to expand
The following optional jobs are available in PRs against dotnet/coreclr:master. Click to expand
Have a nice day! |
Welcome to the dotnet/coreclr Perf help The following is a list of valid commands on this PR. To invoke a command, comment the indicated phrase on the PR The following commands are valid for all PRs and repositories. Click to expand
The following optional jobs are available in PRs against dotnet/coreclr:master. Click to expand
Have a nice day! |
@danmosemsft Looks like @dotnet-bot followed up? |
test Windows_NT x64 Checked corefx_baseline |
Tests are failing (on multiple OS's) with
Which is confusing to me. Trying to get it repro locally, but the builds I ran locally before opening this PR all succeeded. |
@RussKeldorph yea, it posted a 2nd help. Odd, but that has it. Thanks. |
MSBuild is crashing with |
The number of failing builds has somehow decreased from 4 to 2 without requeuing anything, so something funny is going on. But I suspect that the remaining 2 failing builds with rectify themselves too. |
src/pal/src/loader/module.cpp
Outdated
IN DWORD nSize) | ||
{ | ||
DWORD retval = 0; | ||
INT error_length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, this and last_error can be declared on the same lines they are assigned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and then presumably it can be LPCWSTR
src/pal/src/loader/module.cpp
Outdated
wcscpy_s(lpBuffer, nSize, last_error); | ||
|
||
done: | ||
LOGEXIT("PAL_GetLoadLibraryError returns %d\n", retval); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what the common Win32 practice is when the buffer is big enough, but for eg GetEnvironmentVariable it would return the length without null (error_length)
src/pal/src/loader/module.cpp
Outdated
|
||
if (error_length >= (INT)nSize) | ||
{ | ||
TRACE("Buffer too small (%u) to copy last error message (%u).\n", nSize, error_length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm this is going to clear dlerror() so when you call again with the correct buffer size it gives nothing. We need a different scheme.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas I assume the PAL cannot return memory it allocated itself? Is there a better scheme than just passing in a buffer that's hopefully large enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PAL can just return the pointer from dlerror. Without any copies, etc.
It may be nice to call the method PAL_dlerror
to make it clear that it is just dlerror wrapper.
src/vm/dllimport.cpp
Outdated
{ | ||
LIMITED_METHOD_CONTRACT; | ||
|
||
DWORD priority; | ||
|
||
#ifdef FEATURE_PAL | ||
LPWSTR lpLastError = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You aren't allocating a buffer to put the result in. You need some type of smart string so that when LoadLibErrorTracker goes out of scope, it is deallocated.
I think InlineSString is probably what you need. Then I think you do something like OpenUTF8Buffer to get the buffer to pass in. But I do not know the VM idioms. @jkotas what is appropriate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to convert the message from UTF8 used by the PAL to UTF16. SString is a good way to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there documentation anywhere on SString? I'm trying to figure out the best way to use it in this instance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The header file has comments about the different APIs: https://github.com/dotnet/coreclr/blob/master/src/inc/sstring.h
I think you want to use it like this: SString errorMessage(SString::Utf8, ...utf8String...);
src/vm/dllimport.cpp
Outdated
PAL_GetLoadLibraryError(lpLastError, nSize); | ||
UpdateHRUnix(lpLastError); | ||
#else | ||
DWORD dwLastError; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can declare and define on the same line
You should create a small test app that tried to PInvoke into a non-existing .so, corrupted .so or .so without right permissions and see that we are getting a better error message for it with your change. Please share the output before/after. |
src/pal/src/loader/module.cpp
Outdated
Wrapper for dlerror() to be used by PAL functions | ||
|
||
Parameters: | ||
IN nSize - the number of characters to copy to lpBuffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description is no longer accurate
src/pal/src/loader/module.cpp
Outdated
PERF_ENTRY(PAL_GetLoadLibraryError); | ||
ENTRY("PAL_GetLoadLibraryError"); | ||
|
||
LPCWSTR last_error = (LPCWSTR) dlerror(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method should return LPCSTR
and this cast should be deleted.
src/vm/dllimport.cpp
Outdated
@@ -5791,8 +5809,14 @@ class LoadLibErrorTracker | |||
} | |||
} | |||
|
|||
void UpdateHRUnix(LPCWSTR message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is updating message, not HR. You can call this SetMessage; and keep the name of UpdateHR as is.
src/vm/dllimport.cpp
Outdated
} | ||
|
||
HRESULT GetHR() | ||
{ | ||
return m_hr; | ||
} | ||
|
||
SString GetMessage() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should return SString&
.
Returning it as SString
is quite inefficient - it creates copies of the whole string.
src/vm/dllimport.cpp
Outdated
void DECLSPEC_NORETURN Throw(SString &libraryNameOrPath) | ||
{ | ||
STANDARD_VM_CONTRACT; | ||
|
||
#ifdef FEATURE_PAL | ||
SString hrString = (SString) GetMessage(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to store it in a local. You can just pass it into COMPlusThrow directly.
I've pushed a change to modify the error messages. I'm still thinking about a scheme for determining error message priority for Unix though - the errors we get from dlerror() are human readable strings rather than numerical error codes, so I'm not sure how to determine which to give priority to. Would it be best to look at the output after #15912 to see what things look like after removing duplicate messages, or to revert the change to append all error messages to the final error string? I'm partial to the latter strategy, as the error for a bad .so already looks good in either case, and the error for a missing dependency would look like #15831 (comment) but with the new strings from my latest commit. |
Tizen timeout is happening in most (if not all) PRs. @jkotas @danmosemsft any thoughts on my last comment? I'm leaning towards reverting the change that appends all error messages, so the final error message would look like:
|
Nit: You know where you are running so you can just print right name of the environment variable. |
Is https://github.com/dotnet/coreclr/pull/15831/files#diff-a752d870496de85a0304a6f77fcc9c91R991 the right way to solve that problem @jkotas? If so I'll add a third version of the message for Mac. |
Also what is the macro for if we're running on mac vs linux? I only know of FEATURE_PAL for Windows vs Unix |
Agree. I still think it would be nice to try the pick the most relevant error message somehow. Otherwise, it is pretty much always going to be |
|
@jkotas I did confirm that we get the correct error for a bad .so (missing dependency), regardless of the location (Core_Root or in the executable's dir, with no fallback):
I'm trying to think of a good way to pick a most relevant message in the general case, but given that dlerror() gives us no error code & the strings could change, I'm not sure what that would be. |
This error means that the .so was loaded successfully, but we have later failed to find the entrypoint in it. It is not the right "bad so" to exercise the error message path you are modifying. |
You can try with a text file renamed to .so. It should give you bad .so that fails to load. |
I see currently an error that reads
in the list of errors, but in a different position depending on where the bogus .so is, so I suspect we may get the wrong exception depending on the location of the .so. I'll rebuild after my new changes and confirm |
We could put any extra messages (whatever you don't want in Message) into some field on the exception so it's at least visible in the debugger, and perhaps ToString(). Seems that we can defer this until we have some real world experience. This change gives us the infrastructure to experiment. |
src/dlls/mscorrc/mscorrc.rc
Outdated
IDS_EE_NDIRECT_LOADLIB_WIN "Unable to load DLL '%1' or one of its dependencies: %2" | ||
IDS_EE_NDIRECT_LOADLIB_LINUX "Unable to load shared library '%1' or one of its dependencies. In order to help diagnose loading problems, consider setting the LD_DEBUG environment variable: %2" | ||
IDS_EE_NDIRECT_LOADLIB_MAC "Unable to load shared library '%1' or one of its dependencies. In order to help diagnose loading problems, consider setting the DYLD_PRINT_LIBRARIES environment variable: %2" | ||
IDS_EE_NDIRECT_GETPROCADDRESS "Unable to find an entry point named '%2' in shared library '%1'." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need a windows form of this message too -- that says DLL?
src/vm/dllimport.cpp
Outdated
} | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, not worth resetting for, but normally after #else / #elif / #endif you add an inline comment noting what #if or #endif they match with. Eg., line 5785 would end with // APPLE and 5797 with // FEATURE_PAL -- if I understand the convention correctly
I'm now getting the following on Ubuntu, with a missing .so:
0x263E corresponds to IDS_EE_NDIRECT_LOADLIB_LINUX (https://github.com/dotnet/coreclr/pull/15831/files#diff-09bf819ffe16f3ff81053bbb1dbb2eecR897), which is the error message I'd expect to see here, and which is defined at https://github.com/dotnet/coreclr/pull/15831/files#diff-a752d870496de85a0304a6f77fcc9c91R991. I wasn't having this problem before the commit to add an error for Mac at ef56ee9. Trying to figure out what's going on here, but no luck so far. |
On your local box? Likely your resource add did not get built. Those are in mscorrc.*dll. Try deleting coreclr\bin\Product\Windows_NT.x64.Debug and then building coreclr\src incrementally? |
You can also sanity check by opening the mscorrc.debug.dll that's next to your coreclr.dll etc in Visual Studio (File > Open >File) and open the string table and you will see whether the string you added is there. |
Yep, everything looks as expected now. Should I go ahead and merge? |
|
…15831) * Add GetLoadLibrary function for PAL & use in TrackErrorCode * Finish changes to dllimport.cpp * Fix unix build errors * fix windows issues * Address feedback * Fix return type * Address feedback * Fix return type * Fix usage of LPCWSTR * Use shared library string * Append message strings * Fix append * Append newline * Fix spelling * Modify error messages * Resolve conflicts * Add mac message & stop appending all exceptions * Fix another error message * Fix NoName Error * Add newline to .h file
For https://github.com/dotnet/coreclr/issues/10520
@danmosemsft PTAL - what's the best way for me to test this locally? Builds are succeeding on Windows & Unix
CC @karelz