Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LLD][COFF] Do as many passes of resolveRemainingUndefines as necessary for undefined lazy symbols #109082

Merged
merged 1 commit into from
Oct 3, 2024

Conversation

glandium
Copy link
Contributor

@glandium glandium commented Sep 18, 2024

No description provided.

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 18, 2024

@llvm/pr-subscribers-platform-windows

@llvm/pr-subscribers-lld

Author: Mike Hommey (glandium)

Changes

Cc: @mstorsjo

This addresses #107365 except for the strdup part.


Full diff: https://github.com/llvm/llvm-project/pull/109082.diff

3 Files Affected:

  • (modified) lld/COFF/Driver.cpp (+4-1)
  • (modified) lld/COFF/SymbolTable.cpp (+8-1)
  • (modified) lld/COFF/SymbolTable.h (+4-1)
diff --git a/lld/COFF/Driver.cpp b/lld/COFF/Driver.cpp
index 1b94f10acf80e5..38a507cb9752ff 100644
--- a/lld/COFF/Driver.cpp
+++ b/lld/COFF/Driver.cpp
@@ -2601,7 +2601,10 @@ void LinkerDriver::linkerMain(ArrayRef<const char *> argsArr) {
     createECExportThunks();
 
   // Resolve remaining undefined symbols and warn about imported locals.
-  ctx.symtab.resolveRemainingUndefines();
+  if (ctx.symtab.resolveRemainingUndefines()) {
+    run();
+    ctx.symtab.resolveRemainingUndefines();
+  }
   if (errorCount())
     return;
 
diff --git a/lld/COFF/SymbolTable.cpp b/lld/COFF/SymbolTable.cpp
index efea16ccbbfea0..8d69a3966e638f 100644
--- a/lld/COFF/SymbolTable.cpp
+++ b/lld/COFF/SymbolTable.cpp
@@ -479,10 +479,11 @@ void SymbolTable::reportUnresolvable() {
                        /* localImports */ nullptr, true);
 }
 
-void SymbolTable::resolveRemainingUndefines() {
+bool SymbolTable::resolveRemainingUndefines() {
   llvm::TimeTraceScope timeScope("Resolve remaining undefined symbols");
   SmallPtrSet<Symbol *, 8> undefs;
   DenseMap<Symbol *, Symbol *> localImports;
+  bool foundLazy = false;
 
   for (auto &i : symMap) {
     Symbol *sym = i.second;
@@ -502,6 +503,11 @@ void SymbolTable::resolveRemainingUndefines() {
     // This odd rule is for compatibility with MSVC linker.
     if (name.starts_with("__imp_")) {
       Symbol *imp = find(name.substr(strlen("__imp_")));
+      if (imp && imp->isLazy()) {
+        forceLazy(imp);
+        foundLazy = true;
+        continue;
+      }
       if (imp && isa<Defined>(imp)) {
         auto *d = cast<Defined>(imp);
         replaceSymbol<DefinedLocalImport>(sym, ctx, name, d);
@@ -529,6 +535,7 @@ void SymbolTable::resolveRemainingUndefines() {
   reportProblemSymbols(
       ctx, undefs,
       ctx.config.warnLocallyDefinedImported ? &localImports : nullptr, false);
+  return foundLazy;
 }
 
 std::pair<Symbol *, bool> SymbolTable::insert(StringRef name) {
diff --git a/lld/COFF/SymbolTable.h b/lld/COFF/SymbolTable.h
index bf97cf442039e0..1ab35a87cb9fad 100644
--- a/lld/COFF/SymbolTable.h
+++ b/lld/COFF/SymbolTable.h
@@ -57,7 +57,10 @@ class SymbolTable {
   // Try to resolve any undefined symbols and update the symbol table
   // accordingly, then print an error message for any remaining undefined
   // symbols and warn about imported local symbols.
-  void resolveRemainingUndefines();
+  // Returns whether more files might need to be linked in to resolve lazy
+  // symbols, in which case the caller is expected to call the function again
+  // after linking those files.
+  bool resolveRemainingUndefines();
 
   // Load lazy objects that are needed for MinGW automatic import and for
   // doing stdcall fixups.

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 18, 2024

@llvm/pr-subscribers-lld-coff

Author: Mike Hommey (glandium)

Changes

Cc: @mstorsjo

This addresses #107365 except for the strdup part.


Full diff: https://github.com/llvm/llvm-project/pull/109082.diff

3 Files Affected:

  • (modified) lld/COFF/Driver.cpp (+4-1)
  • (modified) lld/COFF/SymbolTable.cpp (+8-1)
  • (modified) lld/COFF/SymbolTable.h (+4-1)
diff --git a/lld/COFF/Driver.cpp b/lld/COFF/Driver.cpp
index 1b94f10acf80e5..38a507cb9752ff 100644
--- a/lld/COFF/Driver.cpp
+++ b/lld/COFF/Driver.cpp
@@ -2601,7 +2601,10 @@ void LinkerDriver::linkerMain(ArrayRef<const char *> argsArr) {
     createECExportThunks();
 
   // Resolve remaining undefined symbols and warn about imported locals.
-  ctx.symtab.resolveRemainingUndefines();
+  if (ctx.symtab.resolveRemainingUndefines()) {
+    run();
+    ctx.symtab.resolveRemainingUndefines();
+  }
   if (errorCount())
     return;
 
diff --git a/lld/COFF/SymbolTable.cpp b/lld/COFF/SymbolTable.cpp
index efea16ccbbfea0..8d69a3966e638f 100644
--- a/lld/COFF/SymbolTable.cpp
+++ b/lld/COFF/SymbolTable.cpp
@@ -479,10 +479,11 @@ void SymbolTable::reportUnresolvable() {
                        /* localImports */ nullptr, true);
 }
 
-void SymbolTable::resolveRemainingUndefines() {
+bool SymbolTable::resolveRemainingUndefines() {
   llvm::TimeTraceScope timeScope("Resolve remaining undefined symbols");
   SmallPtrSet<Symbol *, 8> undefs;
   DenseMap<Symbol *, Symbol *> localImports;
+  bool foundLazy = false;
 
   for (auto &i : symMap) {
     Symbol *sym = i.second;
@@ -502,6 +503,11 @@ void SymbolTable::resolveRemainingUndefines() {
     // This odd rule is for compatibility with MSVC linker.
     if (name.starts_with("__imp_")) {
       Symbol *imp = find(name.substr(strlen("__imp_")));
+      if (imp && imp->isLazy()) {
+        forceLazy(imp);
+        foundLazy = true;
+        continue;
+      }
       if (imp && isa<Defined>(imp)) {
         auto *d = cast<Defined>(imp);
         replaceSymbol<DefinedLocalImport>(sym, ctx, name, d);
@@ -529,6 +535,7 @@ void SymbolTable::resolveRemainingUndefines() {
   reportProblemSymbols(
       ctx, undefs,
       ctx.config.warnLocallyDefinedImported ? &localImports : nullptr, false);
+  return foundLazy;
 }
 
 std::pair<Symbol *, bool> SymbolTable::insert(StringRef name) {
diff --git a/lld/COFF/SymbolTable.h b/lld/COFF/SymbolTable.h
index bf97cf442039e0..1ab35a87cb9fad 100644
--- a/lld/COFF/SymbolTable.h
+++ b/lld/COFF/SymbolTable.h
@@ -57,7 +57,10 @@ class SymbolTable {
   // Try to resolve any undefined symbols and update the symbol table
   // accordingly, then print an error message for any remaining undefined
   // symbols and warn about imported local symbols.
-  void resolveRemainingUndefines();
+  // Returns whether more files might need to be linked in to resolve lazy
+  // symbols, in which case the caller is expected to call the function again
+  // after linking those files.
+  bool resolveRemainingUndefines();
 
   // Load lazy objects that are needed for MinGW automatic import and for
   // doing stdcall fixups.

@glandium glandium marked this pull request as draft September 18, 2024 06:41
Copy link
Member

@mstorsjo mstorsjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good to me, thanks! But this would need a testcase.

Have a look in lld/test/COFF, some of the more modern testcases using split-file probably serves as the best example. I guess this can be testable with three .s files:
One that contains an undefined reference to __imp_func. Two .s files that are assembled and stored in a static library. One containing a definition of func, and with an undefined reference to other, and the second static library object containing a definition of other.

Then make sure that the testcase does fail on current git main, and succeeds after this change. The third object file containing other might simulate the need for calling run() after forceLazy() - ideally the test should still fail if the run() is removed (but it can also be possible that the test still passes despite that - but we probably still need the run() just to be safe).

lld/COFF/Driver.cpp Outdated Show resolved Hide resolved
@@ -502,6 +503,11 @@ void SymbolTable::resolveRemainingUndefines() {
// This odd rule is for compatibility with MSVC linker.
if (name.starts_with("__imp_")) {
Symbol *imp = find(name.substr(strlen("__imp_")));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note, I find the variable name imp here kinda misleading, as it's the imp-less symbol! (We can consider changing it in a separate patch later.)

@glandium
Copy link
Contributor Author

One that contains an undefined reference to __imp_func. Two .s files that are assembled and stored in a static library. One containing a definition of func, and with an undefined reference to other, and the second static library object containing a definition of other.

I think there needs to be a DLL involved for the lazy linking?

@mstorsjo
Copy link
Member

One that contains an undefined reference to __imp_func. Two .s files that are assembled and stored in a static library. One containing a definition of func, and with an undefined reference to other, and the second static library object containing a definition of other.

I think there needs to be a DLL involved for the lazy linking?

Hm, I don't think so? Isn't the lazy symbols just for static libraries, where we've seen symbols and know we can load them if needed, but don't do it unless triggered.

I may very well be wrong though - I guess you'll see what's needed to actually trigger the relevant cases.

@glandium
Copy link
Contributor Author

Ah, I was misremembering and mixing things up with the other lld-link issue I filed with delayload.

@glandium glandium force-pushed the lld-coff-resolve-remaining-undefines branch from c44939f to 23061e5 Compare September 18, 2024 11:45
@glandium
Copy link
Contributor Author

This new version has the while loop and is rebased on top of current main. I haven't included a test yet.

@glandium glandium force-pushed the lld-coff-resolve-remaining-undefines branch from 23061e5 to 16899d3 Compare September 18, 2024 12:24
@glandium glandium marked this pull request as ready for review September 18, 2024 12:25
@glandium
Copy link
Contributor Author

I double checked the test fails without the patch and passes with.

Copy link
Member

@mstorsjo mstorsjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, thanks! Two minor nits, one inline, and the other about the PR description:

As LLVM uses the "merge and squash" method, the PR subject + description ends up as the commit message (although whoever merges it has a chance to touch it up), so it's good to edit it into the form that is wanted as the final commit message.

In this case, CCs and similar unfortunately is best to do in a separate comment after. It's good to include the issue reference though, but at this point it actually does fix the issue completely AFAIK, so that can be tweaked too.

ctx.symtab.resolveRemainingUndefines();
while (ctx.symtab.resolveRemainingUndefines()) {
run();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we stylistically tend to omit the braces for these kinds of single-line statements, see https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements.

@glandium glandium changed the title [LLD][COFF] Do another pass of resolveRemainingUndefines for undefined lazy symbols [LLD][COFF] Do as many passes of resolveRemainingUndefines as necessary for undefined lazy symbols Sep 18, 2024
@glandium glandium force-pushed the lld-coff-resolve-remaining-undefines branch from 16899d3 to f6a6060 Compare September 19, 2024 00:02
@glandium glandium marked this pull request as draft September 19, 2024 07:37
@glandium
Copy link
Contributor Author

Unfortunately, it seems not to be enough. Further down the line, it seems the import symbols aren't getting a proper relocation.

@glandium
Copy link
Contributor Author

(And the cherry on top: llvm-readobj crashes on the problematic binary)

@mstorsjo
Copy link
Member

Unfortunately, it seems not to be enough. Further down the line, it seems the import symbols aren't getting a proper relocation.

Hmm, that's tricky :-( I think this patch in itself looks correct in any case, so I think that whatever incorrect relocation you have is related to something else - but I don't mind hold off and waiting for that to be figured out before proceeding with this one.

@glandium
Copy link
Contributor Author

This is the repro that ends up broken because of missing relocations:
https://drive.google.com/file/d/1qh0dWMsg4xftyijorkEMudx4rsi-LVXI/view?usp=sharing
without the patch, it fails with:

lld-link: warning: libwhatsys-6987fe4a7a53db19.rlib(db3b6bfb95261072-windows.o): locally defined symbol imported: strncpy_s (defined in libucrt.lib(strncpy_s.obj)) [LNK4217]
lld-link: warning: libwhatsys-6987fe4a7a53db19.rlib(db3b6bfb95261072-windows.o): locally defined symbol imported: __stdio_common_vsprintf (defined in libucrt.lib(output.obj)) [LNK4217]
lld-link: warning: liblmdb_sys-f56ac55de745eb3d.rlib(61d5b3baa82548eb-mdb.o): locally defined symbol imported: __stdio_common_vsprintf (defined in libucrt.lib(output.obj)) [LNK4217]
lld-link: warning: liblmdb_sys-f56ac55de745eb3d.rlib(61d5b3baa82548eb-mdb.o): locally defined symbol imported: strerror (defined in libucrt.lib(strerror.obj)) [LNK4217]
lld-link: warning: liblmdb_sys-f56ac55de745eb3d.rlib(61d5b3baa82548eb-mdb.o): locally defined symbol imported: calloc (defined in libucrt.lib(calloc.obj)) [LNK4217]
lld-link: warning: liblmdb_sys-f56ac55de745eb3d.rlib(61d5b3baa82548eb-mdb.o): locally defined symbol imported: malloc (defined in libucrt.lib(malloc.obj)) [LNK4217]
lld-link: warning: liblmdb_sys-f56ac55de745eb3d.rlib(61d5b3baa82548eb-mdb.o): locally defined symbol imported: free (defined in libucrt.lib(free.obj)) [LNK4217]
lld-link: warning: liblmdb_sys-f56ac55de745eb3d.rlib(61d5b3baa82548eb-mdb.o): locally defined symbol imported: strncmp (defined in libucrt.lib(strncmp.obj)) [LNK4217]
lld-link: warning: liblmdb_sys-f56ac55de745eb3d.rlib(61d5b3baa82548eb-midl.o): locally defined symbol imported: malloc (defined in libucrt.lib(malloc.obj)) [LNK4217]
lld-link: warning: liblmdb_sys-f56ac55de745eb3d.rlib(61d5b3baa82548eb-midl.o): locally defined symbol imported: free (defined in libucrt.lib(free.obj)) [LNK4217]
lld-link: warning: liblmdb_sys-f56ac55de745eb3d.rlib(61d5b3baa82548eb-midl.o): locally defined symbol imported: realloc (defined in libucrt.lib(realloc.obj)) [LNK4217]
lld-link: warning: oldnames.lib(strdup.obi): locally defined symbol imported: _strdup (defined in libucrt.lib(strdup.obj)) [LNK4217]
lld-link: error: undefined symbol: __declspec(dllimport) wcscpy
>>> referenced by /tmp/gecko/third_party/rust/lmdb-rkv-sys/lmdb/libraries/liblmdb/mdb.c:4238
>>>               liblmdb_sys-f56ac55de745eb3d.rlib(61d5b3baa82548eb-mdb.o):(mdb_fopen)

lld-link: error: undefined symbol: __declspec(dllimport) _aligned_malloc
>>> referenced by /tmp/gecko/third_party/rust/lmdb-rkv-sys/lmdb/libraries/liblmdb/mdb.c:9324
>>>               liblmdb_sys-f56ac55de745eb3d.rlib(61d5b3baa82548eb-mdb.o):(mdb_env_copyfd1)

lld-link: error: undefined symbol: __declspec(dllimport) _aligned_free
>>> referenced by /tmp/gecko/third_party/rust/lmdb-rkv-sys/lmdb/libraries/liblmdb/mdb.c:9425
>>>               liblmdb_sys-f56ac55de745eb3d.rlib(61d5b3baa82548eb-mdb.o):(mdb_env_copyfd1)

With the patch, executing the binary crashes (to reduce noise from the number of processes spawned, you may run crashreporter-6497c44cdd21c683.exe test::glean_ping_) because __stdio_common_vsprintf doesn't point to the right place (but so do _imp_wcscpy, _imp_aligned_malloc and _imp_aligned_free, and probably the others, but I don't know about that for sure)

@glandium
Copy link
Contributor Author

So indeed, it's probably some preexisting problem that is now exposed because the link goes through now.

@glandium
Copy link
Contributor Author

Boy, this link is actually a mess. link.exe complains about wcscpy, aligned_malloc and aligned_free being missing, and it looks like it's right to do so, because they are in ucrt.lib, but it's not on the command line. But adding it conflicts with libucrt.lib that is pulled in via a /defaultlib directive in libcmt.lib.

@glandium
Copy link
Contributor Author

glandium commented Oct 2, 2024

Let's land this out of caution after all. @mstorsjo can you do that for me?

@mstorsjo
Copy link
Member

mstorsjo commented Oct 3, 2024

Let's land this out of caution after all. @mstorsjo can you do that for me?

Sure; while the case is a bit contrieved, it's fully possible to hit, so this should be a valid fix for that.

@mstorsjo mstorsjo merged commit 6a1bdd9 into llvm:main Oct 3, 2024
10 checks passed
xgupta pushed a commit to xgupta/llvm-project that referenced this pull request Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants