Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ThinLTO] Missing vtable COMDAT symbol after r350948 #39760

Closed
llvmbot opened this issue Jan 22, 2019 · 19 comments
Closed

[ThinLTO] Missing vtable COMDAT symbol after r350948 #39760

llvmbot opened this issue Jan 22, 2019 · 19 comments
Labels
bugzilla Issues migrated from bugzilla

Comments

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 22, 2019

Bugzilla Link 40414
Resolution FIXED
Resolved on Feb 01, 2019 03:01
Version 7.0
OS Windows NT
Blocks #39678
Reporter LLVM Bugzilla Contributor
CC @froydnj,@zmodem,@jdm,@pcc

Extended Description

After r350948, a Win64 Firefox ThinLTO build fails with:

LLVM ERROR: Associative COMDAT symbol '??_7KeywordsSink@?A0x23271001@@6b@' does not exist

(aka const anonymous namespace'::KeywordsSink::vftable')

Notably it started failing at r350948 and not r340949. I would have thought the first patch of that pair was pretty innocuous.

Our flags do not include any of: -f[no]split-lto-unit, -fsanitize=cfi, or -fwhole-program-vtables.

In case it's a clue, the KeywordsSink class is only used within a single translation unit: https://searchfox.org/mozilla-central/search?q=KeywordsSink

I'm requesting this to be an 8.0.0 blocker as this build config worked for us in 7.0.1.

I can provide a linkrepro upon request though I'm not sure if that would be sufficient to investigate this. Let me know if I can help with the debug somehow. Thanks!

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 22, 2019

Do you have a reproducer you can share? Any chance it is using a mix of old and new bitcode?

Note that the title of this bug and the description are referencing the wrong revisions - they should be r35094[89] and not r34094[89]. Let me adjust the title...

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 22, 2019

Do you have a reproducer you can share? Any chance it is using a mix of old
and new bitcode?

Are you referring to a linkrepro? Or would you need to start from scratch with cpp files? (Not sure whether this happens in the compiler or linker)

I used clobber builds for my bisection so there shouldn't be any mix of bitcodes.

Note that the title of this bug and the description are referencing the
wrong revisions - they should be r35094[89] and not r34094[89]. Let me
adjust the title...

You're right: r35094[89]. Though the title still has a typo :) Adjusting again.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 22, 2019

Do you have a reproducer you can share? Any chance it is using a mix of old
and new bitcode?

Are you referring to a linkrepro? Or would you need to start from scratch
with cpp files? (Not sure whether this happens in the compiler or linker)

Just the link should be sufficient.

I used clobber builds for my bisection so there shouldn't be any mix of
bitcodes.

Note that the title of this bug and the description are referencing the
wrong revisions - they should be r35094[89] and not r34094[89]. Let me
adjust the title...

You're right: r35094[89]. Though the title still has a typo :) Adjusting
again.

Ugh, thanks.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 22, 2019

Just a guess, looking at the source code. The vtable will be local since it is defined in an anonymous namespace. If ThinLTO ends up exporting a reference, it will need to promote/rename this symbol. And COFF apparently requires that each COMDAT retains a symbol of the same name.

With splitting enabled, we would hit the code in the ThinLTOBitcodeWriter that handles this for COFF when promotion/renaming is required by the module splitting (see https://reviews.llvm.org/D31963 which fixed it there). My guess is that we need to do something similar even when we aren't splitting.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 22, 2019

Sent a Google Drive link to a linkrepro archive.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 23, 2019

Thanks, reproduced it. Will try to root cause it today.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 23, 2019

In the meantime, can you give your build a try with -fsplit-lto-unit and confirm that it succeeds? That should restore the previous behavior, and you can use it as a workaround for now.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 23, 2019

Just a guess, looking at the source code. The vtable will be local since it
is defined in an anonymous namespace. If ThinLTO ends up exporting a
reference, it will need to promote/rename this symbol. And COFF apparently
requires that each COMDAT retains a symbol of the same name.

With splitting enabled, we would hit the code in the ThinLTOBitcodeWriter
that handles this for COFF when promotion/renaming is required by the module
splitting (see https://reviews.llvm.org/D31963 which fixed it there). My
guess is that we need to do something similar even when we aren't splitting.

This theory is borne out by the save temp files in the LTO backend. Going into the backend for ucol_res.obj:

$"??_7KeywordsSink@?A0x23271001@@6b@" = comdat any

@​anon.1f1d01ebea411e7039392aebde7ed70e.5 = private unnamed_addr constant { [4 x i8*] } { [4 x i8*] [i8* bitcast (%rtti.CompleteObjectLocator* @"??_R4KeywordsSink@?A0x23271001@@6b@" to i8*), i8* bitcast (i8* (%"struct.(anonymous namespace)::KeywordsSink", i32) @"??_GKeywordsSink@?A0x23271001@@UEAAPEAXI@Z" to i8*), i8* bitcast (i8* (%"class.icu_63::UObject") @"?getDynamicClassID@UObject@icu_63@@UEBAPEAXXZ" to i8*), i8* bitcast (void (%"struct.(anonymous namespace)::KeywordsSink", i8, %"class.icu_63::ResourceValue", i8, i32)* @"?put@KeywordsSink@?A0x23271001@@UEAAXPEBDAEAVResourceValue@icu_63@@CAEAW4UErrorCode@@@z" to i8*)] }, comdat($"??_7KeywordsSink@?A0x23271001@@6b@"), !type !​4043, !type !​4048, !type !​4049

@"??_7KeywordsSink@?A0x23271001@@6b@" = internal unnamed_addr alias i8*, getelementptr inbounds ({ [4 x i8*] }, { [4 x i8*] }* @​anon.1f1d01ebea411e7039392aebde7ed70e.5, i32 0, i32 0, i32 1)

And after promotion (because a reference to it is exported), the latter becomes:

@"??_7KeywordsSink@?A0x23271001@@6b@.llvm.11791635078693174447" = hidden unnamed_addr alias i8*, getelementptr inbounds ({ [4 x i8*] }, { [4 x i8*] }* @​anon.1f1d01ebea411e7039392aebde7ed70e.5, i32 0, i32 0, i32 1)

So now there is no value that matches the COMDAT name, resulting in an error for COFF.

What we need to do to fix should be pretty straightforward - in the LTO backend when we promote/rename, if it is the name of the COMDAT we also rename the COMDAT accordingly. This would be similar to what D31963 was doing when promoting during LTO Unit splitting.

This was an existing bug for COFF with ThinLTO. Interestingly, it would have been hit before except that note the above value has type metadata. This resulted in the splitting kicking in. I'm not sure the full set of conditions where we get type metadata, the places I was aware of in the past were when compiling with either -fsanitize or -fwhole-program-vtables, which comment #​1 says is not the case here.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 23, 2019

By the way, I'm still testing our matrix of supported build configs, but -fsplit-lto-unit got me a successful build at least in the first configuration that I tried. Thanks for the tip.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 25, 2019

Just as another data point: by coincidence a colleague was attempting to enable -fsanitize=cfi and ran into a similar error message even with 7.0.1. That would seem to support your diagnosis.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 26, 2019

Just as another data point: by coincidence a colleague was attempting to
enable -fsanitize=cfi and ran into a similar error message even with 7.0.1.
That would seem to support your diagnosis.

Actually with CFI enabled I would expect it to be more likely to take the path that hides the bug, so I'm not sure what this means.

But in any case, I have a fix for the original reproducer. It is pretty simple but I didn't have a chance to tackle this until late yesterday. Including it below so you can try it out when you have a chance. I just need to add some comments and a test case before I send for review early next week.

diff --git a/include/llvm/Transforms/Utils/FunctionImportUtils.h b/include/llvm/Transforms/Utils/FunctionImportUtils.h
index ac69908d962..853e69219e6 100644
--- a/include/llvm/Transforms/Utils/FunctionImportUtils.h
+++ b/include/llvm/Transforms/Utils/FunctionImportUtils.h
@@ -43,6 +43,8 @@ class FunctionImportGlobalProcessing {
/// to promote any non-renamable values.
SmallPtrSet<GlobalValue *, 8> Used;

  • DenseMap<const Comdat *, Comdat *> RenamedComdats;
  • /// Check if we should promote the given local value to global scope.
    bool shouldPromoteLocalToGlobal(const GlobalValue *SGV);

diff --git a/lib/Transforms/Utils/FunctionImportUtils.cpp b/lib/Transforms/Utils/FunctionImportUtils.cpp
index a122b28e52e..b4ac650cbe3 100644
--- a/lib/Transforms/Utils/FunctionImportUtils.cpp
+++ b/lib/Transforms/Utils/FunctionImportUtils.cpp
@@ -248,6 +248,7 @@ void FunctionImportGlobalProcessing::processGlobalForThinLTO(GlobalValue &GV) {
bool DoPromote = false;
if (GV.hasLocalLinkage() &&
((DoPromote = shouldPromoteLocalToGlobal(&GV)) || isPerformingImport())) {

  • auto Name = GV.getName();
    // Once we change the name or linkage it is difficult to determine
    // again whether we should promote since shouldPromoteLocalToGlobal needs
    // to locate the summary (based on GUID from name and linkage). Therefore,
    @@ -256,6 +257,10 @@ void FunctionImportGlobalProcessing::processGlobalForThinLTO(GlobalValue &GV) {
    GV.setLinkage(getLinkage(&GV, DoPromote));
    if (!GV.hasLocalLinkage())
    GV.setVisibility(GlobalValue::HiddenVisibility);
  • if (const auto *C = GV.getComdat())
  •  if (C->getName() == Name)
    
  •    RenamedComdats.try_emplace(C, M.getOrInsertComdat(GV.getName()));
    
    } else
    GV.setLinkage(getLinkage(&GV, /* DoPromote */ false));

@@ -280,6 +285,14 @@ void FunctionImportGlobalProcessing::processGlobalsForThinLTO() {
processGlobalForThinLTO(SF);
for (GlobalAlias &GA : M.aliases())
processGlobalForThinLTO(GA);
+

  • if (!RenamedComdats.empty())
  • for (auto &GO : M.global_objects())
  •  if (auto *C = GO.getComdat()) {
    
  •    auto Replacement = RenamedComdats.find(C);
    
  •    if (Replacement != RenamedComdats.end())
    
  •      GO.setComdat(Replacement->second);
    
  •  }
    

}

bool FunctionImportGlobalProcessing::run() {

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 28, 2019

But in any case, I have a fix for the original reproducer. It is pretty
simple but I didn't have a chance to tackle this until late yesterday.
Including it below so you can try it out when you have a chance. I just need
to add some comments and a test case before I send for review early next
week.

I applied the patch to 8.0.0rc1, and it looks good with the latest mozilla-central tree, at least with the original build configuration that prompted this bug. Should I try the CFI build too?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 29, 2019

But in any case, I have a fix for the original reproducer. It is pretty
simple but I didn't have a chance to tackle this until late yesterday.
Including it below so you can try it out when you have a chance. I just need
to add some comments and a test case before I send for review early next
week.

I applied the patch to 8.0.0rc1, and it looks good with the latest
mozilla-central tree, at least with the original build configuration that
prompted this bug.

Great!

Should I try the CFI build too?

Sure, let me know what happens.

I sent the fix for review just now at D57395.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 29, 2019

obj file from CFI build
Using the patch from comment 11, a CFI build fails with:

LLVM ERROR: Associative COMDAT symbol '??$endl@DU?$char_traits@D@std@@@std@@YAAEAV?$basic_ostream@DU?$char_traits@D@std@@@0@AEAV10@@z' does not exist.

The flags included:

-fsanitize=cfi-icall
-fsanitize-cfi-icall-generalize-pointers
-fsanitize-blacklist=...
-fno-sanitize-trap=cfi
-fsanitize-recover=cfi

This happened on a small helper binary so it can be reproduced with just the attached .obj file. Link with: lld-link -dll -nodefaultlib

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 30, 2019

Created attachment 21401 [details]
obj file from CFI build

Using the patch from comment 11, a CFI build fails with:

LLVM ERROR: Associative COMDAT symbol
'??$endl@DU?$char_traits@D@std@@@std@@YAAEAV?$basic_ostream@DU?$char_traits@D
@​std@@@0@AEAV10@@z' does not exist.

The flags included:

-fsanitize=cfi-icall
-fsanitize-cfi-icall-generalize-pointers
-fsanitize-blacklist=...
-fno-sanitize-trap=cfi
-fsanitize-recover=cfi

This happened on a small helper binary so it can be reproduced with just the
attached .obj file. Link with: lld-link -dll -nodefaultlib

This turns out to be a separate instance of renaming provoking this issue. In this case, the module was split due to CFI, and no renaming/promotion was required since the comdat and the leader are in the same module.

However, in LowerTypeTests.cpp a ".cfi" suffix is added to the function definition which is the comdat leader, provoking the same error. This is an existing issue unrelated to the one provoked by changing the splitting default. Probably needs a separate bug to track. You could cc peter@pcc.me.uk and myself.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 30, 2019

Sounds good; I opened bug 40540 for the CFI piece.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 31, 2019

Fixed in r352763.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 31, 2019

Reverted due to bot failures and recommitted with fix in r352770. Looks like it will stick this time.

@zmodem
Copy link
Collaborator

zmodem commented Feb 1, 2019

Reverted due to bot failures and recommitted with fix in r352770. Looks like
it will stick this time.

I've merged that to the llvm 8 branch in r352856. Please let me know if there are any follow-ups.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

2 participants