Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AST] Ensure getRawCommentsForAnyRedecl() does not miss any redecl with a comment #108475

Merged
merged 1 commit into from
Sep 20, 2024

Conversation

HighCommander4
Copy link
Collaborator

@HighCommander4 HighCommander4 commented Sep 13, 2024

The previous implementation had a bug where, if it was called on a Decl later in the redecl chain than LastCheckedDecl, it could incorrectly skip and overlook a Decl with a comment.

The patch addresses this by only using LastCheckedDecl if the input Decl D is on the path from the first (canonical) Decl to LastCheckedDecl.

An alternative that was considered was to start the iteration from the (canonical) Decl, however this ran into problems with the modelling of explicit template specializations in the AST where the canonical Decl can be unusual. With the current solution, if no Decls were checked yet, we prefer to check the input Decl over the canonical one.

Fixes #108145

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Sep 13, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Sep 13, 2024

@llvm/pr-subscribers-clang

Author: Nathan Ridge (HighCommander4)

Changes

The intent of the CommentlessRedeclChains cache is that if new redecls have been parsed since the last time getRawCommentsForAnyRedecl() was called, only the new redecls are checked for comments.

However, redecls are a circular list, and if iteration starts from the input decl D, that could be a new one, and we incorrectly skip it while traversing the list to LastCheckedRedecl.

Starting the iteration from the first (canonical) decl makes the cache work as intended.

Fixes #108145


Full diff: https://github.com/llvm/llvm-project/pull/108475.diff

3 Files Affected:

  • (modified) clang/lib/AST/ASTContext.cpp (+1-1)
  • (modified) clang/unittests/AST/CMakeLists.txt (+1)
  • (added) clang/unittests/AST/RawCommentForDeclTest.cpp (+99)
diff --git a/clang/lib/AST/ASTContext.cpp b/clang/lib/AST/ASTContext.cpp
index a4e6d3b108c8a5..3735534ef3d3f1 100644
--- a/clang/lib/AST/ASTContext.cpp
+++ b/clang/lib/AST/ASTContext.cpp
@@ -444,7 +444,7 @@ const RawComment *ASTContext::getRawCommentForAnyRedecl(
     return CommentlessRedeclChains.lookup(CanonicalD);
   }();
 
-  for (const auto Redecl : D->redecls()) {
+  for (const auto Redecl : CanonicalD->redecls()) {
     assert(Redecl);
     // Skip all redeclarations that have been checked previously.
     if (LastCheckedRedecl) {
diff --git a/clang/unittests/AST/CMakeLists.txt b/clang/unittests/AST/CMakeLists.txt
index dcc9bc0f39ac2c..79ad8a28f2b33c 100644
--- a/clang/unittests/AST/CMakeLists.txt
+++ b/clang/unittests/AST/CMakeLists.txt
@@ -33,6 +33,7 @@ add_clang_unittest(ASTTests
   NamedDeclPrinterTest.cpp
   ProfilingTest.cpp
   RandstructTest.cpp
+  RawCommentForDeclTest.cpp
   RecursiveASTVisitorTest.cpp
   SizelessTypesTest.cpp
   SourceLocationTest.cpp
diff --git a/clang/unittests/AST/RawCommentForDeclTest.cpp b/clang/unittests/AST/RawCommentForDeclTest.cpp
new file mode 100644
index 00000000000000..b811df28127d43
--- /dev/null
+++ b/clang/unittests/AST/RawCommentForDeclTest.cpp
@@ -0,0 +1,99 @@
+//===- unittests/AST/RawCommentForDeclTestTest.cpp
+//-------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/DeclGroup.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendAction.h"
+#include "clang/Tooling/Tooling.h"
+
+#include "gmock/gmock-matchers.h"
+#include "gtest/gtest.h"
+
+namespace clang {
+
+struct FoundComment {
+  std::string DeclName;
+  bool IsDefinition;
+  std::string Comment;
+
+  bool operator==(const FoundComment &RHS) const {
+    return DeclName == RHS.DeclName && IsDefinition == RHS.IsDefinition &&
+           Comment == RHS.Comment;
+  }
+  friend llvm::raw_ostream &operator<<(llvm::raw_ostream &Stream,
+                                       const FoundComment &C) {
+    return Stream << "{Name: " << C.DeclName << ", Def: " << C.IsDefinition
+                  << ", Comment: " << C.Comment << "}";
+  }
+};
+
+class CollectCommentsAction : public ASTFrontendAction {
+public:
+  CollectCommentsAction(std::vector<FoundComment> &Comments)
+      : Comments(Comments) {}
+
+  std::unique_ptr<ASTConsumer> CreateASTConsumer(CompilerInstance &CI,
+                                                 llvm::StringRef) override {
+    CI.getLangOpts().CommentOpts.ParseAllComments = true;
+    return std::make_unique<Consumer>(*this);
+  }
+
+  std::vector<FoundComment> &Comments;
+
+private:
+  class Consumer : public clang::ASTConsumer {
+  private:
+    CollectCommentsAction &Action;
+
+  public:
+    Consumer(CollectCommentsAction &Action) : Action(Action) {}
+    ~Consumer() override {}
+
+    bool HandleTopLevelDecl(DeclGroupRef DG) override {
+      for (Decl *D : DG) {
+        if (NamedDecl *ND = dyn_cast<NamedDecl>(D)) {
+          auto &Ctx = D->getASTContext();
+          const auto *RC = Ctx.getRawCommentForAnyRedecl(D);
+          Action.Comments.push_back(FoundComment{
+              ND->getNameAsString(), IsDefinition(D),
+              RC ? RC->getRawText(Ctx.getSourceManager()).str() : ""});
+        }
+      }
+
+      return true;
+    }
+
+    static bool IsDefinition(const Decl *D) {
+      if (const FunctionDecl *FD = dyn_cast<FunctionDecl>(D)) {
+        return FD->isThisDeclarationADefinition();
+      }
+      if (const TagDecl *TD = dyn_cast<TagDecl>(D)) {
+        return TD->isThisDeclarationADefinition();
+      }
+      return false;
+    }
+  };
+};
+
+TEST(RawCommentForDecl, DefinitionComment) {
+  std::vector<FoundComment> Comments;
+  auto Action = std::make_unique<CollectCommentsAction>(Comments);
+  ASSERT_TRUE(tooling::runToolOnCode(std::move(Action), R"cpp(
+    void f();
+
+    // f is the best
+    void f() {}
+  )cpp"));
+  EXPECT_THAT(Comments, testing::ElementsAre(
+                            FoundComment{"f", false, ""},
+                            FoundComment{"f", true, "// f is the best"}));
+}
+
+} // namespace clang

@HighCommander4
Copy link
Collaborator Author

Buildkite is showing the test Clang :: Index/comment-to-html-xml-conversion.cpp failing. Will investigate.

@HighCommander4 HighCommander4 marked this pull request as draft September 15, 2024 04:06
@HighCommander4
Copy link
Collaborator Author

Buildkite is showing the test Clang :: Index/comment-to-html-xml-conversion.cpp failing. Will investigate.

I've been investigating this failure. It's caused by a slight change of behaviour of ASTContext::getFullComment() on explicit function template specializations, such as:

/// Aaa.
template<typename T, typename U>
void foo(T aaa, U bbb);

/// Bbb.
template<>
void foo(int aaa, int bbb);

While the correct comment (// Bbb.) is found for the explicit specialization, the returned FullComment's DeclInfo does not have its TemplateKind set to TemplateSpecialization as expected.

This is in turn because the code that sets this is conditioned on FunctionDecl::getNumTemplateParameterLists() != 0.

When getRawCommentForAnyRedecl() is called on the FunctionDecl for the explicit specialization (whose getNumTemplateParameterLists() returns 1), it previously returned the input FunctionDecl in its OriginalDecl out-parameter, but as a result of my change, it now returns the input decl's canonical decl, whose getNumTemplateParameterLists() returns 0, and it's this latter decl that ends up in the check mentioned above.

I'm not familiar enough with the AST modeling of template specializations to say what is going wrong here... @gribozavr as the author of the mentioned check, any advice would be appreciated.

@zyn0217
Copy link
Contributor

zyn0217 commented Sep 16, 2024

This is in turn because the code that sets this is conditioned on FunctionDecl::getNumTemplateParameterLists() != 0.

I was curious why it is relying on getNumTemplateParameterLists(). To my understanding, it should call FD->getTemplateSpecializationInfo(), shouldn't it?

EDIT: It seems that NumTemplParamLists isn't something for describing "semantic" template parameter lists but for The number of "outer" template parameter lists. So it would be 0 for implicit template instantiations in the case.

@zyn0217
Copy link
Contributor

zyn0217 commented Sep 16, 2024

@HighCommander4
I extended your test case a little to validate the TemplateDeclKind we would have.

struct FoundComment {
  std::string DeclName;
  bool IsDefinition;
  std::string Comment;
  comments::DeclInfo::TemplateDeclKind TDK;
  // ... comparators are snipped ...
};
  Action.Comments.push_back(FoundComment{
    ND->getNameAsString(), IsDefinition(D),
    RC ? RC->getRawText(Ctx.getSourceManager()).str() : "",
    RC->parse(Ctx, &Action.getCompilerInstance().getPreprocessor(), D)
         ->getDeclInfo()
         ->getTemplateKind()});

So for the following test case,

  /// Aaa.
  template<typename T, typename U>
  void foo(T aaa, U bbb);

  /// Bbb.
  template<>
  void foo(int aaa, int bbb);

I didn't see the following failing

  EXPECT_THAT(
      Comments,
      testing::ElementsAre(
          FoundComment{"foo", false, "/// Aaa.",
                       comments::DeclInfo::TemplateDeclKind::Template},
          FoundComment{
              "foo", false, "/// Bbb.",
              comments::DeclInfo::TemplateDeclKind::TemplateSpecialization}));

Did I misread anything from your last comment?


NVM, the OriginalDecl parameter matters and we should pass it to the parse function... Hmmm

@zyn0217
Copy link
Contributor

zyn0217 commented Sep 16, 2024

So the problem is that we would have another implicitly-created explicit template specialization foo<int, int> in the process of the DeduceTemplateArguments, and that specialization would share the same FunctionTemplateSpecializationInfo with the eventual template specialization that ends up being D in the getRawCommentForAnyRedecl, while the implicitly-created specialization would

  1. have the explicit specialization flag set in CheckFunctionTemplateSpecialization
  2. become the previous declaration of the last explicit specialization
  3. have no ExtInfo set so one cannot get any template parameters from it.

So the status-quo is seemingly violating the contracts of NumTemplParamLists / TemplParamLists in light of their intents.

The TemplateParameterList would be attached to the newly created explicit specialization prior to the call to CheckFunctionTemplateSpecialization(), so one solution (which essentially preserves the ExtInfo in the implicitly-created specialization) is to add the following to CheckFunctionTemplateSpecialization(), in the code block below the comment.

// Mark the prior declaration as an explicit specialization, so that later
// clients know that this is an explicit specialization.
SmallVector<TemplateParameterList *, 4> TPL;
for (unsigned I = 0; I < FD->getNumTemplateParameterLists(); ++I)
  TPL.push_back(FD->getTemplateParameterList(I));
Specialization->setTemplateParameterListsInfo(Context, TPL);

An alternative might be to teach DeclInfo::fill() not to rely on getNumTemplateParameterLists(), but that way we still lack the TemplateParameterList.

EDIT: Having run a quick check-clang test locally, and nothing failed. Probably this is feasible to go.

@HighCommander4
Copy link
Collaborator Author

@zyn0217 Thank you for the analysis and suggestion! I updated the patch as suggested, let's see what buildkite says.

@shafik shafik requested a review from zyn0217 September 17, 2024 03:34
@HighCommander4
Copy link
Collaborator Author

Hmm, quite a few things are failing:

Failed Tests (14):
  Clang :: CXX/drs/cwg28xx.cpp
  Clang :: CXX/drs/cwg7xx.cpp
  Clang :: CXX/temp/temp.spec/temp.expl.spec/p2-20.cpp
  Clang :: CXX/temp/temp.spec/temp.expl.spec/p8.cpp
  Clang :: Index/index-file.cpp
  Clang :: PCH/cxx-ms-function-specialization-class-scope.cpp
  Clang :: PCH/cxx-templates.cpp
  Clang :: SemaCXX/ms-exception-spec.cpp
  Clang :: SemaTemplate/explicit-specialization-member.cpp
  Clang :: SemaTemplate/function-template-specialization.cpp
  Clang :: SemaTemplate/instantiate-member-specialization.cpp
  Clang :: SemaTemplate/member-specialization.cpp
  Clang :: SemaTemplate/ms-function-specialization-class-scope.cpp
  Clangd Unit Tests :: ./ClangdTests/DefineInlineTest/AddInline

@zyn0217
Copy link
Contributor

zyn0217 commented Sep 17, 2024

Sorry, it might be I forgot to save the changes before I ran the tests yesterday!

I looked into it again, and I think I have begun to understand TemplateParameterListsInfo:

The first intent is to describe out-of-line member functions that live in a templated scope. For example,

template <class T>
struct S {
  void foo();
};

template <class T>
void S<T>::foo() {}

So the member function foo, which is not a template, doesn't have any associated template declaration, whereas it has a structure TemplateParameterListsInfo that describes its specifier S<T>.

However, for non-member function explicit specialization in question, ActOnFunctionDeclarator() would also set up its TemplateParameterListsInfo for the "spelled" explicit specialization, and wire up the "implicit" declaration to the "spelled" declaration, in the form of a redeclaration.

Seemingly we might fix the problem in ActOnFunctionDeclarator() (we shouldn't do that in CheckFunctionTemplateSpecialization() because that has more clients than ActOnFunctionDeclarator(), e.g. CXXMethod instantiator that we probably don't need to deal with.) and thereby we don't have to take care of these odd canonical declarations in ASTContext::getRawCommentForAnyRedecl().

diff --git a/clang/lib/Sema/SemaDecl.cpp b/clang/lib/Sema/SemaDecl.cpp
index 3c6a0dff798f..bb02910f1dfe 100644
--- a/clang/lib/Sema/SemaDecl.cpp
+++ b/clang/lib/Sema/SemaDecl.cpp
@@ -10499,6 +10499,13 @@ Sema::ActOnFunctionDeclarator(Scope *S, Declarator &D, DeclContext *DC,
         if (CheckFunctionTemplateSpecialization(NewFD, ExplicitTemplateArgs,
                                                 Previous))
           NewFD->setInvalidDecl();
+        // For source fidelity, store all the template param lists.
+        if (!NewFD->isInvalidDecl() && TemplateParamLists.size() > 0) {
+          auto *Specialization = Previous.getAsSingle<FunctionDecl>();
+          assert(Specialization);
+          Specialization->setTemplateParameterListsInfo(Context,
+                                                        TemplateParamLists);
+        }
       }
     } else if (isMemberSpecialization && !FunctionTemplate) {
       if (CheckMemberSpecialization(NewFD, Previous))

I tried that approach locally and now I have all check-clang tests passed but some of the check-clangd tests failed

Clangd :: include-cleaner-batch-fix.test
Clangd Unit Tests :: ./ClangdTests/DefineInlineTest/AddInline (DefineInlineTest.AddInline, around line 990)

I don't know if people already have some assumptions that an "implicit" explicit specialization shouldn't have its TemplateParameterListsInfo, though :(

@HighCommander4
Copy link
Collaborator Author

HighCommander4 commented Sep 17, 2024

@zyn0217 I've debugged the DefineInlineTest.AddInline failure with your new suggested change, it concerns the following scenario:

header.h

template <typename T> void foo();
template <> void foo<int>();

source.cpp:

#include "header.h"
template <> void foo<int>() {}

It seems the FuntionDecl for the explicit specialization declaration in the header file gets assigned the template parameter lists of the definition in the main file, and its getBeginLoc() starts returning the main-file location (since DeclaratorDecl::getOuterLocStart() checks the template parameter lists).

for (const auto Redecl : D->redecls()) {
for (const auto Redecl : CanonicalD->redecls()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reread your analysis and I think the only safe usage of getRawCommentForAnyRedecl() previously was that, in the second call to the function, only those Ds that live in between the CanonicalDecl and LastCheckedRedecl are expected to be used as the starting point of the traversal. Otherwise, the presence of LastCheckedRedecl would result in skipping over all declarations, including those whose associated comments were unparsed yet.

So I wonder if we could make use of LastCheckedRedecl opt-in. To be clear, we first check if D is in the path Canonical -> LastCheckedRedecl. If so, we use the mechanism that skips past every visited declaration; otherwise, we ignore it.

// Any redeclarations of D that we haven't checked for comments yet?
const Decl *LastCheckedRedecl = CommentlessRedeclChains.lookup(CanonicalD);
bool CanUseCommentlessCache = false;
if (LastCheckedRedecl) {
  for (auto *Redecl : CanonicalD->redecls()) {
    if (Redecl == D) {
      CanUseCommentlessCache = true;
      break;
    }
    if (Redecl == LastCheckedRedecl)
      break;
   }
}
// Skip all redeclarations that have been checked previously.
if (CanUseCommentlessCache && LastCheckedRedecl)
  ...

The solution could be improved to pick up the LastCheckedRedecl again after visiting the CanonicalDecl, though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion! It indeed feels safer to tweak getRawCommentsForAnyRedecl() like this than to muck around with the modeling of template specializations 😆

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but I still feel the modeling of template specializations should be improved regarding getSourceRange() relying on TemplateParameterListsInfo, which is really peculiar. But that's another murky topic unrelated to this patch, anyway.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regarding getSourceRange() relying on TemplateParameterListsInfo, which is really peculiar

FWIW, that part actually makes sense to me. For a function/method declaration that looks like this:

template <...>
ReturnType FunctionName(Parameters...);

the answer to the question "where does the source range begin?" is indeed "if there is a template parameter list, where does it begin?"

Copy link
Collaborator Author

@HighCommander4 HighCommander4 Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(The part that makes less sense to me is, why does the explicit specialization have a redeclaration at all in a case like

/// Aaa.
template<typename T, typename U>
void foo(T aaa, U bbb);

/// Bbb.
template<>
void foo(int aaa, int bbb);

There is only one declaration of the explicit specialization in the code.)

Copy link
Contributor

@zyn0217 zyn0217 Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, in fact, I think we're not too confident about how many FunctionDecls would be generated during the template parsing e.g. they can be the byproducts of a TreeTransform (it does not instantiate declarations directly, but it could delegate calls to Sema that creates extra declarations)/the process of the template argument deduction, such that the contract of TemplateParameterListsInfo would be less reliable.

@HighCommander4
Copy link
Collaborator Author

Buildkite is green with this approach! Graduated patch from "Draft" state.

Copy link
Contributor

@zyn0217 zyn0217 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! I think this probably needs a release note, otherwise it looks good on the whole.

Please give other folks some time before merging it, and I invited @AaronBallman for the second pair of eyes.

clang/lib/AST/ASTContext.cpp Outdated Show resolved Hide resolved
Comment on lines +74 to +78
if (const FunctionDecl *FD = dyn_cast<FunctionDecl>(D)) {
return FD->isThisDeclarationADefinition();
}
if (const TagDecl *TD = dyn_cast<TagDecl>(D)) {
return TD->isThisDeclarationADefinition();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to handle *TemplateDecls too, but it is not required as we don't actually need them now.

clang/unittests/AST/RawCommentForDeclTest.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! The changes should also come with a release note so users know about the improvement. Otherwise, changes basically LG modulo nits.

clang/lib/AST/ASTContext.cpp Outdated Show resolved Hide resolved
@HighCommander4
Copy link
Collaborator Author

Thanks for the reviews! I'll add the release note shortly (need to update to a newer baseline first).

@HighCommander4 HighCommander4 changed the title [AST] Iterate redecls starting from the canonical one in getRawCommentsForAnyRedecl() [AST] Ensure getRawCommentsForAnyRedecl() does not miss any redecl with a comment Sep 20, 2024
@HighCommander4
Copy link
Collaborator Author

Note, I also updated the commit message to reflect the new fix approach.

@HighCommander4
Copy link
Collaborator Author

(Rebased)

…th a comment

The previous implementation had a bug where, if it was called on a
Decl later in the redecl chain than `LastCheckedDecl`, it could
incorrectly skip and overlook a Decl with a comment.

The patch addresses this by only using `LastCheckedDecl` if the input
Decl `D` is on the path from the first (canonical) Decl to
`LastCheckedDecl`.

An alternative that was considered was to start the iteration from
the (canonical) Decl, however this ran into problems with the
modelling of explicit template specializations in the AST where
the canonical Decl can be unusual. With the current solution, if no
Decls were checked yet, we prefer to check the input Decl over the
canonical one.
@HighCommander4
Copy link
Collaborator Author

Added release note. I put it under "Bug Fixes to AST Handling" which seemed like a good fit.

@HighCommander4 HighCommander4 merged commit ea57880 into llvm:main Sep 20, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

getRawCommentForAnyRedecl can fail to return definition comment
4 participants