Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SampleFDO] Read call-graph matching recovered top-level function profile #101053

Merged
merged 10 commits into from
Sep 4, 2024

Conversation

wlei-llvm
Copy link
Contributor

@wlei-llvm wlei-llvm commented Jul 29, 2024

With extbinary profile format, initial profile loading only reads profile based on current function names in the module. However, if a function is renamed, sample loader skips to load its original profile(which has a different name), we will miss this case. To address this, we load the top-level profile candidate explicitly for the matching. If a match is found later, the function profile will be further preserved for use by the sample loader.

@llvmbot llvmbot added PGO Profile Guided Optimizations llvm:transforms labels Jul 29, 2024
@llvmbot
Copy link
Member

llvmbot commented Jul 29, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-pgo

Author: Lei Wang (wlei-llvm)

Changes

In the extended binary format, the top-level function profile is only read when the functions shows in the current module. However, if a top-level function is renamed, it's never read in the profile, call graph matching will miss this case. Therefore, this patch adds the support to read top-level functions if they are potential candidates for call-graph matching. If a match is found later, the function will be further preserved for use by the sample loader.

To do this, we need to tweak the current sample profile reader to support read the profile on-demand by given functions(instead of module), so I added a new function readOndemand for this, and because the metadata is read separately from the function profile, this also includes to read the metadata on-demand(readFuncMetadataOnDemand).


Patch is 36.94 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/101053.diff

6 Files Affected:

  • (modified) llvm/include/llvm/ProfileData/SampleProfReader.h (+47)
  • (modified) llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h (+1)
  • (modified) llvm/lib/ProfileData/SampleProfReader.cpp (+148-96)
  • (modified) llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp (+54-10)
  • (added) llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-stale-profile-toplev-func.prof (+23)
  • (added) llvm/test/Transforms/SampleProfile/pseudo-probe-stale-profile-toplev-func.ll (+258)
diff --git a/llvm/include/llvm/ProfileData/SampleProfReader.h b/llvm/include/llvm/ProfileData/SampleProfReader.h
index f4bdc6525308d..b124233a02d11 100644
--- a/llvm/include/llvm/ProfileData/SampleProfReader.h
+++ b/llvm/include/llvm/ProfileData/SampleProfReader.h
@@ -392,6 +392,11 @@ class SampleProfileReader {
   /// which doesn't support loading function profiles on demand.
   virtual bool collectFuncsFromModule() { return false; }
 
+  virtual std::error_code readOnDemand(const DenseSet<StringRef> &FuncsToUse,
+                                       SampleProfileMap &Profiles) {
+    return sampleprof_error::not_implemented;
+  };
+
   /// Print all the profiles on stream \p OS.
   void dump(raw_ostream &OS = dbgs());
 
@@ -413,6 +418,16 @@ class SampleProfileReader {
     if (It != Profiles.end())
       return &It->second;
 
+    if (FuncNameToProfNameMap && !FuncNameToProfNameMap->empty()) {
+      auto R = FuncNameToProfNameMap->find(FunctionId(Fname));
+      if (R != FuncNameToProfNameMap->end()) {
+        Fname = R->second.stringRef();
+        auto It = Profiles.find(FunctionId(Fname));
+        if (It != Profiles.end())
+          return &It->second;
+      }
+    }
+
     if (Remapper) {
       if (auto NameInProfile = Remapper->lookUpNameInProfile(Fname)) {
         auto It = Profiles.find(FunctionId(*NameInProfile));
@@ -494,6 +509,11 @@ class SampleProfileReader {
 
   void setModule(const Module *Mod) { M = Mod; }
 
+  void setFuncNameToProfNameMap(
+      HashKeyMap<std::unordered_map, FunctionId, FunctionId> *FPMap) {
+    FuncNameToProfNameMap = FPMap;
+  }
+
 protected:
   /// Map every function to its associated profile.
   ///
@@ -522,6 +542,21 @@ class SampleProfileReader {
 
   std::unique_ptr<SampleProfileReaderItaniumRemapper> Remapper;
 
+  // A map pointer to the FuncNameToProfNameMap in SampleProfileLoader,
+  // which maps the function name to the matched profile name. This is used
+  // for sample loader to look up profile using the new name.
+  HashKeyMap<std::unordered_map, FunctionId, FunctionId>
+      *FuncNameToProfNameMap = nullptr;
+
+  // A map from a function's context hash to its meta data section range, used
+  // for on-demand read function profile metadata.
+  std::unordered_map<uint64_t, std::pair<const uint8_t *, const uint8_t *>>
+      FContextToMetaDataSecRange;
+
+  std::pair<const uint8_t *, const uint8_t *> LBRProfileSecRange;
+
+  bool ProfileHasAttribute = false;
+
   /// \brief Whether samples are collected based on pseudo probes.
   bool ProfileIsProbeBased = false;
 
@@ -621,6 +656,8 @@ class SampleProfileReaderBinary : public SampleProfileReader {
 
   /// Read the next function profile instance.
   std::error_code readFuncProfile(const uint8_t *Start);
+  std::error_code readFuncProfile(const uint8_t *Start,
+                                  SampleProfileMap &Profiles);
 
   /// Read the contents of the given profile instance.
   std::error_code readProfile(FunctionSamples &FProfile);
@@ -720,11 +757,15 @@ class SampleProfileReaderExtBinaryBase : public SampleProfileReaderBinary {
   std::error_code readSecHdrTableEntry(uint64_t Idx);
   std::error_code readSecHdrTable();
 
+  std::error_code readFuncMetadataOnDemand(bool ProfileHasAttribute,
+                                           SampleProfileMap &Profiles);
   std::error_code readFuncMetadata(bool ProfileHasAttribute);
   std::error_code readFuncMetadata(bool ProfileHasAttribute,
                                    FunctionSamples *FProfile);
   std::error_code readFuncOffsetTable();
   std::error_code readFuncProfiles();
+  std::error_code readFuncProfiles(const DenseSet<StringRef> &FuncsToUse,
+                                   SampleProfileMap &Profiles);
   std::error_code readNameTableSec(bool IsMD5, bool FixedLengthMD5);
   std::error_code readCSNameTableSec();
   std::error_code readProfileSymbolList();
@@ -776,6 +817,12 @@ class SampleProfileReaderExtBinaryBase : public SampleProfileReaderBinary {
   /// the reader has been given a module.
   bool collectFuncsFromModule() override;
 
+  /// Read the profiles on-demand for the given functions. This is used after
+  /// stale call graph matching finds new functions whose profiles aren't read
+  /// at the beginning and we need to re-read the profiles.
+  std::error_code readOnDemand(const DenseSet<StringRef> &FuncsToUse,
+                               SampleProfileMap &Profiles) override;
+
   std::unique_ptr<ProfileSymbolList> getProfileSymbolList() override {
     return std::move(ProfSymList);
   };
diff --git a/llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h b/llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h
index a67f158433391..67edea42e2fe1 100644
--- a/llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h
+++ b/llvm/include/llvm/Transforms/IPO/SampleProfileMatcher.h
@@ -198,6 +198,7 @@ class SampleProfileMatcher {
   // function and all inlinees.
   void countMismatchedCallsiteSamples(const FunctionSamples &FS);
   void computeAndReportProfileStaleness();
+  void UpdateSampleLoaderWithRecoveredProfiles();
 
   LocToLocMap &getIRToProfileLocationMap(const Function &F) {
     auto Ret = FuncMappings.try_emplace(
diff --git a/llvm/lib/ProfileData/SampleProfReader.cpp b/llvm/lib/ProfileData/SampleProfReader.cpp
index 4752465fc072e..1fe89b9a48832 100644
--- a/llvm/lib/ProfileData/SampleProfReader.cpp
+++ b/llvm/lib/ProfileData/SampleProfReader.cpp
@@ -653,7 +653,8 @@ SampleProfileReaderBinary::readProfile(FunctionSamples &FProfile) {
 }
 
 std::error_code
-SampleProfileReaderBinary::readFuncProfile(const uint8_t *Start) {
+SampleProfileReaderBinary::readFuncProfile(const uint8_t *Start,
+                                           SampleProfileMap &Profiles) {
   Data = Start;
   auto NumHeadSamples = readNumber<uint64_t>();
   if (std::error_code EC = NumHeadSamples.getError())
@@ -678,6 +679,11 @@ SampleProfileReaderBinary::readFuncProfile(const uint8_t *Start) {
   return sampleprof_error::success;
 }
 
+std::error_code
+SampleProfileReaderBinary::readFuncProfile(const uint8_t *Start) {
+  return readFuncProfile(Start, Profiles);
+}
+
 std::error_code SampleProfileReaderBinary::readImpl() {
   ProfileIsFS = ProfileIsFSDisciminator;
   FunctionSamples::ProfileIsFS = ProfileIsFS;
@@ -725,6 +731,7 @@ std::error_code SampleProfileReaderExtBinaryBase::readOneSection(
     break;
   }
   case SecLBRProfile:
+    LBRProfileSecRange = std::make_pair(Data, End);
     if (std::error_code EC = readFuncProfiles())
       return EC;
     break;
@@ -745,9 +752,9 @@ std::error_code SampleProfileReaderExtBinaryBase::readOneSection(
     ProfileIsProbeBased =
         hasSecFlag(Entry, SecFuncMetadataFlags::SecFlagIsProbeBased);
     FunctionSamples::ProfileIsProbeBased = ProfileIsProbeBased;
-    bool HasAttribute =
+    ProfileHasAttribute =
         hasSecFlag(Entry, SecFuncMetadataFlags::SecFlagHasAttribute);
-    if (std::error_code EC = readFuncMetadata(HasAttribute))
+    if (std::error_code EC = readFuncMetadata(ProfileHasAttribute))
       return EC;
     break;
   }
@@ -791,6 +798,19 @@ bool SampleProfileReaderExtBinaryBase::useFuncOffsetList() const {
   return false;
 }
 
+std::error_code SampleProfileReaderExtBinaryBase::readOnDemand(
+    const DenseSet<StringRef> &FuncsToUse, SampleProfileMap &Profiles) {
+  Data = LBRProfileSecRange.first;
+  End = LBRProfileSecRange.second;
+  if (std::error_code EC = readFuncProfiles(FuncsToUse, Profiles))
+    return EC;
+  End = Data;
+
+  if (std::error_code EC =
+          readFuncMetadataOnDemand(ProfileHasAttribute, Profiles))
+    return EC;
+  return sampleprof_error::success;
+}
 
 bool SampleProfileReaderExtBinaryBase::collectFuncsFromModule() {
   if (!M)
@@ -838,110 +858,118 @@ std::error_code SampleProfileReaderExtBinaryBase::readFuncOffsetTable() {
  return sampleprof_error::success;
 }
 
-std::error_code SampleProfileReaderExtBinaryBase::readFuncProfiles() {
-  // Collect functions used by current module if the Reader has been
-  // given a module.
-  // collectFuncsFromModule uses FunctionSamples::getCanonicalFnName
-  // which will query FunctionSamples::HasUniqSuffix, so it has to be
-  // called after FunctionSamples::HasUniqSuffix is set, i.e. after
-  // NameTable section is read.
-  bool LoadFuncsToBeUsed = collectFuncsFromModule();
-
-  // When LoadFuncsToBeUsed is false, we are using LLVM tool, need to read all
-  // profiles.
-  const uint8_t *Start = Data;
-  if (!LoadFuncsToBeUsed) {
-    while (Data < End) {
-      if (std::error_code EC = readFuncProfile(Data))
-        return EC;
+std::error_code SampleProfileReaderExtBinaryBase::readFuncProfiles(
+    const DenseSet<StringRef> &FuncsToUse, SampleProfileMap &Profiles) {
+ const uint8_t *Start = Data;
+
+ if (Remapper) {
+    for (auto Name : FuncsToUse) {
+      Remapper->insert(Name);
     }
-    assert(Data == End && "More data is read than expected");
-  } else {
-    // Load function profiles on demand.
-    if (Remapper) {
-      for (auto Name : FuncsToUse) {
-        Remapper->insert(Name);
-      }
+ }
+
+ if (ProfileIsCS) {
+    assert(useFuncOffsetList());
+    DenseSet<uint64_t> FuncGuidsToUse;
+    if (useMD5()) {
+      for (auto Name : FuncsToUse)
+        FuncGuidsToUse.insert(Function::getGUID(Name));
     }
 
-    if (ProfileIsCS) {
-      assert(useFuncOffsetList());
-      DenseSet<uint64_t> FuncGuidsToUse;
-      if (useMD5()) {
-        for (auto Name : FuncsToUse)
-          FuncGuidsToUse.insert(Function::getGUID(Name));
+    // For each function in current module, load all context profiles for
+    // the function as well as their callee contexts which can help profile
+    // guided importing for ThinLTO. This can be achieved by walking
+    // through an ordered context container, where contexts are laid out
+    // as if they were walked in preorder of a context trie. While
+    // traversing the trie, a link to the highest common ancestor node is
+    // kept so that all of its decendants will be loaded.
+    const SampleContext *CommonContext = nullptr;
+    for (const auto &NameOffset : FuncOffsetList) {
+      const auto &FContext = NameOffset.first;
+      FunctionId FName = FContext.getFunction();
+      StringRef FNameString;
+      if (!useMD5())
+        FNameString = FName.stringRef();
+
+      // For function in the current module, keep its farthest ancestor
+      // context. This can be used to load itself and its child and
+      // sibling contexts.
+      if ((useMD5() && FuncGuidsToUse.count(FName.getHashCode())) ||
+          (!useMD5() && (FuncsToUse.count(FNameString) ||
+                         (Remapper && Remapper->exist(FNameString))))) {
+        if (!CommonContext || !CommonContext->isPrefixOf(FContext))
+          CommonContext = &FContext;
       }
 
-      // For each function in current module, load all context profiles for
-      // the function as well as their callee contexts which can help profile
-      // guided importing for ThinLTO. This can be achieved by walking
-      // through an ordered context container, where contexts are laid out
-      // as if they were walked in preorder of a context trie. While
-      // traversing the trie, a link to the highest common ancestor node is
-      // kept so that all of its decendants will be loaded.
-      const SampleContext *CommonContext = nullptr;
-      for (const auto &NameOffset : FuncOffsetList) {
-        const auto &FContext = NameOffset.first;
-        FunctionId FName = FContext.getFunction();
-        StringRef FNameString;
-        if (!useMD5())
-          FNameString = FName.stringRef();
-
-        // For function in the current module, keep its farthest ancestor
-        // context. This can be used to load itself and its child and
-        // sibling contexts.
-        if ((useMD5() && FuncGuidsToUse.count(FName.getHashCode())) ||
-            (!useMD5() && (FuncsToUse.count(FNameString) ||
-                           (Remapper && Remapper->exist(FNameString))))) {
-          if (!CommonContext || !CommonContext->isPrefixOf(FContext))
-            CommonContext = &FContext;
-        }
-
-        if (CommonContext == &FContext ||
-            (CommonContext && CommonContext->isPrefixOf(FContext))) {
-          // Load profile for the current context which originated from
-          // the common ancestor.
-          const uint8_t *FuncProfileAddr = Start + NameOffset.second;
-          if (std::error_code EC = readFuncProfile(FuncProfileAddr))
-            return EC;
-        }
-      }
-    } else if (useMD5()) {
-      assert(!useFuncOffsetList());
-      for (auto Name : FuncsToUse) {
-        auto GUID = MD5Hash(Name);
-        auto iter = FuncOffsetTable.find(GUID);
-        if (iter == FuncOffsetTable.end())
-          continue;
-        const uint8_t *FuncProfileAddr = Start + iter->second;
-        if (std::error_code EC = readFuncProfile(FuncProfileAddr))
-          return EC;
-      }
-    } else if (Remapper) {
-      assert(useFuncOffsetList());
-      for (auto NameOffset : FuncOffsetList) {
-        SampleContext FContext(NameOffset.first);
-        auto FuncName = FContext.getFunction();
-        StringRef FuncNameStr = FuncName.stringRef();
-        if (!FuncsToUse.count(FuncNameStr) && !Remapper->exist(FuncNameStr))
-          continue;
+      if (CommonContext == &FContext ||
+          (CommonContext && CommonContext->isPrefixOf(FContext))) {
+        // Load profile for the current context which originated from
+        // the common ancestor.
         const uint8_t *FuncProfileAddr = Start + NameOffset.second;
         if (std::error_code EC = readFuncProfile(FuncProfileAddr))
           return EC;
       }
-    } else {
-      assert(!useFuncOffsetList());
-      for (auto Name : FuncsToUse) {
-        auto iter = FuncOffsetTable.find(MD5Hash(Name));
-        if (iter == FuncOffsetTable.end())
-          continue;
-        const uint8_t *FuncProfileAddr = Start + iter->second;
-        if (std::error_code EC = readFuncProfile(FuncProfileAddr))
-          return EC;
-      }
     }
+ } else if (useMD5()) {
+    assert(!useFuncOffsetList());
+    for (auto Name : FuncsToUse) {
+      auto GUID = MD5Hash(Name);
+      auto iter = FuncOffsetTable.find(GUID);
+      if (iter == FuncOffsetTable.end())
+        continue;
+      const uint8_t *FuncProfileAddr = Start + iter->second;
+      if (std::error_code EC = readFuncProfile(FuncProfileAddr, Profiles))
+        return EC;
+    }
+ } else if (Remapper) {
+    assert(useFuncOffsetList());
+    for (auto NameOffset : FuncOffsetList) {
+      SampleContext FContext(NameOffset.first);
+      auto FuncName = FContext.getFunction();
+      StringRef FuncNameStr = FuncName.stringRef();
+      if (!FuncsToUse.count(FuncNameStr) && !Remapper->exist(FuncNameStr))
+        continue;
+      const uint8_t *FuncProfileAddr = Start + NameOffset.second;
+      if (std::error_code EC = readFuncProfile(FuncProfileAddr, Profiles))
+        return EC;
+    }
+ } else {
+    assert(!useFuncOffsetList());
+    for (auto Name : FuncsToUse) {
+
+      auto iter = FuncOffsetTable.find(MD5Hash(Name));
+      if (iter == FuncOffsetTable.end())
+        continue;
+      const uint8_t *FuncProfileAddr = Start + iter->second;
+      if (std::error_code EC = readFuncProfile(FuncProfileAddr, Profiles))
+        return EC;
+    }
+ }
+}
+
+std::error_code SampleProfileReaderExtBinaryBase::readFuncProfiles() {
+ // Collect functions used by current module if the Reader has been
+ // given a module.
+ // collectFuncsFromModule uses FunctionSamples::getCanonicalFnName
+ // which will query FunctionSamples::HasUniqSuffix, so it has to be
+ // called after FunctionSamples::HasUniqSuffix is set, i.e. after
+ // NameTable section is read.
+ bool LoadFuncsToBeUsed = collectFuncsFromModule();
+
+ // When LoadFuncsToBeUsed is false, we are using LLVM tool, need to read all
+ // profiles.
+ if (!LoadFuncsToBeUsed) {
+    while (Data < End) {
+      if (std::error_code EC = readFuncProfile(Data))
+        return EC;
+    }
+    assert(Data == End && "More data is read than expected");
+ } else {
+    // Load function profiles on demand.
+    if (std::error_code EC = readFuncProfiles(FuncsToUse, Profiles))
+      return EC;
     Data = End;
-  }
+ }
   assert((CSProfileCount == 0 || CSProfileCount == Profiles.size()) &&
          "Cannot have both context-sensitive and regular profile");
   assert((!CSProfileCount || ProfileIsCS) &&
@@ -1245,6 +1273,27 @@ SampleProfileReaderExtBinaryBase::readFuncMetadata(bool ProfileHasAttribute,
   return sampleprof_error::success;
 }
 
+std::error_code SampleProfileReaderExtBinaryBase::readFuncMetadataOnDemand(
+    bool ProfileHasAttribute, SampleProfileMap &Profiles) {
+  if (FContextToMetaDataSecRange.empty())
+    return sampleprof_error::success;
+
+  for (auto &I : Profiles) {
+    FunctionSamples *FProfile = &I.second;
+    auto R =
+        FContextToMetaDataSecRange.find(FProfile->getContext().getHashCode());
+    if (R == FContextToMetaDataSecRange.end())
+      continue;
+
+    Data = R->second.first;
+    End = R->second.second;
+    if (std::error_code EC = readFuncMetadata(ProfileHasAttribute, FProfile))
+      return EC;
+    assert(Data == End && "More data is read than expected");
+  }
+  return sampleprof_error::success;
+}
+
 std::error_code
 SampleProfileReaderExtBinaryBase::readFuncMetadata(bool ProfileHasAttribute) {
   while (Data < End) {
@@ -1257,8 +1306,11 @@ SampleProfileReaderExtBinaryBase::readFuncMetadata(bool ProfileHasAttribute) {
     if (It != Profiles.end())
       FProfile = &It->second;
 
+    const uint8_t *Start = Data;
     if (std::error_code EC = readFuncMetadata(ProfileHasAttribute, FProfile))
       return EC;
+
+    FContextToMetaDataSecRange[FContext.getHashCode()] = {Start, Data};
   }
 
   assert(Data == End && "More data is read than expected");
diff --git a/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp b/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp
index 312672e56b017..b9adc6a0631b8 100644
--- a/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp
+++ b/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp
@@ -782,6 +782,26 @@ bool SampleProfileMatcher::functionMatchesProfileHelper(
   float Similarity = 0.0;
 
   const auto *FSFlattened = getFlattenedSamplesFor(ProfFunc);
+  // Check if the function is top-level function. For extended profile format,
+  // if a function profile is unused and it's top-level, even if the profile is
+  // matched, it's not found in the profile. This is because sample reader only
+  // read the used profile at the beginning, we need to read the profile
+  // on-demand. Also save it into the FlattenedProfiles for future look-up.
+  if (!FSFlattened) {
+    DenseSet<StringRef> TopLevelFunc;
+    TopLevelFunc.insert(ProfFunc.stringRef());
+    SampleProfileMap TopLevelProfile;
+    Reader.readOnDemand(TopLevelFunc, TopLevelProfile);
+    assert(TopLevelProfile.size() <= 1 &&
+           "More than one profile is found for top-level function");
+    if (!TopLevelProfile.empty()) {
+      LLVM_DEBUG(dbgs() << "Read top-level function " << ProfFunc
+                        << " for call-graph matching\n");
+      auto &FS = TopLevelProfile.begin()->second;
+      FSFlattened =
+          &(FlattenedProfiles.create(FS.getContext()) = std::move(FS));
+    }
+  }
   if (!FSFlattened)
     return false;
   // The check for similarity or checksum may not be reliable if the function is
@@ -863,6 +883,39 @@ bool SampleProfileMatcher::functionMatchesProfile(Function &IRFunc,
   return Matched;
 }
 
+void SampleProfileMatcher::UpdateSampleLoaderWithRecoveredProfiles() {
+  DenseSet<StringRef> RecoveredFuncs;
+  // Update FuncNameToProfNameMap and SymbolMap.
+  for (auto &I : FuncToProfileNameMap) {
+    assert(I.first && "New function is null");
+    FunctionId FuncName(I.first->getName());
+    RecoveredFuncs.insert(I.second.stringRef());
+    FuncNameToProfNameMap->emplace(FuncName, I.second);
+
+    // We need to remove the old entry to avoid duplicating the function
+    // processing.
+    SymbolMap->erase(FuncName);
+    SymbolMap->emplace(I.second, I.first);
+  }
+
+  // Read the top-level profiles for the recovered function profiles. This is
+  // because in extended binary format it only loads the top-level profile for
+  // the functions in the new build but not the recovered functions which is
+  // from the old bu...
[truncated]

Copy link

github-actions bot commented Jul 29, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@wlei-llvm wlei-llvm changed the title [SampleFDO] Read call-graph matching recovered top-level function profiless [SampleFDO] Read call-graph matching recovered top-level function profile Jul 30, 2024
@@ -392,6 +392,11 @@ class SampleProfileReader {
/// which doesn't support loading function profiles on demand.
virtual bool collectFuncsFromModule() { return false; }

virtual std::error_code readOnDemand(const DenseSet<StringRef> &FuncsToUse,
SampleProfileMap &Profiles) {
return sampleprof_error::not_implemented;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use a pure virtual function here? See readHeader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only support extended binary format, if we use pure virtual function, we may need more duplicated override functions(only return not_implemented) for all other derived classes.

@@ -392,6 +392,11 @@ class SampleProfileReader {
/// which doesn't support loading function profiles on demand.
virtual bool collectFuncsFromModule() { return false; }

virtual std::error_code readOnDemand(const DenseSet<StringRef> &FuncsToUse,
Copy link
Member

@WenleiHe WenleiHe Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: On-demand is how this API is used in call graph matching case. But as an API, the key difference comparing to readFuncProfiles is it reads given set of functions, not on-demand. Suggest to name API based on that, maybe just overload readFuncProfiles/read since the parameters imply the difference?

Same for readFuncMetadataOnDemand.

SymbolMap->emplace(I.second, I.first);
}

// Read the top-level profiles for the recovered function profiles. This is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"recovered function" can be unclear it means, how about "function with different profile name"?

The comment also reads a bit unclear, how about this

"
With extbinary profile format, initial profile loading only reads profile based on current function names in the module, so we need to load top-level profiles for functions with different profile name explicitly after function-profile name map is established with stale profile matching.
"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion!

SymbolMap->erase(FuncName);
SymbolMap->emplace(I.second, I.first);
}
UpdateSampleLoaderWithRecoveredProfiles();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename as UpdateWithSalvagedProfiles?

Comment on lines 909 to 910
LLVM_DEBUG(dbgs() << "Top-level function " << I.second.getFunction()
<< " is recovered and re-read by the sample reader.\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: recovered -> salvaged, top-level function -> profile for top-level function

auto &Ctx = I.second.getContext();
assert(Profiles.find(Ctx) == Profiles.end() &&
"Top level profile is found for the unused profile");
Profiles.create(Ctx) = std::move(I.second);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we still populate the newly loaded profiles into the same SampleProfileReader::Profiles map? This would make reader APIs simpler and more consistent, and we won't need the assertion above either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I was thinking using a different profile map so that we can debug check the reloaded profiles for the test, but I can test check it in a different place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I was suggesting is that the read no longer need a SampleProfileMap parameter to hold newly loaded profile. I'm still seeing read having that parameter in the latest version?

@@ -782,6 +782,26 @@ bool SampleProfileMatcher::functionMatchesProfileHelper(
float Similarity = 0.0;

const auto *FSFlattened = getFlattenedSamplesFor(ProfFunc);
// Check if the function is top-level function. For extended profile format,
// if a function profile is unused and it's top-level, even if the profile is
// matched, it's not found in the profile. This is because sample reader only
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rewrite this sentence and maybe the whole comment :)

even if the profile is matched, it's not found in the profile

DenseSet<StringRef> TopLevelFunc;
TopLevelFunc.insert(ProfFunc.stringRef());
SampleProfileMap TopLevelProfile;
Reader.readOnDemand(TopLevelFunc, TopLevelProfile);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How often does this happen on this path (coming from LCS I think)? And how useful it is to load profile here?

Asking because we probably don't want to load many functions just to do the similarity checks during LCS, as loading profile is kind of expensive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the key part to load the top-level profile. If we don't load the profile, LCS(for cg matching) will never run. see the line ( if (!FSFlattened) return false;), the function just bails out if it doesn't find any profile.

Asking because we probably don't want to load many functions just to do the similarity checks during LCS, as loading profile is kind of expensive?

Good question! Overall, I think the cost should be fine here.
Note that this is in deep branch, we have a lot of filters(same caller scope, profile not in flattened profile), then the chance to run this code is probably not big.

Additionally, for the on-demand read https://github.com/llvm/llvm-project/blob/main/llvm/lib/ProfileData/SampleProfReader.cpp#L935-L937

See it check the FuncOffsetTable(not iterating all the FuncOffsetTable) and read one profile, reading nested profile in one profile is probably fine.

(However, one thing I noticed that 4fe91e0 seems not in llvm-17- branch, we may need to port it to the old version branch)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we add a switch to on/off this feature? Just in case this (loading profile for matched functions) overall becomes expensive.

llvm/include/llvm/ProfileData/SampleProfReader.h Outdated Show resolved Hide resolved
if (std::error_code EC = readFuncMetadata(ProfileHasAttribute, FProfile))
return EC;

FContextToMetaDataSecRange[FContext.getHashCode()] = {Start, Data};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a function doesn't have any attribute, we can skip inserting the index?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It Is possible that the a function doesn't have attribute but do have FunctionHash, in this case, we still need to insert the index.

Copy link
Member

@WenleiHe WenleiHe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change summary needs to be updated, we no longer have readOnDemand

llvm/include/llvm/ProfileData/SampleProfReader.h Outdated Show resolved Hide resolved
llvm/include/llvm/ProfileData/SampleProfReader.h Outdated Show resolved Hide resolved
DenseSet<StringRef> TopLevelFunc;
TopLevelFunc.insert(ProfFunc.stringRef());
SampleProfileMap TopLevelProfile;
Reader.readOnDemand(TopLevelFunc, TopLevelProfile);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we add a switch to on/off this feature? Just in case this (loading profile for matched functions) overall becomes expensive.

if (!FSFlattened) {
DenseSet<StringRef> TopLevelFunc({ProfFunc.stringRef()});
SampleProfileMap TopLevelProfile;
Reader.read(TopLevelFunc, TopLevelProfile);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will actually return not_implemented for other format, we should probably check success before proceeding the following logic to read profile, to make it explicit.

auto &Ctx = I.second.getContext();
assert(Profiles.find(Ctx) == Profiles.end() &&
"Top level profile is found for the unused profile");
Profiles.create(Ctx) = std::move(I.second);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I was suggesting is that the read no longer need a SampleProfileMap parameter to hold newly loaded profile. I'm still seeing read having that parameter in the latest version?

Comment on lines 792 to 793
SampleProfileMap TopLevelProfile;
Reader.read(TopLevelFunc, TopLevelProfile);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we populate the existing Profiles map in reader, the read API won't need a SampleProfileMap parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is different to the one in UpdateWithSalvagedProfiles. We use a different profile map for the matching, FlattenedProfiles. so here the output map is the FlattenedProfiles, not the sample reader's internal Profiles.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can still make the API simpler for reader - we can always load into reader's profile map, and then retriever it by name for downstream use. Reader can serve as a cache too.

OTOH, the profile from reader isn't guaranteed to be flattened, why do we put it directly into flattened profile map anyways?

Copy link
Contributor Author

@wlei-llvm wlei-llvm Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. It's a little bit tricky, there could be more places to use the FlattenedProfiles(the profiles for the matching), then For anywhere it's used, we also need to check original samples reader profile map. (I was thinking to rename FlattenedProfiles and use it so that we only have one container.). Right now there is only one more place, just updated that.

Copy link
Contributor

@david-xl david-xl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you commit the pure refactoring part first?

@wlei-llvm
Copy link
Contributor Author

Can you commit the pure refactoring part first?

Created a new PR: #104654

Copy link
Member

@WenleiHe WenleiHe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with a nit.

llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp Outdated Show resolved Hide resolved
wlei-llvm added a commit that referenced this pull request Aug 27, 2024
…profiles for given functions (#104654)

Currently in extended binary format, sample reader only read the
profiles when the function are in the current module at initialization
time, this extends the support to read the arbitrary profiles for given
input functions in later stage. It's used for
#101053.
5c4lar pushed a commit to 5c4lar/llvm-project that referenced this pull request Aug 29, 2024
…profiles for given functions (llvm#104654)

Currently in extended binary format, sample reader only read the
profiles when the function are in the current module at initialization
time, this extends the support to read the arbitrary profiles for given
input functions in later stage. It's used for
llvm#101053.
@wlei-llvm wlei-llvm merged commit 6e60330 into llvm:main Sep 4, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:transforms PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants