Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework Clang PGO documentation #74022

Open
zamazan4ik opened this issue Dec 1, 2023 · 6 comments
Open

Rework Clang PGO documentation #74022

zamazan4ik opened this issue Dec 1, 2023 · 6 comments
Labels
documentation PGO Profile Guided Optimizations

Comments

@zamazan4ik
Copy link

zamazan4ik commented Dec 1, 2023

Here I want to propose several suggestions to the Clang PGO documentation.

We need to add information about LLVM runtime functions to the PGO documentation. Right now it does not cover at all provided by LLVM PGO-oriented functions like __llvm_profile_write_buffer and other functions from https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/profile/InstrProfiling.h (and probably other files - I am not familiar much yet with this piece of LLVM). Knowing which functions are provided by the instrumentation runtime and how to use them is important for users since in many cases they want to implement manual logic with dumping PGO profiles. Right now only a small subset of these functions are covered in https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program - we need to move these descriptions to the main PGO documentation and extend it.

Here are some examples of existing usages:

We need to write LLVM profdata backward and forward compatibility guarantees directly in the Clang PGO documentation. Right now this information is located here - https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#format-compatibility-guarantees . Having this information in the "SourceBasedCodeCoverage" can be confusing for the users since at least the source-code coverage in terms of PGO refers only to FE PGO. But Clang also supports IR PGO (the recommended way to do PGO with Clang nowadays). As far as I understand, from the profile format point of view, the guarantees are the same - it should be documented as well.

Adding information about filesystem-less PGO case from https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#using-the-profiling-runtime-without-a-filesystem into the main PGO documentation also could be important for the PGO use-cases in the embedded environment.

@github-actions github-actions bot added the clang Clang issues not falling into any other category label Dec 1, 2023
@EugeneZelenko EugeneZelenko added documentation PGO Profile Guided Optimizations and removed clang Clang issues not falling into any other category labels Dec 1, 2023
@ilovepi
Copy link
Contributor

ilovepi commented Dec 1, 2023

Thanks for the interest in improving our docs. Broadly, I think everyone is in favor of improving the quality of our documentation, and you're right that much of the Instrumentation and Runtime details could be documented better. However, I'm not certain I agree with you on many of these details.

While it's valuable to document how these things work for LLVM developers or folks who will need to implement the runtime themselves (e.g. for embedded or kernel usage), the top level documentation you're referring in Clang to is mostly for users of the compiler, not compiler developers. So mostly that documentation is focused on how a typical user of the compiler can use PGO to make their program faster. I don't think we want to merge the profiling and coverage details in user facing documentation (particularly in the Clang documentation), since those are internal details of how LLVM has implemented those features.

I do agree with you that the shared format and runtime details should be documented somewhere though. For instance, there should probably be a common rst file in llvm/docs that describes the common profile format and provides relevant details for people who need to implement their own runtime, and the shared compatibility guarantees. Where relevant the existing documentation can just link to the common bits.

Having said all that, I'm not pointing out my nits to discourage you. In fact, we'd all appreciate it if you could make a PR with the changes you'd like to see we can iron out the exact details in code review.

Off the top-of-my-head, I think @gulfemsavrun, @petrhosek, @ellishg @aeubanks @david-xl are likely to have slightly different opinions and have worked in the area a bit longer than me, so please take my initial thoughts with a grain of salt.

@ellishg
Copy link
Contributor

ellishg commented Dec 1, 2023

Right now it does not cover at all provided by LLVM PGO-oriented functions like __llvm_profile_write_buffer and other functions

__llvm_profile_write_buffer is documented in https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#using-the-profiling-runtime-without-a-filesystem

Although I do agree that "SourceBasedCodeCoverage" is not the best place for PGO documentation.

By the way, I think this discussion would be better on https://discourse.llvm.org/ rather than a GitHub issue.

@ilovepi
Copy link
Contributor

ilovepi commented Dec 1, 2023

By the way, I think this discussion would be better on https://discourse.llvm.org/ rather than a GitHub issue.

That is a good point.

@petrhosek
Copy link
Member

@david-xl is currently working on improving the documentation in #73845. I agree that we likely need another page separate from "SourceBasedCodeCoverage" for PGO.

@zamazan4ik
Copy link
Author

Just found another thing to improve in the PGO documentation.

In the Sampling PGO docs, the 2nd step says the following:

Note the use of the -b flag. This tells Perf to use the Last Branch Record (LBR) to record call chains. While this is not strictly required, it provides better call information, which improves the accuracy of the profile data.

Using "Last Branch Record (LBR)" is not strictly correct here since actually perf record -b is about Branch Stack Sampling (check the documentation). "Branch Stack Sampling" is a general name for the technology, and each CPU vendor has its name for it: for Intel CPUs it's called Last Branch Record, for AMD - Branch Sampling (BRS) (link), for ARM - Branch Record Buffer Extension (BRBE). Using LBR in this context is misleading since it implicitly says that only for Intel CPUs will be better results with perf record -b. As far as I know, it's not the case for LLVM - it should work fine with any Branch Stack Sampling implementation. So I recommend using Branch Stack Sampling instead in the Clang documentation.

@david-xl
Copy link
Contributor

Just found another thing to improve in the PGO documentation.

In the Sampling PGO docs, the 2nd step says the following:

Note the use of the -b flag. This tells Perf to use the Last Branch Record (LBR) to record call chains. While this is not strictly required, it provides better call information, which improves the accuracy of the profile data.

Using "Last Branch Record (LBR)" is not strictly correct here since actually perf record -b is about Branch Stack Sampling (check the documentation). "Branch Stack Sampling" is a general name for the technology, and each CPU vendor has its name for it: for Intel CPUs it's called Last Branch Record, for AMD - Branch Sampling (BRS) (link), for ARM - Branch Record Buffer Extension (BRBE). Using LBR in this context is misleading since it implicitly says that only for Intel CPUs will be better results with perf record -b. As far as I know, it's not the case for LLVM - it should work fine with any Branch Stack Sampling implementation. So I recommend using Branch Stack Sampling instead in the Clang documentation.

LBR is the de facto name for the branch sampling mechanism. Though different vendor may have different names for it, they are essentially following LBR spec. For AMD Zen4, the feature is 'LbrExtV2'. Note that vendor specific implementation may be different. For instance, AMD Zen4 captures taken branches on speculative wrong paths too, so some filtering is needed (done by the afdo profile creation tool).

Adding some clarification sounds fine to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation PGO Profile Guided Optimizations
Projects
None yet
Development

No branches or pull requests

6 participants