Apply PGO to Binaries #6963

Closed
1 of 15 tasks
miniksa opened this issue Jul 17, 2020 · 2 comments · Fixed by #10071
Assignees
Labels
Area-Build Issues pertaining to the build system, CI, infrastructure, meta Area-Performance Performance-related issue Issue-Feature Complex enough to require an in depth planning process and actual budgeted, scheduled work. Product-Meta The product is the management of the products. Resolution-Fix-Committed Fix is checked in, but it might be 3-4 weeks until a release.
Milestone

Comments

@miniksa
Member

miniksa commented Jul 17, 2020

PGO, or Profile Guided Optimization, is a way to speed up the most frequently used paths in our applications. We profile the applications with several test scenarios that represent strenuous or hot paths through the application. The instrumented binaries count how often functions along those paths are used while the test scenarios run. Then, when the applications are rebuilt, the collected profile data is provided to the linker to tell it which functions are the most important. See also: https://docs.microsoft.com/en-us/cpp/build/profile-guided-optimizations?view=vs-2019

During my prototyping phases in dev/miniksa/gotta_go_fast in July 2020, I found that we can cut runtime by about 10-20% for large text processing operations like time cat big.txt, a large list operation, or anything else sending a massive stream of "WriteFile" type operations to our system. Users tend to notice massive blocks of data transfer, as either total runtime or latency, more than they notice UI operations like splitting panes. I therefore believe that our profiling runs should focus on massive data transfer, with other hot scenarios added as necessary to help us squeeze the most performance out of our application.

This feature-size task represents setting up this system for our application.

The https://github.com/microsoft/microsoft-ui-xaml team has already done this for their entire stack. I had attempted to roll my own lesser variation of it, but on doing so, I realized that it will be better overall to just replicate their work against our application.

The following activities are what I imagine will be required to get PGO going:

  • Mimic targets/properties from Microsoft UI XAML related to build properties, compiler, and linker flags necessary to generate both a "Training" binary for generating profile data as well as teaching the final "Optimized Release" binary to consume that profile data during linking.
  • Set up test automation using the Helix framework/labs such that we can run Input Injection and UI Automation tests in a lab for repeatable "training scenarios" against the training binaries
    • BONUS: Also get the ol' UIA tests we have running in this lab
    • BONUS: Also get the TerminalApp tests that @zadjii-msft has had as "local only" forever running in this lab.
  • Mimic the YAML definitions from Microsoft UI XAML for dispatching these test runs and collecting data from the Helix lab
  • Convert our artifact storage to use a public NuGet feed as a place to store the large Profile Guided Databases (.PGD files) generated from the training run counts (.PGC files) - See Move to the TerminalDependencies NuGet feed #6954
  • Mimic the scripts from Microsoft UI XAML for uploading PGD artifacts to the public NuGet feed as well as the ones to select the most relevant PGD databases to use when compiling (nearest time to the current branch as not every commit SHA is profiled).
  • Write tests for our scenarios to run in the Helix lab
    • cat big.txt --> output massive amount of unformatted text
    • cat ls.txt --> output massive amount of colorized/formatted text
    • Fan favorite random cell drawing utilities like cacafire and cmatrix
    • Good ol' GIF to ASCII chafa
    • Search functionality through UIA tree that @carlos-zamora has had performance issues with when working with NVDA
    • Some sort of full stack launch test that can serve as both a canary and provide some weight for optimizing startup time
  • Pull it all together, weight and merge the profiles, and release profile optimized builds
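The database selection step in the list above (choosing the PGD nearest in time to the current branch, since not every commit SHA is profiled) could be sketched roughly as follows. This is only an illustration in Python, not the Microsoft UI XAML script: `ProfiledBuild`, `pick_nearest_pgd`, and the sample commits are hypothetical, and it assumes that only databases trained at or before the current commit are eligible.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class ProfiledBuild:
    """A commit that has a trained PGD package (hypothetical record)."""
    sha: str
    committed_at: datetime


def pick_nearest_pgd(
    profiled: Sequence[ProfiledBuild], current: datetime
) -> Optional[ProfiledBuild]:
    """Return the most recent profiled build at or before `current`.

    Assumption: a database trained on a later commit is never used.
    """
    eligible = [b for b in profiled if b.committed_at <= current]
    if not eligible:
        return None  # no database available: build without PGO
    return max(eligible, key=lambda b: b.committed_at)


# Made-up sample data: three profiled commits, one week apart.
builds = [
    ProfiledBuild("aaa111", datetime(2021, 5, 1)),
    ProfiledBuild("bbb222", datetime(2021, 5, 8)),
    ProfiledBuild("ccc333", datetime(2021, 5, 15)),
]
chosen = pick_nearest_pgd(builds, datetime(2021, 5, 10))
print(chosen.sha if chosen else "no database")
```

Returning `None` when nothing qualifies lets the caller fall back to a plain, non-optimized build rather than failing outright.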

If the full setup proves difficult, I have noticed that profiling just the conhost binary in PTY mode can provide some performance boost on its own, without also profiling the WindowsTerminal binary (and all its Terminal* DLLs). We could make incremental progress this way, but it is definitely best if we can get end-to-end profile guided optimizations working.

@miniksa miniksa added Issue-Feature Complex enough to require an in depth planning process and actual budgeted, scheduled work. Area-Performance Performance-related issue Area-Build Issues pertaining to the build system, CI, infrastructure, meta Product-Meta The product is the management of the products. labels Jul 17, 2020
@miniksa miniksa self-assigned this Jul 17, 2020
@ghost ghost added the Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting label Jul 17, 2020
@DHowett DHowett added this to the Terminal v2.0 milestone Jul 17, 2020
@DHowett DHowett removed the Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting label Jul 17, 2020
@DHowett
Member

DHowett commented Jul 17, 2020

Triaged into Terminal 2.0

@miniksa miniksa mentioned this issue Aug 14, 2020
8 tasks
ghost pushed a commit that referenced this issue Aug 18, 2020
Use the Helix testing orchestration framework to run our Terminal LocalTests and Console Host UIA tests.

## References
#### Creates the following new issues:
- #7281 - re-enable local tests that were disabled to turn on Helix
- #7282 - re-enable UIA tests that were disabled to turn on Helix
- #7286 - investigate and implement appropriate compromise solution to how Skipped is handled by MUX Helix scripts

#### Consumes from:
- #7164 - The update to TAEF includes wttlog.dll. The WTT logs are what MUX's Helix scripts use to track the run state, convert to XUnit format, and notify both Helix and AzDO of what's going on.

#### Produces for:
- #671 - Making Terminal UIA tests is now possible
- #6963 - MUX's Helix scripts are already ready to capture PGO data on the Helix machines as certain tests run. Presuming we can author some reasonable scenarios, turning on the Helix environment gets us a good way toward automated PGO.

#### Related:
- #4490 - We lost the AzDO integration of our test data when I moved from the TAEF/VSTest adapter directly back to TE. Thanks to the WTTLog + Helix conversion scripts to XUnit + new upload phase, we have it back!

## PR Checklist
* [x] Closes #3838
* [x] I work here.
* [x] Literally adds tests.
* [ ] Should I update a testing doc in this repo?
* [x] Am core contributor. Hear me roar.
* [ ] Correct spell-checking the right way before merge.

## Detailed Description of the Pull Request / Additional comments
We have had two classes of tests that don't work in our usual build-machine testing environment:
1. Tests that require interactive UI automation or input injection (a.k.a. require a logged in user)
2. Tests that require the entire Windows Terminal to stand up (because our Xaml Islands dependency requires 1903 or later and the Windows Server instance for the build is based on 1809.)

The Helix testing environment solves both of these and is brought to us by our friends over in https://github.com/microsoft/microsoft-ui-xaml.

This PR takes a large portion of scripts and pipeline configuration steps from the Microsoft-UI-XAML repository and adjusts them for Terminal needs.
You can see the source of most of the files in either https://github.com/microsoft/microsoft-ui-xaml/tree/master/build/Helix or https://github.com/microsoft/microsoft-ui-xaml/tree/master/build/AzurePipelinesTemplates

The files were modified for reasons including (but not limited to):
- Our test binaries are named differently than MUX's test binaries
- We don't need certain types of testing that MUX does.
- We use C++ and C# tests while MUX was using only C# tests (so the naming pattern and some of the parsing of those names is different e.g. :: separators in C++ and . separators in C#)
- Our pipeline phases work a bit differently than MUX and/or we need significantly fewer pieces to the testing matrix (like we don't test a wide variety of OS versions).
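As a rough illustration of the naming difference mentioned above, a parser has to handle both `::`-separated C++ names and `.`-separated C# names. The Python sketch below is not taken from the Helix scripts; `split_test_name` and the example test names are hypothetical.

```python
from typing import List


def split_test_name(full_name: str) -> List[str]:
    """Split a fully qualified test name into its components.

    C++ names use '::' and C# names use '.'. The '::' check comes
    first because a C++ name could also contain dots.
    """
    separator = "::" if "::" in full_name else "."
    return full_name.split(separator)


# Hypothetical names in each style.
print(split_test_name("ExampleModule::ExampleClass::ExampleMethod"))
print(split_test_name("Example.Namespace.ExampleClass.ExampleMethod"))
```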

The build now runs in a few stages:
1. The usual build and run of unit tests/feature tests, packaging verification, and whatnot. This phase now also picks up and packs anything required for running tests in Helix into an artifact. (It also unifies the artifact name between the things Helix needs and the existing build outputs into the single `drop` artifact to make life a little easier.)
2. The Helix preparation build runs. It picks up those artifacts, generates all the scripts required for Helix to understand the test modules/functions from our existing TAEF tests, packs it all up, and queues it on the Helix pool.
3. Helix generates a VM for our testing environment and runs all the TAEF tests that require it. The orchestrator at helix.dot.net watches over this and tracks the success/fail and progress of each module and function. The scripts from our MUX friends handle installing dependencies, making the system quiet for better reliability, detecting flaky tests and rerunning them, and coordinating all the log uploads (including for the subruns of tests that are re-run.)
4. A final build phase is run to look through the results with the Helix API and clean up the marking of tests that are flaky, link all the screenshots and console output logs into the AzDO tests panel, and other such niceties.
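The flaky-test handling mentioned in stages 3 and 4 can be pictured with a small sketch. The actual rerun policy of the MUX scripts is not described here, so the Python below only assumes one plausible rule: a test that fails and then passes on a rerun is marked flaky. `classify` and `max_attempts` are hypothetical names.

```python
from typing import Callable


def classify(run_test: Callable[[], bool], max_attempts: int = 3) -> str:
    """Run a test up to `max_attempts` times and classify the outcome.

    Returns 'passed' if the first attempt succeeds, 'flaky' if a later
    attempt succeeds, and 'failed' if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        if run_test():
            return "passed" if attempt == 1 else "flaky"
    return "failed"


# Simulated test that fails once and then passes.
outcomes = iter([False, True])
print(classify(lambda: next(outcomes)))
```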

We are set to run Helix tests on the Feature test policy of only x64 for now. 

Additionally, because the set up of the Helix VMs takes so long, we are *NOT* running these in PR trigger right now as I believe we all very much value our 15ish minute PR turnaround (and the VM takes another 15 minutes to just get going for whatever reason.) For now, they will only run as a rolling build on master after PRs are merged. We should still know when there's an issue within about an hour of something merging and multiple PRs merging fast will be done on the rolling build as a batch run (not one per).

In addition to setting up the entire Helix testing pipeline for the tests that require it, I've preserved our classic way of running unit and feature tests (that don't require an elaborate environment) directly on the build machines. But with one bonus feature... They now use some of the scripts from MUX to transform their log data and report it to AzDO so it shows up beautifully in the build report. (We used to have this before I removed the MStest/VStest wrapper for performance reasons, but now we can have reporting AND performance!) See https://dev.azure.com/ms/terminal/_build/results?buildId=101654&view=ms.vss-test-web.build-test-results-tab for an example. 

I explored running all of the tests on Helix but.... the Helix setup time is long and the resources are more expensive. I felt it was better to preserve the "quick signal" by continuing to run these directly on the build machine (and skipping the more expensive/slow Helix setup if they fail.) It also works well with the split between PR builds not running Helix and the rolling build running Helix. PR builds will get a good chunk of tests for a quick turn around and the rolling build will finish the more thorough job a bit more slowly.

## Validation Steps Performed
- [x] Ran the updated pipelines with Pull Request configuration ensuring that Helix tests don't run in the usual CI
- [x] Ran with simulation of the rolling build to ensure that the tests now running in Helix will pass. All failures marked for follow on in reference issues.
@ghost ghost added the In-PR This issue has a related PR label May 10, 2021
@ghost ghost closed this as completed in #10071 May 13, 2021
@ghost ghost added Resolution-Fix-Committed Fix is checked in, but it might be 3-4 weeks until a release. and removed In-PR This issue has a related PR labels May 13, 2021
ghost pushed a commit that referenced this issue May 13, 2021
…st scenarios (#10071)

Implement PGO in pipelines for AMD64 architecture; supply training test scenarios

## References
- #3075 - Relevant to speed interests there and other linked issues.

## PR Checklist
* [x] Closes #6963
* [x] I work here.
* [x] New UIA Tests added and passed. Manual build runs also tested.

## Detailed Description of the Pull Request / Additional comments
- Creates a new pipeline run for creating instrumented binaries for Profile Guided Optimization (PGO).
- Creates a new suite of UIA tests on the full Windows Terminal app to run PGO training scenarios on instrumented binaries (and incidentally can be used to write other UIA tests later for the full Terminal app.)
- Creates a new NuGet artifact to store trained PGO databases (PGD files) at `Microsoft.Internal.Windows.Terminal.PGODatabase`
- Creates a new NuGet artifact to supply large-scale test content for automated tests at `Microsoft.Internal.Windows.Terminal.TestContent`
- Adjusts the release pipeline to run binaries in PGO optimized mode where content from PGO databases is leveraged at link time to optimize the final release build

The following binaries are trained:
- OpenConsole.exe
- WindowsTerminal.exe
- TerminalApp.dll
- TerminalConnection.dll
- Microsoft.Terminal.Control.dll
- Microsoft.Terminal.Remoting.dll
- Microsoft.Terminal.Settings.Editor.dll
- Microsoft.Terminal.Settings.Model.dll

In the future, adding `<PgoTarget>true</PgoTarget>` to a new `vcxproj` file will automatically enroll the DLL/EXE for PGO instrumentation and optimization going forward.

Two training test scenarios are implemented:
- Smoke test the Terminal by just opening it and typing a bit of text then exiting. (Should help focus on the standard launch path.)
- Optimize bulk text output by launching terminal, outputting `big.txt`, then exiting.

Additional scenarios can be contributed to the `WindowsTerminal_UIATests` project with the `[TestProperty("IsPGO", "true")]` annotation to add them to the suite of scenarios for PGO.

**NOTE:** There are currently no weights applied to the various test scenarios. We will revisit that in the future when/if necessary.

## Validation Steps Performed
- [x] - Training run completed at https://dev.azure.com/ms/terminal/_build?definitionId=492&_a=summary
- [x] - Optimization run completed locally (by forcing `PGOBuildMode` to `Optimize` on my local machine, manually retrieving the databases with NuGet, and building).
- [x] - Validated locally that x86 and ARM64 do not get trained and automatically skip optimization as databases are not present for them.
- [x] - Smoke tested optimized binary versus latest releases. `big.txt` output through CMD takes ~11-12 seconds prior to PGO and just over 8 seconds with PGO.
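
For a sense of scale, the smoke-test timings above can be turned into a percentage. The Python sketch below treats "just over 8 seconds" as exactly 8.0, which is an approximation, so the real reduction is slightly smaller than the printed figures; `percent_reduction` is a hypothetical helper.

```python
def percent_reduction(before_s: float, after_s: float) -> float:
    """Percentage of runtime saved going from `before_s` to `after_s`."""
    return (before_s - after_s) / before_s * 100.0


# Low and high ends of the reported pre-PGO range, against ~8 s with PGO.
for before in (11.0, 12.0):
    saved = percent_reduction(before, 8.0)
    print(f"{before:.0f}s -> 8s: {saved:.1f}% less runtime")
```

Under that approximation the reduction lands at roughly 27-33%.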
@ghost

ghost commented May 25, 2021

🎉 This issue was addressed in #10071, which has now been successfully released as Windows Terminal Preview v1.9.1445.0. 🎉

This issue was closed.