Move all of our pipelines to use the global-build-job to build CoreCLR, Mono, and Libraries assets #99179

jkoritzinsky · 2024-03-01T23:05:14Z

When we first consolidated dotnet/coreclr, dotnet/corefx, dotnet/core-setup, and mono/mono in .NET 5, we brought in separate product build, test build, and test run pipeline jobs, along with a complex matrix of how they interacted. We did this because, before consolidation, we were seeing increased build throughput and reliability from having separate jobs and from ensuring we built each asset exactly once.

Over time, we've removed a few of these jobs and allowed double-building to reduce dependencies:

- Libraries removed their separate "build test" job.
- The CoreCLR/Mono "build test" job now builds the assets it needs directly instead of depending on other jobs.
- The Libraries job now builds System.Private.CoreLib in-job (and now can even just build the ref-assembly for CoreLib).

Additionally, we've moved many jobs to build the product (and sometimes test) in one job using the "Global build" template:
- All Mono devices jobs use the "global build" to build the product (and build runtime tests if needed).
- All NativeAOT jobs use the "global build" template.
- Nearly all community-supported legs build using the "global build" template.
- A few outerloop pipelines use the "global build" template for simplicity.

Today, our pipelines are split between using the "Global build" template that directly invokes the root build script to build the product in one command and the separate "build job" templates for building specific subsets (CoreCLR, Libraries, Mono, Installer) separately.

This has added significant complexity to many of our YAML scripts, which have to support both cases, and has caused confusion about when to use one approach or the other. For example, some jobs in the "runtime" pipeline use the global build job template to build the product and run tests whereas others don't.

Recently, we moved the official build to use the "global build" template for every job as a) we needed to move in this direction for the VMR and b) we believed that due to job queue time and upload/download time, we'd save on build execution time.

We saw decent improvements, as well as a significant reliability boost, from merging these jobs together in the official build.

This PR takes us the next step: Merge all CoreCLR, Mono, and Libraries product build steps (and installer build steps when we aren't running tests) into single jobs instead of being split into 2 or 3 jobs (as all of the Mono runs also required some components of the corresponding CoreCLR build).

In addition, to avoid ending up in a split situation where many of our outerloop legs are still on the older system while PR and official builds are all on the new system, I have moved every pipeline to the "global build" model, not just the PR pipeline.

Job Dependency Graph Changes

CoreCLR + Libraries + Installer (w/o Installer Tests)

flowchart TD

subgraph Old Model
direction LR
CLR[CoreCLR Build Job]
Libs[Libraries Build Job]
Host[Installer Build Job]
CLR --> Host
Libs --> Host
BRT[Build Runtime Tests] --> RT
CLR --> RT[Runtime Tests]
Libs --> RT
CLR --> LT[Libraries Tests]
Libs --> LT
end
subgraph New Model
Global[Global build job: CLR + Libs + Host + Packs]
GBRT[Build Runtime Tests]
GRT[Runtime Tests]
GLT[Libraries Tests]
Global --> GRT
Global --> GLT
GBRT --> GRT
end
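
To make the change in the diagram above concrete, here is a minimal Python sketch that encodes both subgraphs as dependency maps and counts jobs and dependency edges. The job names and edges are copied from the diagram; the helper names (`upstream`, `edge_count`) are illustrative only and are not part of the pipeline YAML.

```python
# Each entry maps a job to the jobs it depends on, as drawn in the
# "CoreCLR + Libraries + Installer (w/o Installer Tests)" diagram.
OLD_MODEL = {
    "CoreCLR Build Job": [],
    "Libraries Build Job": [],
    "Build Runtime Tests": [],
    "Installer Build Job": ["CoreCLR Build Job", "Libraries Build Job"],
    "Runtime Tests": ["Build Runtime Tests", "CoreCLR Build Job", "Libraries Build Job"],
    "Libraries Tests": ["CoreCLR Build Job", "Libraries Build Job"],
}
NEW_MODEL = {
    "Global build job": [],
    "Build Runtime Tests": [],
    "Runtime Tests": ["Global build job", "Build Runtime Tests"],
    "Libraries Tests": ["Global build job"],
}


def upstream(graph, job):
    """Return every job that must finish before `job` can start."""
    seen = set()
    stack = list(graph[job])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph[dep])
    return seen


def edge_count(graph):
    """Count the dependency edges (artifact hand-offs between jobs)."""
    return sum(len(deps) for deps in graph.values())


if __name__ == "__main__":
    for name, graph in (("old", OLD_MODEL), ("new", NEW_MODEL)):
        print(name, len(graph), "jobs,", edge_count(graph), "dependency edges")
```

In this encoding the old model has 6 jobs and 7 edges, and the new model has 4 jobs and 3 edges, so each test job waits on fewer upstream jobs.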

CoreCLR + Libraries + Installer w/ Tests

flowchart TD

subgraph Old Model
direction LR
CLR[CoreCLR Build Job]
Libs[Libraries Build Job]
Host[Installer Build and Test]
CLR --> Host
Libs --> Host
BRT[Build Runtime Tests] --> RT
CLR --> RT[Runtime Tests]
Libs --> RT
CLR --> LT[Libraries Tests]
Libs --> LT
end
subgraph New Model
Global[Global build job: CLR + Libs]
GI[Installer Build and Test]
GBRT[Build Runtime Tests]
GRT[Runtime Tests]
GLT[Libraries Tests]
Global --> GRT
Global --> GLT
GBRT --> GRT
Global --> GI
end

Mono Runtime Tests

flowchart TD

subgraph Old Model
direction LR
CLR[CoreCLR Build Job, all of CoreCLR]
Libs[Libraries Build Job]
Mono[Mono Build Job]
BRT[Build Runtime Tests] --> RT
CLR --> RT[Runtime Tests]
Libs --> RT
Mono --> RT
end
subgraph New Model
Global[Global build job: Mono + CoreRun + ilasm/dasm + Libs]
GBRT[Build Runtime Tests]
GRT[Runtime Tests]
Global --> GRT
GBRT --> GRT
end

YAML Template Flow Changes

Beyond the job simplifications above, there is also some YAML template simplification. In addition to removing the build-coreclr-and-libraries-job template, I consolidated the general flow of the JIT outerloop pipelines for CoreCLR tests into a single pipeline template, which simplifies defining them.

Additionally, I moved all outerloop pipelines that run libraries tests to use the global build job with the "Send to Helix" extra step instead of shuffling files between jobs.

Q&A

Why do we still have separate jobs per vertical in the PR pipeline?
- We don't want to wait to run the runtime tests until the libraries tests pass or vice versa. Additionally, building the runtime tests still takes a while (~30 minutes), so we don't want to make that a requirement before any tests start to run until we've reduced that time.
Why are installer tests still in a separate job?
- Installer tests are very quick, but we don't want to block libraries and runtime tests from running if installer tests fail.
Why is installer build still in the installer test job?
- It's easier given current infrastructure to keep the "Installer Build and Test" job as-is rather than moving the building steps out entirely.
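
The first answer can be illustrated with a small scheduling sketch in Python. Only the ~30 minute runtime test build comes from the text above; the other durations, the job keys, and the `earliest_start` helper are hypothetical, and the model ignores queue time and agent limits.

```python
def earliest_start(durations, deps):
    """Earliest start minute per job, assuming unlimited agents and no queue time."""
    start = {}

    def start_of(job):
        if job not in start:
            # A job starts once its slowest prerequisite has finished.
            start[job] = max(
                (start_of(d) + durations[d] for d in deps.get(job, ())),
                default=0,
            )
        return start[job]

    for job in durations:
        start_of(job)
    return start


GLOBAL_BUILD = 40         # hypothetical duration, minutes
BUILD_RUNTIME_TESTS = 30  # "~30 minutes" per the answer above
TEST_RUN = 20             # hypothetical duration, minutes

# Runtime tests are built in their own job, in parallel with the global build.
separate = earliest_start(
    {
        "global_build": GLOBAL_BUILD,
        "build_runtime_tests": BUILD_RUNTIME_TESTS,
        "libraries_tests": TEST_RUN,
        "runtime_tests": TEST_RUN,
    },
    {
        "libraries_tests": ["global_build"],
        "runtime_tests": ["global_build", "build_runtime_tests"],
    },
)

# Hypothetical alternative: the runtime test build is folded into the global build.
folded = earliest_start(
    {
        "global_build": GLOBAL_BUILD + BUILD_RUNTIME_TESTS,
        "libraries_tests": TEST_RUN,
        "runtime_tests": TEST_RUN,
    },
    {
        "libraries_tests": ["global_build"],
        "runtime_tests": ["global_build"],
    },
)

if __name__ == "__main__":
    print("separate:", separate["libraries_tests"], separate["runtime_tests"])
    print("folded:  ", folded["libraries_tests"], folded["runtime_tests"])
```

Under these assumed numbers, both test jobs can start at minute 40 when the runtime test build stays separate, and at minute 70 when it is folded into the global build.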

For validation, I'm happy to run any requested pipelines on this PR. I obviously can't run them all, but I'm happy to run a good spread to validate that I didn't break anything (I've been running various pipelines throughout working on this to validate the experience as I went).

jkoritzinsky · 2024-03-14T17:52:55Z

@DrewScoggins the failures in the perf legs look like restore failures in the benchmarks, not infrastructure issues. Can you take a look?

BruceForstall · 2024-03-14T22:10:39Z

I presume you are going to revert your change to jit.h once your testing is done? (It looks like that's causing a formatting job failure, also)

BruceForstall

It's basically impossible to review something this massive by sight, but I scanned through various changes, so LGTM.

jkoritzinsky · 2024-03-14T23:16:17Z

> I presume you are going to revert your change to jit.h once your testing is done? (It looks like that's causing a formatting job failure, also)

Yep! Do you want me to keep the JIT-EE GUID change or should I revert that one before I merge this in?

BruceForstall · 2024-03-15T01:58:53Z

> Do you want me to keep the JIT-EE GUID change or should I revert that one before I merge this in?

You should revert it. It was just to avoid trashing our collections when you were testing.

jkoritzinsky · 2024-03-15T20:35:46Z

/ba-g Timeouts in interpreter tests are known

agocke

LGTM I think. There might be more work to optimize things, but I think that can come later.

akoeplinger · 2024-03-16T09:39:02Z

@jkoritzinsky thank you so much for doing this work! 🎉

kg · 2024-03-16T12:10:08Z

The most recent commit in #99841 didn't run any lanes, is that potentially related to these yml changes?

akoeplinger · 2024-03-16T12:15:08Z

@kg it looks like it's running on AzDO but didn't post back to GitHub: https://dev.azure.com/dnceng-public/public/_build/results?buildId=605726&view=results

BruceForstall · 2024-03-16T16:22:05Z

@jkoritzinsky Looks like libraries-jitstress is broken:

https://dev.azure.com/dnceng-public/public/_build/results?buildId=605612&view=results

 Done building Helix work items for scenario no_tiered_compilation. Work item count: 0
D:\a\_work\1\s\src\libraries\sendtohelixhelp.proj(335,5): error : No helix work items, or APKs, or AppBundles found to test
##[error]src\libraries\sendtohelixhelp.proj(335,5): error : No helix work items, or APKs, or AppBundles found to test

jkoritzinsky · 2024-03-16T17:57:27Z

@BruceForstall fix for that family of pipelines at #99868

BruceForstall · 2024-03-18T04:47:58Z

Looks like runtime-coreclr outerloop is broken:

https://dev.azure.com/dnceng-public/public/_build?definitionId=108&_a=summary

```
The 'stages' parameter is not a valid StageList

/eng/pipelines/common/xplat-setup.yml: Could not find /eng/pipelines/common/templates/global-build-job.yml in repository self hosted on https://github.com/ using commit 0935105e91450a1bad02b5b2f83be52bea2bcf59. GitHub reported the error, "Not Found"
```

jkoritzinsky · 2024-03-18T05:55:29Z

I've fixed that one as well in #99868 as I had other fixes I had to do there already.

This seems to be a side-effect of dotnet#99179. Arcade checks whether `disableComponentGovernance` is the empty string to apply the default skipping logic: https://github.com/dotnet/arcade/blob/ace00d8719b8d1fdfd0cc05f71bb9af216338d27/eng/common/templates/job/job.yml#L168-L174. Changed our templates to make sure we pass that correctly.
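
The "empty string means use the default" behaviour described above can be sketched in Python as follows. This is a simplified, hypothetical model, not Arcade's actual template logic: the function name, the `default_disabled` argument, and the string handling are assumptions made for illustration.

```python
def component_governance_disabled(param: str, default_disabled: bool) -> bool:
    """Resolve a string parameter where "" means "fall back to the default".

    `default_disabled` stands in for whatever the default skipping logic
    decides for the current build (hypothetical simplification).
    """
    if param == "":
        return default_disabled
    # Any explicit value overrides the default.
    return param.strip().lower() == "true"


if __name__ == "__main__":
    # Suppose the default logic would skip Component Governance for this build.
    print(component_governance_disabled("", default_disabled=True))
    print(component_governance_disabled("false", default_disabled=True))
```

In this model, a wrapper template that forwards an explicit `"false"` instead of the empty string overrides the default, so a step that would normally be skipped ends up running.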

BruceForstall · 2024-03-18T17:12:08Z

@jkoritzinsky The runtime-coreclr superpmi-asmdiffs-checked-release pipeline is broken:

https://dev.azure.com/dnceng-public/public/_build/results?buildId=607047&view=logs&j=83516c17-6666-5250-abde-63983ce72a49&t=c10d5f44-55ce-55d7-7975-407ed75d9a96

D:\a\_work\1\s\venv\Scripts\python.exe D:\a\_work\1\s/src/coreclr/scripts/superpmi_asmdiffs_checked_release_setup.py -source_directory D:\a\_work\1\s -checked_directory D:\a\_work\1\s/artifacts/bin/coreclr/windows.x64.Checked -release_directory D:\a\_work\1\s/artifacts/bin/coreclr/windows.x64.Release -arch x64
========================== Starting Command Output ===========================
"C:\Windows\system32\cmd.exe" /D /E:ON /V:OFF /S /C "CALL "D:\a\_work\_temp\cf222c71-0741-4f77-b00c-67fcd11c9759.cmd""
checked_directory doesn't exist

This is a pipeline that depends on both the release and checked jitrollingbuild. I remember there was some special YML to do that; maybe it got lost?

dotnet#99179 made some changes and switched the template the workloads build was using. In the process, workloadArtifactsPath and workloadPackagesPath got dropped, which meant the workloads build did not pick up any manifests to process. As a result, the build failed.
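
A check along the following lines could flag parameters that get lost when a job is moved from one template to another. It is a minimal sketch: `workloadArtifactsPath` and `workloadPackagesPath` are the names from the description above, while `buildArgs`, the placeholder values, and the `dropped_parameters` helper are hypothetical.

```python
def dropped_parameters(old_call: dict, new_call: dict) -> list:
    """Return parameter names passed to the old template but not to the new one."""
    return sorted(set(old_call) - set(new_call))


# Placeholder values; the real values are not shown in this conversation.
old_call = {
    "buildArgs": "<placeholder>",
    "workloadArtifactsPath": "<placeholder>",
    "workloadPackagesPath": "<placeholder>",
}
new_call = {
    "buildArgs": "<placeholder>",
}

if __name__ == "__main__":
    for name in dropped_parameters(old_call, new_call):
        print("dropped:", name)
```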

jkoritzinsky added 30 commits February 22, 2024 14:24

657bb22 Remove unused testBuildPlatforms property and some official-build-only properties from various AzDO scripts
ca1de74 Remove unused compilerArg parameter/variable (we don't use it now that our GCC leg uses the global build pipeline and our CI images automatically select the correct compiler)
11b9929 Move riscv and freebsd build jobs to use the global-build-job template and remove parameters that were only used in these scenarios
39072d3 Refactor out some of the test variables into a variable template
d6c5357 Move over clrinterpreter.yml pipeline as example.
fcda8f6 Clean up some of the clrinterpreter changes.
aab46ee Update runtime-cet pipeline as it's a simple one and easy to validate.
dfaa153 Use name/value consistently
3229aa6 Revert change in runtime pipeline (we'll move this pipeline last).
376aea3 Set variables in run-test-job directly
2a5feb6 Update job dependency
83e77cc Avoid manually specifying dependencies with variables, AzDO doesn't allow variables here.
0beb97a Add back test build
a49533f Pass artifacts name as parameter, not variable
0049e9b Fix indentation so our post-build steps actually run
bd2f6a6 Move priority arg to variable template and fix variable template usage
8657e90 Fix artifacts upload folder.
1a13131 Make sure both "non-uppercase build config" variable we use in the global build today are set in both places.
869c617 Merge branch 'pr-single-job' of github.com:dotnet/runtime into pr-single-job
532396f Fix argument order
c9ca099 Remove extraneous space
9234db8 Convert various pipelines to use the global build design
137cfdf Create new jit outerloop pipeline template and move remaining simple (one product-build/test-build/test-run set of jobs) runtime test pipelines to use the global build approach.
d2570a2 Convert the JIT exploratory pipeline to be based on the global build job + extra steps instead of using separate jobs.
0e4a90d Fix build config.
a980cac Convert all usages of built-jit-job.yml to use the global build script. Unify all python usages to use the same python variable template (and all use venv as well)
cc647ae Move runtime-extra-platforms-other.yml to exclusively global-build-job based builds.
f98d669 Convert crossgen2 outerloop job to use the global-build-job approach
d032532 Convert libraries stress pipelines to use the global build job templates with helix execution in-job
a6729e4 Convert runtime-llvm and runtimelab pipelines to use the global-build-job templates

jkoritzinsky requested a review from BruceForstall March 14, 2024 22:03

BruceForstall approved these changes Mar 14, 2024

Undo jit change

994e7ce

Merge branch 'main' of github.com:dotnet/runtime into pr-single-job

c360f31

jkoritzinsky requested a review from DrewScoggins March 15, 2024 20:36

build-analysis bot mentioned this pull request Mar 15, 2024

System.Text.Json failing some large file tests #59678

Closed

agocke approved these changes Mar 15, 2024

jkoritzinsky merged commit 663839e into dotnet:main Mar 15, 2024
161 of 164 checks passed

jkoritzinsky deleted the pr-single-job branch March 15, 2024 23:51

akoeplinger mentioned this pull request Mar 18, 2024

Fix ComponentGovernance running on public PR/CI jobs #99898

Merged

amanasifkhalid mentioned this pull request Mar 18, 2024

JIT: Remove BBF_NONE_QUIRK #99907

Merged

steveisok mentioned this pull request Mar 18, 2024

Fix workloads build by bringing back some needed variables #99932

Merged

github-actions bot locked and limited conversation to collaborators Apr 18, 2024
