
Rev Alpine WithNode from 3.9 to 3.13 #57324

Merged 3 commits into dotnet:main on Sep 30, 2021

Conversation

am11
Member

@am11 am11 commented Aug 12, 2021

Also,

  • update helix queues from 3.12 to 3.13
  • add 3.14 helix queues
  • update performance job to install cargo from main branch

Fixes #56672

@ghost ghost added the community-contribution label (Indicates that the PR has been added by a community member) on Aug 12, 2021
@ghost

ghost commented Aug 12, 2021

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

Issue Details

Updating to 3.14 now will buy us ~2 years before this latest version reaches EOL.
Upstream support for 3.14 was added: dotnet/dotnet-buildtools-prereqs-docker@d90babf

Fixes #56672

Author: am11
Assignees: -
Labels: area-Infrastructure

Milestone: -

@am11
Member Author

am11 commented Aug 12, 2021

cc @jkotas, @janvorli: we can also update the helix queues to test the latest Alpine Linux version, but I haven't changed that as part of this PR.

@jkotas
Member

jkotas commented Aug 12, 2021

According to https://github.com/dotnet/core/blob/main/release-notes/6.0/supported-os.md#linux, we want to support Alpine 3.13+ for .NET 6. Are binaries built on Alpine 3.14 going to run on Alpine 3.13?

@am11
Member Author

am11 commented Aug 12, 2021

Are binaries built on Alpine 3.14 going to run on Alpine 3.13?

They should work (as time64 changes were included in 3.13). I will test it.
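One rough way to check this locally (just a sketch, not how it was verified; the artifact path and the apk package below are assumptions) is to build inside the 3.14 image and then inspect the produced native binaries on a stock Alpine 3.13 image:

# Build a release x64 CoreCLR inside the Alpine 3.14 prereqs image.
docker run --rm -v $(pwd):/runtime \
  mcr.microsoft.com/dotnet-buildtools/prereqs:alpine-3.14-WithNode-20210812132848-d90babf \
  /runtime/src/coreclr/build-runtime.sh release x64

# Then load the produced library on a plain Alpine 3.13 image; ldd reports any
# shared-library dependencies that the 3.13 userland cannot satisfy.
docker run --rm -v $(pwd)/artifacts:/artifacts alpine:3.13 \
  sh -c 'apk add --no-cache libstdc++ > /dev/null && ldd /artifacts/bin/coreclr/Linux.x64.Release/libcoreclr.so'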

@am11 am11 marked this pull request as draft August 12, 2021 22:28
@am11
Member Author

am11 commented Aug 12, 2021

Locally I was able to build the main branch without any changes:

# Build a checked x64 CoreCLR from the mounted repo inside the Alpine 3.14 prereqs image.
docker run -v $(pwd):/runtime \
  -it mcr.microsoft.com/dotnet-buildtools/prereqs:alpine-3.14-WithNode-20210812132848-d90babf \
  /runtime/src/coreclr/build-runtime.sh checked x64

but on CI it was failing to find python even though it is clearly present in the image. When I worked around that, it started failing with "Failed to get PGO data package path". There is nothing in the logs pointing to what's wrong, so it might be something specific to the pipeline YAML that needs updating?

@hoyosjs
Member

hoyosjs commented Aug 13, 2021

@am11 looks like the container doesn't have curl? (And maybe the volume mount masks this, as you may have a .dotnet directory.)

@am11
Member Author

am11 commented Aug 13, 2021

@am11 looks like the container doesn't have curl? (And maybe the volume mount masks this, as you may have a .dotnet directory.)

@hoyosjs, I tried it with a clean git repo. Even without mounting, git clone <runtime url> followed by build-runtime.sh release x64 succeeds in that container.

@hoyosjs
Member

hoyosjs commented Aug 13, 2021

I tried with the IDs copied from the build logs and was still able to get the SDK.

@am11
Member Author

am11 commented Aug 14, 2021

Yes, the docker container mcr.microsoft.com/dotnet-buildtools/prereqs:alpine-3.14-WithNode-20210812132848-d90babf is the one the failing CI leg is using, and it is the one I tested earlier. It has all the required tools and is capable of building the runtime. Something else in AzDO is apparently going wrong. Nothing jumps out at me from the logs of the Initialize containers step; all looks normal until the Build CoreCLR Runtime step, which seems to be using a different environment.

@ViktorHofer
Member

@am11 please let us know if we can help in any way.

@am11
Member Author

am11 commented Aug 16, 2021

@ViktorHofer, it is still a mystery to me, as the CI error is not reproducible locally. E.g. this leg https://dev.azure.com/dnceng/public/_build/results?buildId=1291489&view=logs&jobId=0e64859c-a870-5a74-6b6e-333fc1003298&j=0e64859c-a870-5a74-6b6e-333fc1003298&t=5da9eef6-c013-592b-ccae-2e46c52786d0 is apparently using the container mcr.microsoft.com/dotnet-buildtools/prereqs:alpine-3.14-WithNode-20210812132848-d90babf and failing on the build step due to missing curl, although the said container has curl, python, and the rest of the dependencies.

I just (temporarily) deleted > /dev/null 2>&1 to reveal the missing-curl error from:

"$__RepoRootDir/eng/common/msbuild.sh" /clp:nosummary $__ArcadeScriptArgs $OptDataProjectFilePath $RestoreArg /t:DumpPgoDataPackagePath \
${__CommonMSBuildArgs} /p:PgoDataPackagePathOutputFile=${PgoDataPackagePathOutputFile} \
-bl:"$__LogsDir/PgoVersionRead_$__ConfigTriplet.binlog" > /dev/null 2>&1
local exit_code="$?"

If we test it locally with the same command this failing CI leg uses:

# Clone a fresh copy of dotnet/runtime inside the container and build a checked x64 CoreCLR.
docker run \
  -it mcr.microsoft.com/dotnet-buildtools/prereqs:alpine-3.14-WithNode-20210812132848-d90babf \
  sh -c 'git clone https://github.com/dotnet/runtime --single-branch --depth 1 /runtime &&
  /runtime/src/coreclr/build-runtime.sh checked x64'

it builds CoreCLR successfully. Do you see anything unusual in the logs that could be causing the Build CoreCLR Runtime step to use a different environment than the container?

@ViktorHofer
Member

I'll take a look. Let me also cc @safern and @jkoritzinsky

@ViktorHofer
Member

@hoyosjs @jkoritzinsky I'm currently a bit short on time; do you have any additional ideas?

@ViktorHofer
Member

@MattGal do you think you can help here?

@MattGal
Member

MattGal commented Sep 1, 2021

@MattGal do you think you can help here?

No guarantees but sure, I'll take a look

@MattGal
Member

MattGal commented Sep 1, 2021

@ViktorHofer I can't reproduce this running a main-branch build inside an mcr.microsoft.com/dotnet-buildtools/prereqs:alpine-3.14-WithNode-20210812132848-d90babf container; could it be a transient error?

@hoyosjs
Member

hoyosjs commented Sep 1, 2021

Sigh. Looks transient; a cache issue or something. Kicked it off again and the containers initialized just fine... spoke too soon. It failed once more with the same error as three weeks back.

@MattGal
Member

MattGal commented Sep 1, 2021

Super weird, but it does seem to be noticed and known externally: dotnet/install-scripts#206.

I'm running the same image you used, running the same CI build, from a Windows box. I wonder whether you could get a repro if you started from an Ubuntu 18.04/20.04 box.

@hoyosjs
Member

hoyosjs commented Sep 1, 2021

I couldn't repro on a physical Ubuntu box with that container on top.

@MattGal
Member

MattGal commented Sep 1, 2021

I couldn't repro on a physical Ubuntu box with that container on top.

Maybe just try alpine-3.13-WithNode-20210812132854-ddfc481? Since, even if this works after a bunch of retries, you don't want to merge like this...

@am11
Member Author

am11 commented Sep 1, 2021

even if this works after a bunch of retries

Flakiness is really not the problem. On other systems, you can build using this new container reliably, while in AzDO it is always failing to run the build step inside the container (it is apparently running the build step on the host system rather than in the container). The logs do not indicate any problem.
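One way to confirm that (a hypothetical diagnostic, not part of this PR) would be to run a temporary command at the top of the Build CoreCLR Runtime step and compare its output with the container:

# Inside the Alpine 3.14 container this prints the Alpine release and finds curl/python3;
# if the step is actually running on the host agent, it prints the host distro instead.
cat /etc/os-release
command -v curl || echo "curl not on PATH"
command -v python3 || echo "python3 not on PATH"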

@am11 am11 force-pushed the feature/ci/alpine3.14 branch 2 times, most recently from 37bf805 to 2888eaf, on September 2, 2021 00:53
@@ -41,15 +41,11 @@ jobs:

# Linux musl x64
- ${{ if eq(parameters.platform, 'Linux_musl_x64') }}:
- ${{ if eq(parameters.jobParameters.isFullMatrix, false) }}:
Member

Can we test on 3.13 when fullMatrix == true, meaning on rolling CI?

Member Author

@am11 am11 Sep 24, 2021

Yup, didn't mean to press delete on rolling ones 🤣

Member

@safern safern left a comment

Thanks 🎉

Member

@hoyosjs hoyosjs left a comment

Thanks a lot @am11 :shipit:

@safern
Member

safern commented Sep 24, 2021

It seems like you have a YAML syntax error 😢: https://dev.azure.com/dnceng/public/_build/results?buildId=1384687&view=results
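For a quick local sanity check of the edited pipeline file, something like this works (a sketch: the file path is a guess, PyYAML has to be installed, and it only catches plain YAML syntax errors, not Azure DevOps template problems):

python3 -c "import sys, yaml; yaml.safe_load(open(sys.argv[1])); print('YAML parses')" \
  eng/pipelines/coreclr/templates/helix-queues-setup.yml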

@aik-jahoda
Contributor

There is a JSON failure on Alpine.314.Arm64.Open. @safern do you think the failure is related to the change, or can we merge this PR?

@danmoseley
Member

The JSON crash has a dump. We should look at that dump -- there is a how-to-debug-dump.md file there to help.

The regex issue is known and fixed already.
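(For reference, the usual flow for such a dump is roughly the following; the dump file name is a placeholder, and the how-to-debug-dump.md shipped with the payload has the authoritative, version-matched steps.)

# Install the diagnostics CLI once, then open the dump; 'clrstack -all' prints the
# managed stacks for every thread, which is usually enough to match a known crash.
dotnet tool install --global dotnet-dump
dotnet-dump analyze ./coredump.1234
# at the dotnet-dump prompt:
> clrstack -all
> exit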

@am11
Member Author

am11 commented Sep 29, 2021

There is a JSON failure on Alpine.314.Arm64.Open

Isn't it #57198?

@safern
Member

safern commented Sep 29, 2021

I think we should at least look at the dump to make sure it is #57198 before merging something that could potentially introduce a failure to CI.

@am11
Member Author

am11 commented Sep 29, 2021

The dump link on that issue has expired, and Eirik had no success with clrstack when the link was active (#57198 (comment)). Not sure how I can determine it.

Maybe rerun the job to see if the issue persists?

@safern
Member

safern commented Sep 29, 2021

Maybe rerun the job to see if the issue persists?

Sounds good to me.

@am11
Member Author

am11 commented Sep 29, 2021

Dump links from #57324 are still active; they're only hours old.

Yup, I can download the dump from this PR, but the one from #57198 (which I would like to compare it with) has expired.

@danmoseley
Member

Ah gotcha. Is there anything that can be gotten out of this dump?

@am11
Member Author

am11 commented Sep 29, 2021

I am working on it. 🙂

@am11
Member Author

am11 commented Sep 29, 2021

clrstack results are posted here: #57198 (comment). I have the container saved, in case more info is needed.

@aik-jahoda
Contributor

The regex issue is known and fixed already.

clrstack results are posted here: #57198 (comment). I have the container saved, in case more info is needed.

Sounds like both test failures are known and tracked/fixed already. Can we merge this PR?

@hoyosjs hoyosjs merged commit a50e1e6 into dotnet:main Sep 30, 2021
@am11 am11 deleted the feature/ci/alpine3.14 branch September 30, 2021 10:30
@ghost ghost locked as resolved and limited conversation to collaborators Nov 3, 2021
Labels
area-Infrastructure, community-contribution (Indicates that the PR has been added by a community member)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Alpine builds use alpine-3.9 docker image that is EOL
8 participants