All Metal3 Centos E2E main tests fail with #1685 #1785

Closed
tuminoid opened this issue Jun 14, 2024 · 3 comments · Fixed by #1786
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue is ready to be actively worked on.

Comments

@tuminoid
Member

What steps did you take and what happened:
#1685 was merged, and since then all Metal3 Centos-based e2e tests on the main branch have failed. If the PR is reverted, they pass.

What did you expect to happen:
Centos e2e succeeds.

Anything else you would like to add:
Ubuntu variants pass (provided #1780 is merged to fix a separate issue), so the problem is isolated to Centos.

Environment:
Dev-env / CI: the e2e integration, e2e feature, e2e ephemeral and bml e2e periodic jobs all fail.
All PR jobs running centos-e2e-integration-main fail.

See https://jenkins.nordix.org/view/Metal3%20Periodic/job/metal3-periodic-centos-e2e-integration-test-main/87/ or any other periodic centos main job.

  • Baremetal Operator version: main
  • Environment (metal3-dev-env or other): dev-env / CI

/kind bug

@metal3-io-bot metal3-io-bot added kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue lacks a `triage/foo` label and requires one. labels Jun 14, 2024
@tuminoid
Member Author

/triage accepted

/cc @dtantsur @elfosardo @MahnoorAsghar @mboukhalfa @Rozzii @kashifest
FYI

@metal3-io-bot metal3-io-bot added triage/accepted Indicates an issue is ready to be actively worked on. and removed needs-triage Indicates an issue lacks a `triage/foo` label and requires one. labels Jun 14, 2024
@tuminoid
Member Author

A notable difference in the BMO logs is:

{"level":"info","ts":1718358988.3083067,"logger":"provisioner.ironic","msg":"error caught while checking endpoint, will retry","host":"metal3~node-0","endpoint":"https://172.22.0.2:6385/v1/","error":"Expected HTTP response code [200 300] when accessing [GET https://172.22.0.2:6385/v1/], but got 503 instead: <!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n<html><head>\n<title>503 Service Unavailable</title>\n</head><body>\n<h1>Service Unavailable</h1>\n<p>The server is temporarily unable to service your\nrequest due to maintenance downtime or capacity\nproblems. Please try again later.</p>\n</body></html>"}
{"level":"info","ts":1718358988.3096807,"logger":"controllers.BareMetalHost","msg":"provisioner is not ready","baremetalhost":{"name":"node-0","namespace":"metal3"},"RequeueAfter:":30}
{"level":"info","ts":1718358988.3113363,"logger":"provisioner.ironic","msg":"error caught while checking endpoint, will retry","host":"metal3~node-1","endpoint":"https://172.22.0.2:6385/v1/","error":"Expected HTTP response code [200 300] when accessing [GET https://172.22.0.2:6385/v1/], but got 503 instead: <!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n<html><head>\n<title>503 Service Unavailable</title>\n</head><body>\n<h1>Service Unavailable</h1>\n<p>The server is temporarily unable to service your\nrequest due to maintenance downtime or capacity\nproblems. Please try again later.</p>\n</body></html>"}

This occurs only on main, not with the patch reverted. The code path looks like it is going to retry, but it never recovers and only keeps logging "provisioner is not ready". With the patch reverted, the tests show "provisioner is not ready" for a while, after which the host moves on to the next provisioning state.
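To check outside BMO whether the Ironic endpoint ever stops returning 503, one could poll it directly. The following is a minimal Python sketch, not BMO code: the function names `probe` and `wait_for_endpoint` are hypothetical, the expected codes mirror the "[200 300]" in the log message, and the 30-second interval mirrors the `RequeueAfter` value of 30 in the log (assumed to be seconds). It also assumes the dev-env Ironic at https://172.22.0.2:6385 uses a self-signed certificate, so certificate verification is disabled.

```python
import ssl
import time
import urllib.error
import urllib.request

# Status codes the log message says the client expects from GET /v1/.
EXPECTED_CODES = (200, 300)


def probe(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of GET <url>, or 0 if no HTTP response.

    Assumption: the dev-env Ironic uses a self-signed certificate, so
    verification is disabled here. Do not copy this into real clients.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for error codes such as 503; the code is still available.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, TLS failure, etc.: no HTTP response at all.
        return 0


def wait_for_endpoint(url, attempts=20, interval=30.0,
                      probe_fn=probe, sleep_fn=time.sleep):
    """Poll url until it returns an expected code; return every code seen."""
    seen = []
    for i in range(attempts):
        code = probe_fn(url)
        seen.append(code)
        if code in EXPECTED_CODES:
            break
        if i < attempts - 1:
            # Wait before the next attempt, but not after the last one.
            sleep_fn(interval)
    return seen


if __name__ == "__main__":
    codes = wait_for_endpoint("https://172.22.0.2:6385/v1/")
    print("status codes seen:", codes)
```

If the printed list never contains 200 while the BMO log keeps repeating "provisioner is not ready", the endpoint itself is stuck at 503 rather than BMO failing to notice its recovery.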

@Rozzii
Member

Rozzii commented Jun 14, 2024

I hope this will fix it, or at least move us closer:
#1786
