
test-tls-root-certificates fails consistently on test-rackspace-win2012r2_vs2017-x64-3 #2254

Closed
sam-github opened this issue Mar 26, 2020 · 14 comments

Comments

@sam-github
Contributor

  • Version: master
  • Platform: windows
  • Subsystem: tls

What steps will reproduce the bug?

It's failing consistently enough in CI that it might be actually broken, not just flaky.

See:

Can anyone with access to Windows, @nodejs/platform-windows, bisect and see whether one of the recent changes to the TLS root certs introduced this?

What do you see instead?



14:06:05 not ok 556 parallel/test-tls-root-certificates
14:06:05   ---
14:06:05   duration_ms: 0.375
14:06:05   severity: fail
14:06:05   exitcode: 1
14:06:05   stack: |-
14:06:05     assert.js:384
14:06:05         throw err;
14:06:05         ^
14:06:05     
14:06:05     AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:
14:06:05     
14:06:05       assert(tls.rootCertificates.includes(extraCert))
14:06:05     
14:06:05         at Object.<anonymous> (C:\workspace\node-test-binary-windows-js-suites\node\test\parallel\test-tls-root-certificates.js:49:3)
14:06:05         at Module._compile (internal/modules/cjs/loader.js:1202:30)
14:06:05         at Object.Module._extensions..js (internal/modules/cjs/loader.js:1222:10)
14:06:05         at Module.load (internal/modules/cjs/loader.js:1051:32)
14:06:05         at Function.Module._load (internal/modules/cjs/loader.js:947:14)
14:06:05         at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:71:12)
14:06:05         at internal/main/run_main_module.js:17:47 {
14:06:05       generatedMessage: true,
14:06:05       code: 'ERR_ASSERTION',
14:06:05       actual: false,
14:06:05       expected: true,
14:06:05       operator: '=='
14:06:05     }
14:06:05     assert.js:102
14:06:05       throw new AssertionError(obj);
14:06:05       ^
14:06:05     
14:06:05     AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
14:06:05     
14:06:05     1 !== 0
14:06:05     
14:06:05         at ChildProcess.<anonymous> (C:\workspace\node-test-binary-windows-js-suites\node\test\parallel\test-tls-root-certificates.js:19:12)
14:06:05         at ChildProcess.<anonymous> (C:\workspace\node-test-binary-windows-js-suites\node\test\common\index.js:362:15)
14:06:05         at ChildProcess.emit (events.js:315:20)
14:06:05         at Process.ChildProcess._handle.onexit (internal/child_process.js:276:12) {
14:06:05       generatedMessage: true,
14:06:05       code: 'ERR_ASSERTION',
14:06:05       actual: 1,
14:06:05       expected: 0,
14:06:05       operator: 'strictEqual'
14:06:05     }
14:06:05   ...
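For readers unfamiliar with the test, the failing assertion boils down to reading an extra CA certificate from disk and checking that it shows up in tls.rootCertificates. A minimal sketch of that check follows (simplified and not the actual test file; the real test drives this through a child process with NODE_EXTRA_CA_CERTS set, so the details here are illustrative):

    'use strict';
    const assert = require('assert');
    const fs = require('fs');
    const tls = require('tls');

    // Illustrative: assumes NODE_EXTRA_CA_CERTS points at a PEM fixture such as
    // test/fixtures/keys/ca1-cert.pem, as in the real test setup.
    const extraCert = fs.readFileSync(process.env.NODE_EXTRA_CA_CERTS, 'utf8');

    // The PEM strings exposed by tls.rootCertificates use LF line endings, so this
    // byte-for-byte comparison fails if the fixture on disk was checked out with
    // CRLF endings, which is the root cause identified later in this thread.
    assert(tls.rootCertificates.includes(extraCert));
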
@Trott
Member

Trott commented Mar 27, 2020

@nodejs/testing @nodejs/platform-windows @nodejs/crypto

@lpinca
Member

lpinca commented Mar 27, 2020

I can't reproduce locally (Windows 10).

@Trott
Member

Trott commented Mar 27, 2020

It looks like the problem may not be the test itself but the host. The pattern I'm seeing is that when the test runs on test-rackspace-win2012r2_vs2017-x64-3, it seems to fail, but when it runs on test-rackspace-win2012r2_vs2017-x64-1, it passes. I'm not sure why it almost always runs on test-rackspace-win2012r2_vs2017-x64-3 lately, though.

I'm going to test this hypothesis now by marking test-rackspace-win2012r2_vs2017-x64-3 offline and re-running the things that failed while bisecting.

(The end result of the bisect was that 9e3eddc75dde9df9356264b42bd30facb82583cd was identified as the bad commit and that doesn't seem likely at all, which is why I started looking at which host was failing.)

Here are the failures on test-rackspace-win2012r2_vs2017-x64-3 from the bisect and their re-runs with test-rackspace-win2012r2_vs2017-x64-3 offline:

388cef61e8a4859b7505f7b5cf988eba27ce17b4: https://ci.nodejs.org/job/node-test-commit-windows-fanned/34925/
Re-run: https://ci.nodejs.org/job/node-test-commit-windows-fanned/34933/

05f1df520064475a255d8956f9e1b6f4bf4c8543: https://ci.nodejs.org/job/node-test-commit-windows-fanned/34927/
Re-run: https://ci.nodejs.org/job/node-test-commit-windows-fanned/34934/

9e3eddc75dde9df9356264b42bd30facb82583cd: https://ci.nodejs.org/job/node-test-commit-windows-fanned/34928/
Re-run: https://ci.nodejs.org/job/node-test-commit-windows-fanned/34935/

@Trott
Member

Trott commented Mar 27, 2020

(And for good measure, here's a Resume Build for the node-daily-master run that ran a few hours ago and had the test fail on rackspace-win2012r2_vs2017-x64-3.)

https://ci.nodejs.org/job/node-daily-master/1885/

@Trott
Member

Trott commented Mar 27, 2020

Three of the four have finished and they've all passed. The fourth one is still compiling, but I'm feeling confident enough that the host is the problem to rename the issue accordingly and transfer it to the Build repo.

@nodejs/build Maybe someone needs to re-provision test-rackspace-win2012r2_vs2017-x64-3?

@Trott Trott changed the title test-tls-root-certificates fails consistently on Windows test-tls-root-certificates fails consistently on test-rackspace-win2012r2_vs2017-x64-3 Mar 27, 2020
@Trott Trott transferred this issue from nodejs/node Mar 27, 2020
@Trott
Member

Trott commented Mar 27, 2020

(And now it's passed on the fourth one as well.)

@sam-github
Contributor Author

@Trott thanks for all the troubleshooting. @joaocgreis any ideas?

I ran ansible-playbook --limit test-rackspace-win2012r2_vs2017-x64-3 ./playbooks/jenkins/worker/create.yml -C and it looks like I have the auth to re-ansible that host, and it's likely the correct playbook because it did make some checks, but I'm not sure whether doing so is risky. Then again, if the host is already not suitable for CI, maybe it can't get worse?

@joaocgreis
Member

The issue was that test/fixtures/keys/ca1-cert.pem was changed to CRLF line endings. There is a .gitattributes file in that folder that should make it have LF line endings, but git did not notice the change in line endings.

This seems to happen when changing to a v10 branch, where the .gitattributes file is not there. When checking out, git applies the autocrlf setting as it should, but when changing branches it does not. I'll have to take a look at the git issues to find out whether we should file an issue there or change something on our end. For now, I've added a workaround in the job config.

Test run: https://ci.nodejs.org/job/node-test-binary-windows-js-suites/2792/RUN_SUBSET=1,nodes=win2012r2-COMPILED_BY-vs2019-x86/ ✔️ (looks like Jenkins picked the node again for the same job as soon as it became available)
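
As an aside, when debugging a similar CRLF/LF mismatch, git can report which attributes it thinks apply to the fixture; a rough sketch (the .gitattributes rule shown in the comments is hypothetical, since this thread does not quote the file's contents):

    # Show the attributes git applies to the fixture (run from the repo root).
    git check-attr -a -- test/fixtures/keys/ca1-cert.pem

    # A rule of this shape in test/fixtures/keys/.gitattributes pins LF in the
    # working tree even when core.autocrlf=true is set (hypothetical; the exact
    # rule in the repository is not quoted in this thread):
    #   * text eol=lf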

@Trott
Member

Trott commented Mar 30, 2020

This seems to happen when changing to a v10 branch, where the .gitattributes file is not there.

Wow! Not the root cause I was imagining. Thanks for the explanation!

@sam-github
Contributor Author

Thanks for the troubleshooting!

Maybe we could rework the test so that it is not sensitive to line endings? That would make the tests more robust, independent of the job config.

I'll take a look. From a quick glance, it looks like the return s.startsWith('-----BEGIN CERTIFICATE-----\n'); check might just need to be changed to a regex that accepts either CRLF or LF as the line ending.
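
A rough sketch of the change being described (the helper name below is made up for illustration; only the quoted startsWith line comes from the test):

    // Before: only accepts LF after the PEM header, so a CRLF checkout fails.
    //   return s.startsWith('-----BEGIN CERTIFICATE-----\n');

    // After: tolerate either CRLF or LF line endings.
    function looksLikePemCertificate(s) {
      return /^-----BEGIN CERTIFICATE-----\r?\n/.test(s);
    }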

@joaocgreis
Member

Maybe we could rework the test so that it is not sensitive to line endings?

Sounds good. Normalizing line endings in imported certificates is also an option and would make things more consistent (the certificates already there use LF), though that is probably semver-major and more disruptive.

For context, we use autocrlf in CI for two reasons: 1) it's the default on Windows, so most developers will probably have it set; 2) with it, CI runs Node.js mostly with JS files that have CRLF endings, but also with a few LF files from fixtures, giving us nice coverage of both line endings without explicit tests. In this case, though, Git does not do what I expect, so this is actually a CI issue.
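
The workaround added to the job config is not shown in this thread; as a guess at what such a workaround could look like (an assumption, not a description of what the CI job actually does), one commonly used recipe for forcing git to reapply .gitattributes line-ending rules to an already checked-out tree is:

    # Drop all tracked files from the index, then restore the index and working
    # tree from HEAD so the checkout machinery reapplies the eol/autocrlf rules.
    # This discards uncommitted changes, so it is only suitable for CI workspaces.
    git rm --cached -r .
    git reset --hard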

@joyeecheung
Member

I am repeatedly getting an error from the rm command in the Jenkins config. It seems to me that the Jenkins config should use rm -f to avoid failing when the file doesn't exist (why the file is missing, I have no idea, but it does happen, and it probably should not make the test run fail...). See https://ci.nodejs.org/job/node-test-binary-windows-js-suites/26012/RUN_SUBSET=2,nodes=win2019-COMPILED_BY-vs2022/console

@joyeecheung
Member

Oh, it seems the actual error came from an incomplete checkout. Though it's probably still a good idea to use rm -f, or just fix the test as suggested in #2254 (comment).
