sequential/test-inspector-port-zero segfaults on macOS 10.10 and below #17175

Closed
gibfahn opened this issue Nov 21, 2017 · 13 comments
Labels
confirmed-bug Issues with confirmed bugs. inspector Issues and PRs related to the V8 inspector protocol libuv Issues and PRs related to the libuv dependency or the uv binding. test Issues and PRs related to the tests.

Comments

@gibfahn
Member

gibfahn commented Nov 21, 2017

  • Version: 8.9.0.0
  • Platform: macOS <= 10.10.1
  • Subsystem: libuv, getaddrinfo

This test fails consistently on earlier versions of macOS (used code in #16685 to work out that it was a segfault).

Original test case failure:

$ tools/test.py sequential/test-inspector-port-zero
=== release test-inspector-port-zero ===                    
Path: sequential/test-inspector-port-zero
assert.js:42
  throw new errors.AssertionError({
  ^

AssertionError [ERR_ASSERTION]: exitCode: null, signal: SIGSEGV
    at ChildProcess.proc.on.mustCall (/build/jenkins/n8-test/ab673161/node/test/sequential/test-inspector-port-zero.js:37:59)
    at ChildProcess.<anonymous> (/build/jenkins/n8-test/ab673161/node/test/common/index.js:533:15)
    at emitTwo (events.js:126:13)
    at ChildProcess.emit (events.js:214:7)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:198:12)
Command: out/Release/node /build/jenkins/n8-test/ab673161/node/test/sequential/test-inspector-port-zero.js
[00:00|% 100|+   0|-   1]: Done 

Minimal reproduction:

Only one line of the test is failing:

test('--inspect=localhost:0');

Which means you can reproduce with:

node --inspect=localhost:0
# Outputs: Segmentation fault: 11 (core dumped)
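
The reproduction above can also be scripted. The following is a minimal sketch, not the actual test file: it spawns the current node binary with the inspector flag and reports whether the child exited normally or was killed by a signal. `describeExit` and `runInspector` are hypothetical helper names, and the `-p process.debugPort` arguments are an assumption about a convenient script for the child to run.

```javascript
'use strict';
const { spawn } = require('child_process');

// Turn an exit code / signal pair into a one-line summary.
function describeExit(code, signal) {
  if (signal !== null) return `killed by ${signal}`;
  return `exited with code ${code}`;
}

// Spawn the current node binary with the given inspector flag and
// resolve with a summary of how the child process ended.
function runInspector(flag) {
  return new Promise((resolve, reject) => {
    const proc = spawn(process.execPath, [flag, '-p', 'process.debugPort'],
                       { stdio: 'ignore' });
    proc.on('error', reject);
    proc.on('exit', (code, signal) => resolve(describeExit(code, signal)));
  });
}

// Usage (uncomment to run):
// runInspector('--inspect=localhost:0').then(console.log, console.error);
```

On an affected machine the summary would be expected to name SIGSEGV, matching the `exitCode: null, signal: SIGSEGV` assertion message in the original failure.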
@gibfahn gibfahn added confirmed-bug Issues with confirmed bugs. inspector Issues and PRs related to the V8 inspector protocol libuv Issues and PRs related to the libuv dependency or the uv binding. test Issues and PRs related to the tests. labels Nov 21, 2017
@gibfahn
Member Author

gibfahn commented Nov 21, 2017

lldb thread list

(lldb) thread list
Process 0 stopped
* thread #1: tid = 0x0000, 0x00007fff95e3456a libsystem_kernel.dylib`semaphore_wait_trap + 10, stop reason = signal SIGSTOP
  thread #2: tid = 0x0001, 0x00007fff95e39132 libsystem_kernel.dylib`__psynch_cvwait + 10, stop reason = signal SIGSTOP
  thread #3: tid = 0x0002, 0x00007fff95e39132 libsystem_kernel.dylib`__psynch_cvwait + 10, stop reason = signal SIGSTOP
  thread #4: tid = 0x0003, 0x00007fff95e39132 libsystem_kernel.dylib`__psynch_cvwait + 10, stop reason = signal SIGSTOP
  thread #5: tid = 0x0004, 0x00007fff95e39132 libsystem_kernel.dylib`__psynch_cvwait + 10, stop reason = signal SIGSTOP
  thread #6: tid = 0x0005, 0x00007fff95e3456a libsystem_kernel.dylib`semaphore_wait_trap + 10, stop reason = signal SIGSTOP
  thread #7: tid = 0x0006, 0x00007fff8d1bad14 libsystem_info.dylib`mdns_addrinfo + 335, stop reason = signal SIGSTOP
(lldb) thread backtrace 1
* thread #1: tid = 0x0000, 0x00007fff95e3456a libsystem_kernel.dylib`semaphore_wait_trap + 10, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff95e3456a libsystem_kernel.dylib`semaphore_wait_trap + 10
    frame #1: 0x0000000100b5f6c5 node`uv_sem_wait + 16
    frame #2: 0x0000000100a4222e node`node::inspector::InspectorIo::Start() + 48
    frame #3: 0x0000000100a402cb node`node::inspector::Agent::StartIoThread(bool) + 149
    frame #4: 0x0000000100a401a0 node`node::inspector::Agent::Start(node::NodePlatform*, char const*, node::DebugOptions const&) + 560
    frame #5: 0x00000001009c8d80 node`node::Start(v8::Isolate*, node::IsolateData*, int, char const* const*, int, char const* const*) + 305
    frame #6: 0x00000001009c8b91 node`node::Start(uv_loop_s*, int, char const* const*, int, char const* const*) + 454
    frame #7: 0x00000001009c7fe1 node`node::Start(int, char**) + 469
    frame #8: 0x0000000100001a34 node`start + 52
(lldb) thread backtrace 6
  thread #6: tid = 0x0005, 0x00007fff95e3456a libsystem_kernel.dylib`semaphore_wait_trap + 10, stop reason = signal SIGSTOP
    frame #0: 0x00007fff95e3456a libsystem_kernel.dylib`semaphore_wait_trap + 10
    frame #1: 0x0000000100b5f6c5 node`uv_sem_wait + 16
    frame #2: 0x0000000100a40cf0 node`node::inspector::(anonymous namespace)::StartIoThreadMain(void*) + 28
    frame #3: 0x00007fff97abb2fc libsystem_pthread.dylib`_pthread_body + 131
    frame #4: 0x00007fff97abb279 libsystem_pthread.dylib`_pthread_start + 176
    frame #5: 0x00007fff97ab94b1 libsystem_pthread.dylib`thread_start + 13
(lldb) thread backtrace 7
  thread #7: tid = 0x0006, 0x00007fff8d1bad14 libsystem_info.dylib`mdns_addrinfo + 335, stop reason = signal SIGSTOP
    frame #0: 0x00007fff8d1bad14 libsystem_info.dylib`mdns_addrinfo + 335
    frame #1: 0x00007fff8d1bab85 libsystem_info.dylib`search_addrinfo + 179
    frame #2: 0x00007fff8d1ba8da libsystem_info.dylib`si_addrinfo + 1395
    frame #3: 0x00007fff8d1ba2c3 libsystem_info.dylib`getaddrinfo + 179
    frame #4: 0x0000000100b58ee5 node`uv_getaddrinfo + 461
    frame #5: 0x0000000100a49613 node`node::inspector::InspectorSocketServer::Start() + 155
    frame #6: 0x0000000100a42c9a node`void node::inspector::InspectorIo::ThreadMain<node::inspector::InspectorSocketServer>() + 534
    frame #7: 0x00007fff97abb2fc libsystem_pthread.dylib`_pthread_body + 131
    frame #8: 0x00007fff97abb279 libsystem_pthread.dylib`_pthread_start + 176
    frame #9: 0x00007fff97ab94b1 libsystem_pthread.dylib`thread_start + 13

lldb process status

(lldb) process status
Process 0 stopped
* thread #1: tid = 0x0000, 0x00007fff95e3456a libsystem_kernel.dylib`semaphore_wait_trap + 10, stop reason = signal SIGSTOP
    frame #0: 0x00007fff95e3456a libsystem_kernel.dylib`semaphore_wait_trap + 10
libsystem_kernel.dylib`semaphore_wait_trap:
->  0x7fff95e3456a <+10>: retq   
    0x7fff95e3456b <+11>: nop    

libsystem_kernel.dylib`semaphore_wait_signal_trap:
    0x7fff95e3456c <+0>:  movq   %rcx, %r10
    0x7fff95e3456f <+3>:  movl   $0x1000025, %eax
  thread #2: tid = 0x0001, 0x00007fff95e39132 libsystem_kernel.dylib`__psynch_cvwait + 10, stop reason = signal SIGSTOP
    frame #0: 0x00007fff95e39132 libsystem_kernel.dylib`__psynch_cvwait + 10
libsystem_kernel.dylib`__psynch_cvwait:
->  0x7fff95e39132 <+10>: jae    0x7fff95e3913c            ; <+20>
    0x7fff95e39134 <+12>: movq   %rax, %rdi
    0x7fff95e39137 <+15>: jmp    0x7fff95e34ca3            ; cerror_nocancel
    0x7fff95e3913c <+20>: retq   
  thread #3: tid = 0x0002, 0x00007fff95e39132 libsystem_kernel.dylib`__psynch_cvwait + 10, stop reason = signal SIGSTOP
    frame #0: 0x00007fff95e39132 libsystem_kernel.dylib`__psynch_cvwait + 10
libsystem_kernel.dylib`__psynch_cvwait:
->  0x7fff95e39132 <+10>: jae    0x7fff95e3913c            ; <+20>
    0x7fff95e39134 <+12>: movq   %rax, %rdi
    0x7fff95e39137 <+15>: jmp    0x7fff95e34ca3            ; cerror_nocancel
    0x7fff95e3913c <+20>: retq   
  thread #4: tid = 0x0003, 0x00007fff95e39132 libsystem_kernel.dylib`__psynch_cvwait + 10, stop reason = signal SIGSTOP
    frame #0: 0x00007fff95e39132 libsystem_kernel.dylib`__psynch_cvwait + 10
libsystem_kernel.dylib`__psynch_cvwait:
->  0x7fff95e39132 <+10>: jae    0x7fff95e3913c            ; <+20>
    0x7fff95e39134 <+12>: movq   %rax, %rdi
    0x7fff95e39137 <+15>: jmp    0x7fff95e34ca3            ; cerror_nocancel
    0x7fff95e3913c <+20>: retq   
  thread #5: tid = 0x0004, 0x00007fff95e39132 libsystem_kernel.dylib`__psynch_cvwait + 10, stop reason = signal SIGSTOP
    frame #0: 0x00007fff95e39132 libsystem_kernel.dylib`__psynch_cvwait + 10
libsystem_kernel.dylib`__psynch_cvwait:
->  0x7fff95e39132 <+10>: jae    0x7fff95e3913c            ; <+20>
    0x7fff95e39134 <+12>: movq   %rax, %rdi
    0x7fff95e39137 <+15>: jmp    0x7fff95e34ca3            ; cerror_nocancel
    0x7fff95e3913c <+20>: retq   
  thread #6: tid = 0x0005, 0x00007fff95e3456a libsystem_kernel.dylib`semaphore_wait_trap + 10, stop reason = signal SIGSTOP
    frame #0: 0x00007fff95e3456a libsystem_kernel.dylib`semaphore_wait_trap + 10
libsystem_kernel.dylib`semaphore_wait_trap:
->  0x7fff95e3456a <+10>: retq   
    0x7fff95e3456b <+11>: nop    

libsystem_kernel.dylib`semaphore_wait_signal_trap:
    0x7fff95e3456c <+0>:  movq   %rcx, %r10
    0x7fff95e3456f <+3>:  movl   $0x1000025, %eax
  thread #7: tid = 0x0006, 0x00007fff8d1bad14 libsystem_info.dylib`mdns_addrinfo + 335, stop reason = signal SIGSTOP
    frame #0: 0x00007fff8d1bad14 libsystem_info.dylib`mdns_addrinfo + 335
libsystem_info.dylib`mdns_addrinfo:
->  0x7fff8d1bad14 <+335>: movw   (%rdx), %ax
    0x7fff8d1bad17 <+338>: movw   %ax, -0xd2(%rbp)
    0x7fff8d1bad1e <+345>: jmp    0x7fff8d1bad74            ; <+431>
    0x7fff8d1bad20 <+347>: movl   %r9d, -0xec(%rbp)

@gibfahn
Member Author

gibfahn commented Nov 21, 2017

(lldb) thread select 7
* thread #7, stop reason = signal SIGSTOP
    frame #0: 0x00007fff8d1bad14 libsystem_info.dylib`mdns_addrinfo + 335
libsystem_info.dylib`mdns_addrinfo:
->  0x7fff8d1bad14 <+335>: movw   (%rdx), %ax
    0x7fff8d1bad17 <+338>: movw   %ax, -0xd2(%rbp)
    0x7fff8d1bad1e <+345>: jmp    0x7fff8d1bad74            ; <+431>
    0x7fff8d1bad20 <+347>: movl   %r9d, -0xec(%rbp)
(lldb) register read 
General Purpose Registers:
       rax = 0x0000000107000440
       rbx = 0x0000000000000000
       rcx = 0x0000000000001000
       rdx = 0x0000000000000000
       rdi = 0x00007fff7a8357c8  si_module_static_mdns.si
       rsi = 0x0000000102600152
       rbp = 0x00000001070004e0
       rsp = 0x00000001070003c0
        r8 = 0x0000000000000001
        r9 = 0x0000000000000001
       r10 = 0x0000000000000a58
       r11 = 0x00007fff8fe10c00  libsystem_platform.dylib`_platform_memchr$VARIANT$Haswell
       r12 = 0x0000000102600152
       r13 = 0x0000000000000000
       r14 = 0x0000000000000000
       r15 = 0x0000000000000001
       rip = 0x00007fff8d1bad14  libsystem_info.dylib`mdns_addrinfo + 335
    rflags = 0x0000000000010202
        cs = 0x000000000000002b
        fs = 0x0000000000000000
        gs = 0x0000000000000000

So the SEGV happened on movw (%rdx), %ax, which loads the 16-bit value at the address held in %rdx into %ax. However, %rdx is 0x0, so this is a NULL pointer dereference inside mdns_addrinfo.

Looks related to this Python bug: https://bugs.python.org/issue17269
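
For context, a possible workaround modeled on what the linked Python issue appears to describe: on older OS X releases, getaddrinfo could crash when a numeric-service lookup was given a missing or literal "0" service, and substituting "00" avoided the crash. The sketch below only illustrates that idea. `normalizeService` is a hypothetical helper that does not exist in Node.js or libuv, and it is an assumption (not confirmed in this thread) that the inspector's lookup hits the same trigger.

```javascript
'use strict';

// Hypothetical helper, not part of Node.js or libuv.
// ASSUMPTION: the crash is triggered by a numeric-service lookup whose
// service is missing or the literal string "0"; "00" still resolves to port 0.
function normalizeService(service, numericServ) {
  const missing = service === null || service === undefined;
  if (numericServ && (missing || service === '0')) {
    return '00';
  }
  return service;
}

// Usage: normalizeService('0', true) yields '00'; other inputs pass through.
```

This thread did not end up needing such a change, since updating the affected machines made the test pass.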

@gibfahn
Member Author

gibfahn commented Nov 21, 2017

This only fails on earlier versions of macOS

Testing machines:

macOS Version Pass/Fail
10.8.5 🔴
10.10.1 🔴
10.10.5 💚
10.11.4 💚
10.13.1 💚
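
The table above can be turned into a simple version check, for example to decide whether a machine is expected to pass. This is a minimal sketch: `parseVersion`, `compareVersions`, and `isKnownGood` are hypothetical helpers, and the 10.10.5 cutoff is only the first release observed to pass here. Releases 10.10.2 through 10.10.4 were not tested, so the check conservatively treats them as not known good.

```javascript
'use strict';

// Parse "10.10.5" into [10, 10, 5]; missing or non-numeric parts count as 0.
function parseVersion(v) {
  const parts = v.split('.').map((p) => parseInt(p, 10) || 0);
  while (parts.length < 3) parts.push(0);
  return parts.slice(0, 3);
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a, b) {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// ASSUMPTION: 10.10.5 is the earliest release observed to pass in the table;
// the release that actually fixed the crash is not identified in this thread.
const FIRST_KNOWN_GOOD = '10.10.5';

function isKnownGood(macosVersion) {
  return compareVersions(macosVersion, FIRST_KNOWN_GOOD) >= 0;
}
```

Comparing numerically per component matters here: a plain string comparison would sort 10.8.5 after 10.10.5.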

Different node versions on 10.10.1

Version Pass/Fail
8.9.0 🔴
8.6.0 🔴
8.3.0 🔴
8.1.2 💚

8.3.0.0 process.versions

{ http_parser: '2.7.0',
  node: '8.3.0',
  v8: '6.0.286.52',
  uv: '1.13.1',
  zlib: '1.2.11',
  ares: '1.10.1-DEV',
  modules: '57',
  openssl: '1.0.2l',
  icu: '59.1',
  unicode: '9.0',
  cldr: '31.0.1',
  tz: '2017b' }

8.1.2.0 process.versions

{ http_parser: '2.7.0',
  node: '8.1.2',
  v8: '5.8.283.41',
  uv: '1.12.0',
  zlib: '1.2.11',
  ares: '1.10.1-DEV',
  modules: '57',
  openssl: '1.0.2l',
  icu: '59.1',
  unicode: '9.0',
  cldr: '31.0.1',
  tz: '2017b' }

@eugeneo
Contributor

eugeneo commented Nov 21, 2017

Is there a way I could run the code on those bots?

@refack
Contributor

refack commented Nov 21, 2017

I have an intuition that something with the threads is not safe (maybe access to the uv_loop_t)
That could also be the cause of #15558

@mhdawson
Member

@eugeneo you can run this job on your branch. It is the job we are using to test out new machines for the CI and it includes the earlier versions: https://ci.nodejs.org/view/All/job/node-test-commit-osx-macstadium/

@mhdawson
Member

And it shows the failure on 10.09 and 10.10.

@mhdawson
Member

@gibfahn are you investigating this one? I think it's the last blocker before we could enable more of the osx machines. As I look at the current osx test backlog I'm looking forward to that.

@gibfahn
Member Author

gibfahn commented Nov 22, 2017

> As I look at the current osx test backlog I'm looking forward to that.

What patch version of 10.10 are you on? I'd assume we want to be on the latest, in which case this shouldn't block 10.10 going into CI.

I am looking at this, but there's no reason we shouldn't mark the test as flaky on macOS <10.10 in the meantime.

@mhdawson
Member

@gdams I'm guessing we are on an older version of 10.10, as it would have started with what was installed from the CD. I wonder if we can add to our ansible script so that we upgrade as part of the ansible config?

@mhdawson
Member

I can see from the UI on one of the machines that there is a pending update. I'll go ahead and let this be applied to validate that it resolves the issue. We may want to see if we can configure updates to happen automatically through ansible (should we decide we want that).

@mhdawson
Member

After the update, tests pass on 10.10.

@gibfahn
Member Author

gibfahn commented Dec 14, 2017

This can be closed, as we only support the latest version of macOS 10.10, which doesn't have this problem.

@gibfahn gibfahn closed this as completed Dec 14, 2017