Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node Pipe cannot connect #1354

Closed
kayakyakr opened this issue Nov 12, 2016 · 24 comments
Closed

Node Pipe cannot connect #1354

kayakyakr opened this issue Nov 12, 2016 · 24 comments
Assignees
Labels

Comments

@kayakyakr
Copy link

With the newest build (14965), I was hopeful that my issues with using node on WSL would have been cleared up. In this case, the error is being throw deep in the ember-cli build process, but it happens elsewhere as well.

I believe the issue to be twofold: one issue in the error lookup from libuv, the other with the pipe connector itself.

The error that is thrown:

util.js:1018
  var errname = uv.errname(err);
                   ^

Error: err >= 0
    at Object.exports._errnoException (util.js:1018:20)
    at exports._exceptionWithHostPort (util.js:1045:20)
    at PipeConnectWrap.afterConnect [as oncomplete] (net.js:1090:14)

My case is a full ember-cli-rails project that takes a bit of setup, but this seems to be throwing the same error: nodejs/node#9534

const {spawn} = require('child_process');

const sockets = spawn('go', ['run', 'sockets.go']);

I suspect that the crash is coming from an issue with libuv error lookup as the message err >= 0 is not very helpful. I am not certain what the original exception is because it doesn't get that far. The pipe code, so far as I can tell is here:
https://github.com/nodejs/node/blob/master/src/pipe_wrap.cc

@fpqc
Copy link

fpqc commented Nov 12, 2016

Not sure how easy it would be to do with node, but if you run strace -ff before the initialization command and redirect stderr to a file, the file will show if there is a failing syscall.

So it would be something like strace -ff somecommand &> ~/stracelog

@therealkenc
Copy link
Collaborator

Strace always helps. If anyone over on the node/libuv side can bisect (or just wild guess) what changed in 7.1 that would help even more.

@kayakyakr
Copy link
Author

Good call @fpqc found the call that threw the error (with gettime and futex calls removed):

[pid 19771] connect(12, {sa_family=AF_LOCAL, sun_path="/usr/local/var/run/watchman/kayakyakr-state/sock"}, 110) = 0
[pid 19771] socket(PF_LOCAL, SOCK_STREAM, 0) = 13
[pid 19771] ioctl(13, FIOCLEX)          = 0
[pid 19771] connect(13, {sa_family=AF_LOCAL, sun_path="/usr/local/var/run/watchman/kayakyakr-state/sock"}, 110) = 0
[pid 19771] epoll_ctl(5, EPOLL_CTL_ADD, 11, {EPOLLOUT, {u32=11, u64=11}}) = 0
[pid 19771] epoll_ctl(5, EPOLL_CTL_ADD, 12, {EPOLLIN|EPOLLOUT, {u32=12, u64=12}}) = 0
[pid 19771] epoll_ctl(5, EPOLL_CTL_ADD, 13, {EPOLLIN|EPOLLOUT, {u32=13, u64=13}}) = 0
[pid 19771] epoll_wait(5, [{EPOLLOUT, {u32=13, u64=13}}, {EPOLLOUT, {u32=11, u64=11}}, {EPOLLOUT, {u32=12, u64=12}}, {EPOLLIN, {u32=8, u64=8}}], 1024, 481) = 4
[pid 19771] getsockopt(13, SOL_SOCKET, SO_ERROR, 0x7ffffd2ffe60, 0x7ffffd2ffe70) = -1 EINVAL (Invalid argument)

Could be related to #1343 which already has a fix inbound. That's good.

@therealkenc this issue is not limited to 7.1. I think a few more people started hitting it with 7.1, but it existed for me in all versions I tried.

@therealkenc
Copy link
Collaborator

therealkenc commented Nov 13, 2016

#1343 is about os.networkInterfaces(), aka PF_NETLINK/SOCK_RAW, aka the interminable #468. The trace you posted is getting a sockopt on a PF_LOCAL (aka AF_UNIX) socket.

getsockopt(..., SOL_SOCKET, SO_ERROR) was #682, which is supposed to be fixed as of 14926. It could either be (a) that fix didn't actually take or (b) that EINVAL isn't really the root of your particular problem here. If your guess that it's similar to node#9534 and is related to a failed spawn and pipes (not sockets), then it isn't going to be that EINVAL. [Or at least not directly.]

A test case that evokes the same libuv fail would help narrow. Spawning 'go' (if I'm reading that correctly), is problematic because golang has its own 'issues' at the moment.

@kayakyakr
Copy link
Author

Theoretically, you could spawn anything, according to nodejs/node#9534, as the issue is occurring on spawn_process, not specifically on what it's spawning.

The EINVAL happens directly before the error is thrown. It might not be from that call, but I don't see anything else it could be from: https://gist.github.com/kayakyakr/dee18f36f5fb4d91aeade8f06c82bc90

@fpqc
Copy link

fpqc commented Nov 13, 2016

@kayakyakr Eh, don't sweat it right now. The MS guys are doing weekend R&R and will probably ask for more info tomorrow if the strace wasn't enough.

I wonder if this is a case for getsockopt that got missed when it was fixed earlier (since fewer socket options could be set in the first place, they might not have tested getsockopt on some of the newer ones, and perhaps when the original fix was put in, it made assumptions about the sockopts that no longer hold.)

Anyway, the ideal for these kinds of bug reports is to try to come up with a simpler activity that will hit the same problems. That may not be necessary, since the devs might just come in tomorrow and say "Yeah, we know. Fixinbound", but think today maybe how you could write a simpler example in case they need your help tomorrow.

@kayakyakr
Copy link
Author

Node 8.0.0 (master) gives more information:

Error: connect Unknown system error -32767 /usr/local/var/run/watchman/kayakyakr-state/sock

I'm trying to get the ember-cli guys to help isolate this. It might be an issue with the sock or an issue with watchman itself.

@therealkenc
Copy link
Collaborator

therealkenc commented Nov 14, 2016

getsockopt(..., SOL_SOCKET, SO_ERROR, ...) still returns EINVAL as of 14965. Test case below is clean on Real Linux™.

/** so-error.cc **/

#include <sys/types.h> 
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <stdio.h>

int main(int argc, const char* argv[]) {
  const char* path = "./sock";
  unlink(path);
  struct sockaddr_un addr = {};
  addr.sun_family = AF_UNIX;
  strcpy(addr.sun_path, path);

  int lsd = socket(PF_LOCAL, SOCK_STREAM, 0);
  bind(lsd, reinterpret_cast<struct sockaddr*>(&addr), sizeof(addr));
  listen(lsd, 10);

  int cfd = socket(PF_LOCAL, SOCK_STREAM, 0);
  connect(cfd, reinterpret_cast<struct sockaddr*>(&addr), sizeof(addr));

  int err;
  socklen_t optlen = sizeof(int);
  if (getsockopt(cfd, SOL_SOCKET, SO_ERROR, &err, &optlen) < 0) {
    perror("getsockopt: ");
    _exit(-1);
  }

  return 0;
}

strace:

socket(PF_LOCAL, SOCK_STREAM, 0)        = 3
bind(3, {sa_family=AF_LOCAL, sun_path="./sock"}, 110) = 0
listen(3, 10)                           = 0
socket(PF_LOCAL, SOCK_STREAM, 0)        = 4
connect(4, {sa_family=AF_LOCAL, sun_path="./sock"}, 110) = 0
getsockopt(4, SOL_SOCKET, SO_ERROR, 0x7ffff1a0dab8, 0x7ffff1a0dabc) = 
    -1 EINVAL (Invalid argument)

@sunilmut
Copy link
Member

Thanks @therealkenc for detailed info. As you already found out, the support for SO_ERROR in WSL is not complete. I have opened a bug internally to track this.

@sunilmut sunilmut self-assigned this Nov 16, 2016
@kayakyakr
Copy link
Author

Any update on this issue?

@sunilmut
Copy link
Member

sunilmut commented Dec 8, 2016

This is a bug in our plate that we haven't gotten to yet. Should soon though.

@kayakyakr
Copy link
Author

Thanks. Sorry to pester, it's a hard block for my use case.

@sunilmut
Copy link
Member

sunilmut commented Dec 8, 2016

No problem. Will try to prioritize. thanks for the feedback and trying WSL!

@kayakyakr
Copy link
Author

No problem. Actually enjoying it. Once all the blockers are out of the way, was planning on getting a surface to use as my mobile dev setup. Was hoping that it'd be usable before ~Feb

(and I lied a bit: #1357 is the hard block. this is a blocker, but with a workaround)

@sunilmut
Copy link
Member

Just an update that Unix sockets now support the SO_ERROR socket option. The fix is in the dev branch and should soon make it to the release. Regarding #682, it's not clear from the post, but I think that issue was for the SO_ERROR socket option for INET sockets.

@therealkenc
Copy link
Collaborator

therealkenc commented Jan 13, 2017

Wow! That was ~~~6 hours~~ ~4 days from ping to fix (edit: still impressive).

Does squeaking really work in this forum? Because it's starting to look like it does. If so, finishing SCM_RIGHTS and SCM_CREDENTIALS (#1326 #514 #613) not to mention the clone() pattern GNU is newly fond of (#1005) or futexe()s in general (#1006) would unblock a ton 'o stuff. What's the optimal whinge frequency?

@fpqc
Copy link

fpqc commented Jan 13, 2017

@therealkenc I don't even see the ping.

@therealkenc
Copy link
Collaborator

I got the dates wrong. The ping was on the 8th. Had better shock value when I thought Sunil fixed it in an afternoon. Still, I was surprised when it worked for #610 and I didn't say anything at the time.

@fpqc
Copy link

fpqc commented Jan 13, 2017

@therealkenc Enjoy that laugh emoji. 🙃

@sunilmut
Copy link
Member

I am working on the SMC_RIGHTS and SCM_CREDENTIALS as we speak :)

@sunilmut
Copy link
Member

Just to be clear, both SCM_RIGHTS and SCM_CREDENTIALS are currently supported, but not at the same time. Also the current implementation of SCM_RIGHTS does not support passing more than one FD at a time. Very limited. I am working to remove all these limitations.

@therealkenc
Copy link
Collaborator

therealkenc commented Jan 13, 2017

Just giving you a hard time Sunil. 😀 Awesome stuff. And of course you've only made matters worse by fixing multiple SCM_RIGHTS fds right after I whinged about it.

@sunilmut
Copy link
Member

Thanks. Really appreciate all the support we have from this forum. And, credit goes to @benhillis for SO_ERROR for UNIX sockets.

@sunilmut
Copy link
Member

Support for SO_ERROR for unix sockets came in build 15019. Please try this with that build and let us know if the issue is resolved or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants