read tcp i/o timeout on Windows 10 #60

Closed
hexadecy opened this issue Nov 15, 2017 · 17 comments

@hexadecy

After some db calls, we have to restart our Go backend:

Message: read tcp 127.0.0.1:50743->127.0.0.1:27017: i/o timeout

MongoDB 3.4.10
Go 1.9.2 windows/amd64
github.com/globalsign/mgo 5be15cc

I think it was ok with:
MongoDB 3.2.16
github.com/globalsign/mgo c4a7121
Go 1.9.1 Alpine 3.6

@domodwyer

Hi @hexadecy - thanks for the report.

Do you have any more info?

  • Do all requests fail after a certain point?
  • What timeouts do you have configured?
  • Are you using bulk ops? What sort of workload are you running?
  • Do you see the issue with 5be15cc and Mongo 3.2 or only on 3.4?

I'll get a new test running in our environment ASAP.

Dom

@hexadecy
Author

  • Yes, all requests fail after the timeout. We use mgodb.Session.Clone() for all of our HTTP handlers.
  • Default timeouts. Our project uses nginx 1.12.2 as a proxy and labstack/echo v3 as the Go framework.
  • No bulk ops, just single user workload on our dev environment
  • We only tested 3.4 so far with this commit 5be15cc

@domodwyer

Thanks @hexadecy - I'll look into it ASAP.

@hexadecy
Author

hexadecy commented Nov 16, 2017

That change looks suspicious to me:
5be15cc#diff-a75970bbdfa9237a3958512f8cc2fb09

Why did you revert it and then merge it?
We have only seen this issue on two Windows PCs so far.

@domodwyer

Hey @hexadecy

I've been running a test against one of our mongo environments (replicated & sharded) over the last 24 hours and I'm unable to reproduce the problem against 3.2.x. I also killed all connections at random throughout the night, and they recovered automatically.

I'll switch to 3.4 and try another run, but I don't think the change between 5be15cc and c4a7121 is likely to behave differently between the two versions - I might be wrong though!

For reference, I'm using this load tool with 5be15cc to push mongo - does https://github.com/domodwyer/mpjbt/blob/1d8068910f0d1972947c6523e1093cc03ebb6343/mongo/db_opts.go#L29 look like your session.Clone() usage? Are your clients connecting directly to mongod or through mongos (or any firewalls/routers)?

Dom

@domodwyer

It was reverted as it incorrectly targeted master (see #54) - we merge to development, test in staging and then merge to master - sorry for the confusion!

@weiishann

Hi @hexadecy,

How are you using Nginx as a proxy? Could you elaborate please?

Thanks

@hexadecy
Author

  • We use no sharding right now and it's direct localhost mongod
  • The exception is QA / Prod, where we use a docker-compose network
  • Nginx is used only for the client, to mix Node.js and Go. They both talk to Mongo directly (localhost:27017).

The Go net.Conn implementation for Windows uses "wsasend" and "wsarecv":
https://golang.org/src/net/fd_windows.go

But maybe they are not entirely thread safe:
http://www.serverframework.com/asynchronousevents/2015/01/wsarecv-wsasend-and-thread-safety.html

@hexadecy changed the title from "read tcp i/o timeout" to "read tcp i/o timeout on Windows 10" on Nov 16, 2017
@domodwyer

So it looks like it's probably a windows-specific issue - we're running tests against 3.4.x anyway just to eliminate it as a possible cause.

We're going to look into this - in the meantime I suggest vendoring 5be15cc if you're using windows.

Dom

@hexadecy
Author

OK, it happened again on the same Windows PC.
We will try c4a7121 with Mongo 3.4.10.

From an old thread:
https://groups.google.com/forum/#!topic/golang-nuts/inm0Bu_FDk4

As far as I know, the socket unlock has always come after the write (since 2010):

_, err = socket.conn.Write(buf)
socket.Unlock()
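
To illustrate what holding the lock across the write buys, here is a minimal sketch using only the standard library. It is not mgo's code: `lockedConn`, `send`, and `runWriters` are hypothetical names, and the line-based "ID payload" framing is an assumption made for the example. Because the ID is assigned and written under the same lock, IDs reach the peer in the order they were handed out.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strconv"
	"strings"
	"sync"
)

// lockedConn is a hypothetical stand-in for a driver socket.
type lockedConn struct {
	mu     sync.Mutex
	conn   net.Conn
	nextID int
}

// send assigns the next ID and writes the message while holding the lock.
func (l *lockedConn) send(payload string) error {
	l.mu.Lock()
	l.nextID++
	_, err := fmt.Fprintf(l.conn, "%d %s\n", l.nextID, payload)
	l.mu.Unlock() // released only after the write has returned
	return err
}

// runWriters starts n goroutines that each send one message and returns the
// IDs in the order they arrived at the other end of the connection.
func runWriters(n int) ([]int, error) {
	client, server := net.Pipe()
	lc := &lockedConn{conn: client}

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = lc.send("ping") // a failed send shows up as a missing ID
		}()
	}
	go func() {
		wg.Wait()
		client.Close() // ends the scan loop below
	}()

	var ids []int
	sc := bufio.NewScanner(server)
	for sc.Scan() {
		field := strings.SplitN(sc.Text(), " ", 2)[0]
		id, err := strconv.Atoi(field)
		if err != nil {
			return nil, err
		}
		ids = append(ids, id)
	}
	server.Close()
	return ids, sc.Err()
}

func main() {
	ids, err := runWriters(8)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("arrival order:", ids)
}
```

If the unlock were moved before the write, two goroutines could take IDs 1 and 2 and then write them in either order.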

@domodwyer

Hi @hexadecy (and @idy via https://github.com/go-mgo/mgo/issues/502)

I don't believe it's related to #52 - the methods are documented as being concurrency safe:

Conn is a generic stream-oriented network connection.

Multiple goroutines may invoke methods on a Conn simultaneously.
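
As a small illustration of that documented guarantee, the sketch below uses one client connection from two goroutines at once: one writes lines while the other reads the echoed lines back. It is a standard-library example under stated assumptions (a loopback echo server, a hypothetical helper named `echoOnce`, a 5 second safety deadline) and does not exercise mgo or the Windows code path discussed in this thread.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"sync"
	"time"
)

// echoOnce (hypothetical helper) starts a loopback echo server, writes n
// lines from one goroutine while reading the echoes in another, and returns
// how many lines were read back.
func echoOnce(n int) (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()

	go func() {
		c, err := ln.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		io.Copy(c, c) // echo everything back until the client closes
	}()

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return 0, err
	}
	defer conn.Close()
	// Safety net so the example cannot hang if something goes wrong.
	if err := conn.SetDeadline(time.Now().Add(5 * time.Second)); err != nil {
		return 0, err
	}

	var wg sync.WaitGroup
	var writeErr error
	wg.Add(1)
	go func() { // writer goroutine
		defer wg.Done()
		for i := 0; i < n; i++ {
			if _, err := fmt.Fprintf(conn, "line %d\n", i); err != nil {
				writeErr = err
				return
			}
		}
	}()

	// The reader runs concurrently with the writer on the same net.Conn.
	got := 0
	sc := bufio.NewScanner(conn)
	for got < n && sc.Scan() {
		got++
	}
	wg.Wait()
	if writeErr != nil {
		return got, writeErr
	}
	return got, sc.Err()
}

func main() {
	got, err := echoOnce(100)
	fmt.Println("lines echoed:", got, "error:", err)
}
```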

My suspicion is on go itself at this point - @weiishann will be running a test over the weekend to see if we can reproduce on Win10.

Dom

@domodwyer domodwyer removed the bug label Nov 17, 2017
@weiishann

weiishann commented Nov 22, 2017

Hi @hexadecy

I have run some tests with a setup similar to yours using this tool, mpjbt. However, I could not reproduce your issue. Can you check whether you can reproduce the same issue with the client running on Linux?

@hexadecy
Author

hexadecy commented Nov 22, 2017

OK, but mpjbt uses
conn := p.Session.Copy()
In my case we use p.Session.Clone()

The reporter of the following issue can reproduce a timeout with a wifi connection and session.Clone():
go-mgo#506

Ok I am running Ubuntu with 5be15cc and Mongo 3.2.17 + Go 1.9.1

We cannot reproduce it so far with c4a7121 and Mongo 3.4.10 + Go 1.9.2

@weiishann

weiishann commented Nov 24, 2017

Hi @hexadecy,

It's a little difficult to determine the issue when we are changing multiple things between each test. Could you provide this instead? Please use Go 1.9.2 for all the tests below.

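| Commit  | Mongo Version | Platform   | Trigger Errors (Y/N) |
|---------|---------------|------------|----------------------|
| 5be15cc | 3.2.17        | Windows 10 | (Y/N)                |
| c4a7121 | 3.2.17        | Ubuntu ?   | (Y/N)                |
| 5be15cc | 3.4.10        | Windows 10 | (Y/N)                |
| c4a7121 | 3.4.10        | Ubuntu     | (Y/N)                |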

Also, can you verify if you can reproduce with p.Session.Copy() please?

Thanks!

@hexadecy
Author

Sorry but it's pretty hard to reproduce.

5be15cc 3.4.10 Windows 10

I modified https://github.com/hexadecy/mpjbt to use Session.Clone() and tried with different workloads.
The last time we saw the issue, the workload was extremely light, probably just 1 or 2 db calls, and the REST client was an iPhone 7 running iOS 11.

@feliixx

feliixx commented Nov 24, 2017

Hi @hexadecy

Have you tried enabling debug mode with mgo.SetDebug(true) and redirecting the output to a logger? If so, could you post the logs from when the error occurs?

@domodwyer

Hi @hexadecy

We've been unable to reproduce the issue on our side, and I'm not entirely sure under what conditions it occurs.

I'm going to close this ticket. If you're still having an issue, feel free to reply with more info and we'll investigate.

Dom
