Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analysis of EOF Errors with GitHub API (Seems Resolved) #214

Closed
samoconnor opened this issue Mar 3, 2018 · 4 comments · Fixed by #236
Closed

Analysis of EOF Errors with GitHub API (Seems Resolved) #214

samoconnor opened this issue Mar 3, 2018 · 4 comments · Fixed by #236

Comments

@samoconnor
Copy link
Contributor

samoconnor commented Mar 3, 2018

Using verbose=3 and DEBUG_LEVEL=3.
Using the retry_post_requests branch (no keepalive) -- note: this branch (and the MbedTLS bug fix the goes with it) is now merged into master

Note that my POST request to GitHub fails with a 404, but this does not effect the test.
The test is trying to observer the behaviour what a POST request is sent on a connection that has been sitting idle for a while. The 404 result is not important.

First request to create a gist, get a response without issue.

julia> using GitHub ; token = "XXX" ; myauth = GitHub.authenticate(token)

DEBUG: 581e SSL connect: api.github.com:443...
DEBUG: 581e TCP connect: api.github.com:443...
DEBUG: 581e 🔗  New:0     0    api.github.com:443:63915 16 RawFD(23)
DEBUG: 581e 👁  Start write:T0  ⏸    0     0    api.github.com:443:63915 16 RawFD(23)
DEBUG: 581e 👁  Start read: T0  ⏸    0🔒    0    api.github.com:443:63915 16 RawFD(23)
DEBUG: e0ca 🗣  Write done: T0  ⏸    1     0    api.github.com:443:63915 16 RawFD(23)
DEBUG: 581e ✉️  Read done:  T0  ⏸    1     1    api.github.com:443:63915 16 RawFD(23)
GitHub.OAuth2(1e2b86**********************************)

julia> GitHub.create_gist(auth=myauth; params= Dict(["description" => "foo", "public" => false, "files" => Dict([ "f1" => "foo" ])]))
DEBUG: 581e ♻️  Idle:1     1    api.github.com:443:63915 16 RawFD(23)
DEBUG: 581e 👁  Start write:T1  ⏸    1     1    api.github.com:443:63915 16 RawFD(23)
➡️   "POST /gists HTTP/1.1\r\n"
➡️   "Authorization: token 1e2b86bXX32e7415f590c196b7\r\n"
➡️   "User-Agent: GitHub-jl\r\n"
➡️   "Host: api.github.com\r\n"
➡️   "Content-Length: 57\r\n"
➡️   "\r\n"
DEBUG: 581e 👁  Start read: T1  ⏸    1🔒    1    api.github.com:443:63915 16 RawFD(23)
➡️   "{\"public\":false,\"files\":{\"f1\":\"foo\"},\"description\":\"foo\"}"
DEBUG: b31b 🔒  POST non-idempotent, holding write lock: T1  🔁    1🔒    1🔒   api.github.com:443:63915 16 RawFD(23)
⬅️   "HTTP/1.1 404 Not Found\r\n"
    "Date: Sat, 03 Mar 2018 10:08:16 GMT\r\n"
 -- snip --

julia> HTTP.ConnectionPool.showpool(STDOUT)
ConnectionPool[
   ⏸    2     2    api.github.com:443:63915 16 RawFD(23)
]

^ connection pool shows a single connection that has been used twice (once to get the auth token and once for the POST).

After waiting 10 seconds, the request works as before...

julia> sleep(10)

julia> HTTP.ConnectionPool.showpool(STDOUT)
ConnectionPool[
   ⏸    2     2    api.github.com:443:63915 16 RawFD(23)
]

julia> GitHub.create_gist(auth=myauth; params= Dict(["description" => "foo", "public" => false, "files" => Dict([ "f1" => "foo" ])]))
DEBUG: 581e ♻️  Idle:2     2    api.github.com:443:63915 16 RawFD(23)
DEBUG: 581e 👁  Start write:T2  ⏸    2     2    api.github.com:443:63915 16 RawFD(23)
➡️   "POST /gists HTTP/1.1\r\n"
➡️   "Authorization: token 1e2b86bXX2e7415f590c196b7\r\n"
➡️   "User-Agent: GitHub-jl\r\n"
➡️   "Host: api.github.com\r\n"
➡️   "Content-Length: 57\r\n"
➡️   "\r\n"
DEBUG: 581e 👁  Start read: T2  ⏸    2🔒    2    api.github.com:443:63915 16 RawFD(23)
➡️   "{\"public\":false,\"files\":{\"f1\":\"foo\"},\"description\":\"foo\"}"
DEBUG: 06d5 🔒  POST non-idempotent, holding write lock: T2  🔁    2🔒    2🔒   api.github.com:443:63915 16 RawFD(23)
⬅️   "HTTP/1.1 404 Not Found\r\n"
    "Date: Sat, 03 Mar 2018 10:08:49 GMT\r\n"
    "Content-Type: application/json; charset=utf-8\r\n"
    -- snip --

After waiting 30 seconds, the request still works...

julia> sleep(30)

julia> HTTP.ConnectionPool.showpool(STDOUT)
ConnectionPool[
   ⏸    3     3    api.github.com:443:63915 16 RawFD(23)
]


julia> GitHub.create_gist(auth=myauth; params= Dict(["description" => "foo", "public" => false, "files" => Dict([ "f1" => "foo" ])]))
DEBUG: 581e ♻️  Idle:3     3    api.github.com:443:63915 16 RawFD(23)
DEBUG: 581e 👁  Start write:T3  ⏸    3     3    api.github.com:443:63915 16 RawFD(23)
➡️   "POST /gists HTTP/1.1\r\n"
➡️   "Authorization: token 1e2b86bd089b8XX7415f590c196b7\r\n"
➡️   "User-Agent: GitHub-jl\r\n"
➡️   "Host: api.github.com\r\n"
➡️   "Content-Length: 57\r\n"
➡️   "\r\n"
DEBUG: 581e 👁  Start read: T3  ⏸    3🔒    3    api.github.com:443:63915 16 RawFD(23)
➡️   "{\"public\":false,\"files\":{\"f1\":\"foo\"},\"description\":\"foo\"}"
DEBUG: 61da 🔒  POST non-idempotent, holding write lock: T3  🔁    3🔒    3🔒   api.github.com:443:63915 16 RawFD(23)
⬅️   "HTTP/1.1 404 Not Found\r\n"
    "Date: Sat, 03 Mar 2018 10:09:31 GMT\r\n"
    "Content-Type: application/json; charset=utf-8\r\n"
 -- snip --

julia> HTTP.ConnectionPool.showpool(STDOUT)
ConnectionPool[
   ⏸    4     4    api.github.com:443:63915 16 RawFD(23)
]

After waiting 60 seconds, the connection pool now show the connection in "connected" state rather than "paused" state but I don't think this really tells us anything other than that some local timeout has expired.

julia> sleep(60)

julia> HTTP.ConnectionPool.showpool(STDOUT)
ConnectionPool[
   🔗    4     4    api.github.com:443:63915 16 RawFD(23)
]

When the request is sent this time, it fails with an EOFError.
But, with the changes on the retry_post_requests branch, the request is now retried.
This means that even with the very tiny POST body used in the test, the connection failure is now detected before the body is sent. I assume that the failure is happening as soon as we try to read the response headers (which now happens before the body is sent).

julia> GitHub.create_gist(auth=myauth; params= Dict(["description" => "foo", "public" => false, "files" => Dict([ "f1" => "foo" ])]))
DEBUG: 581e ♻️  Idle:           🔗    4     4    api.github.com:443:63915 16 RawFD(23)
DEBUG: 581e 👁  Start write:T4  🔗    4     4    api.github.com:443:63915 16 RawFD(23)
➡️   "POST /gists HTTP/1.1\r\n"
➡️   "Authorization: token 1e2b86bd089b885fXX7415f590c196b7\r\n"
➡️   "User-Agent: GitHub-jl\r\n"
➡️   "Host: api.github.com\r\n"
➡️   "Content-Length: 57\r\n"
➡️   "\r\n"
DEBUG: 581e 👁  Start read: T4  🔗    4🔒    4    api.github.com:443:63915 16 RawFD(23)
➡️   "{\"public\":false,\"files\":{\"f1\":\"foo\"},\"description\":\"foo\"}"
DEBUG: 581e ❗️  ConnectionLayer EOFError(). Closing: T4  💀    4🔒    4🔒   api.github.com:443:63915 16 RawFD(-1)
DEBUG: 581e 🗣  Write done: T4  💀    5     4🔒   api.github.com:443:63915 16 RawFD(-1)
DEBUG: 581e ✉️  Read done:  T4  💀    5     5    api.github.com:443:63915 16 RawFD(-1)
DEBUG: 581e 🔄  Retry IOError(EOFError() during request(https://@api.github.com:/gists?#))
: POST /gists HTTP/1.1
DEBUG: 581e 🗑  Deleted:        💀    5     5    api.github.com:443:63915 16 RawFD(-1)
DEBUG: 581e SSL connect: api.github.com:443...
DEBUG: 581e TCP connect: api.github.com:443...
DEBUG: 581e 🔗  New:0     0    api.github.com:443:63931 16 RawFD(23)
DEBUG: 581e 👁  Start write:T0  ⏸    0     0    api.github.com:443:63931 16 RawFD(23)
➡️   "POST /gists HTTP/1.1\r\n"
➡️   "Authorization: token 1e2b86bd089XX2e7415f590c196b7\r\n"
➡️   "User-Agent: GitHub-jl\r\n"
➡️   "Host: api.github.com\r\n"
➡️   "Content-Length: 57\r\n"
➡️   "\r\n"
DEBUG: 581e 👁  Start read: T0  ⏸    0🔒    0    api.github.com:443:63931 16 RawFD(23)
➡️   "{\"public\":false,\"files\":{\"f1\":\"foo\"},\"description\":\"foo\"}"
DEBUG: 02b4 🔒  POST non-idempotent, holding write lock: T0  🔁    0🔒    0🔒   api.github.com:443:63931 16 RawFD(23)
⬅️   "HTTP/1.1 404 Not Found\r\n"
    "Date: Sat, 03 Mar 2018 10:10:53 GMT\r\n"
    "Content-Type: application/json; charset=utf-8\r\n"
  -- snip--

Now there is a new connection in the pool:

julia> HTTP.ConnectionPool.showpool(STDOUT)
ConnectionPool[
   ⏸    1     1    api.github.com:443:63931 16 RawFD(23)
]
@samoconnor
Copy link
Contributor Author

samoconnor commented Mar 3, 2018

Temporarily disabling the retry layer, and disabling the catch in the ConnectionPoolLayer ensures that the EOFError is unhandled and allows the stack trace to be seen uncorrupted. This confirms that: 1) the dead connection is not noticed when the request is sent (because all that base/stream.jl does is write into a buffer and ask libuv to send in the background); and 2) the dead connection is noticed as soon as readheaders is called (before the body is sent).

I hope that this means the Femtocleaner issue should be resolved without the need for keepalive.

I still think it would be a good idea to have an optional connection_timeout= so that people using an API like github, that has a timeout of about a minute, can set a 30 second timeout and know that they'll never have any extra latency due to retrying requests that fail on idle connections. I don't think an option like this should be enabled by default because the retry mechanism should transparently handle most use cases, and people may want low latency long-lasting connections between local server nodes, or between a local julia process and a local browser.

The Apache HTTP Client has a 30 seconds timeout: https://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html#d5e418

➡️   "POST /gists HTTP/1.1\r\n"
➡️   "Authorization: token 1e2b86bd089b885XXX15f590c196b7\r\n"
➡️   "User-Agent: GitHub-jl\r\n"
➡️   "Host: api.github.com\r\n"
➡️   "Content-Length: 57\r\n"
➡️   "\r\n"
DEBUG: dfce 👁  Start read: T2  🔗    2🔒    2    api.github.com:443:64132 16 RawFD(23)
➡️   "{\"public\":false,\"files\":{\"f1\":\"foo\"},\"description\":\"foo\"}ERROR: "

EOFError: read end of file
Stacktrace:
 [1] readuntil(::HTTP.DebugRequest.IODebug{HTTP.ConnectionPool.Transaction{MbedTLS.SSLContext}}, ::HTTP.Parsers.#find_end_of_header) at /Users/sam/.julia/v0.6/HTTP/src/IOExtras.jl:169
 [2] readheaders(::HTTP.DebugRequest.IODebug{HTTP.ConnectionPool.Transaction{MbedTLS.SSLContext}}, ::HTTP.Messages.Response) at /Users/sam/.julia/v0.6/HTTP/src/Messages.jl:468
 [3] startread(::HTTP.Streams.Stream{HTTP.Messages.Response,HTTP.DebugRequest.IODebug{HTTP.ConnectionPool.Transaction{MbedTLS.SSLContext}}}) at /Users/sam/.julia/v0.6/HTTP/src/Streams.jl:146
 [4] macro expansion at /Users/sam/.julia/v0.6/HTTP/src/StreamRequest.jl:57 [inlined]

@samoconnor samoconnor changed the title Analysis of EOF Errors with GitHub API Analysis of EOF Errors with GitHub API (Seems Resolved) Mar 3, 2018
@quinnj
Copy link
Member

quinnj commented Mar 4, 2018

This is really great analysis/research @samoconnor. Given all this, do you see outstanding action items? Could you make a PR to GitHub.jl introducing the timeout limits? and does that depend on #215 being merged first? It'd be great to get this all squared away and ensure FemtoCleaner.jl is happy again.

@samoconnor
Copy link
Contributor Author

I think this should wait until FemtoCleaner has been used for a while with the changes from the retry_post_requests branch (b24f49a) . If this is stable, then I don't think any further change is needed.

If tagging 0.6.6 makes it easier to test FemtoCleaner.jl, then lets do that.

I don't think #215 should be merged yet. And unless @vtjnash is happy with the connect timeout implementation I'll remove it before merging. As it is it causes the Julia runtime to hang on exit.

@samoconnor
Copy link
Contributor Author

Hi @Keno, Please let us know if v0.6.6 improves things with FemtoCleaner.

vtjnash pushed a commit that referenced this issue Apr 17, 2018
When a connection is returned to the (read) pool,
add a monitor to it for receiving unexpected data (or EOF),
and kill / close the Connection object if any activity occurs
before the next write (when it should have simply been waiting idle in the pool)

per JuliaLang/MbedTLS.jl#145 (comment)

closes #214
closes #199
closes #220
closes JuliaWeb/GitHub.jl#106
quinnj pushed a commit that referenced this issue May 31, 2018
* ConnectionPool: monitor idle connections

When a connection is returned to the (read) pool,
add a monitor to it for receiving unexpected data (or EOF),
and kill / close the Connection object if any activity occurs
before the next write (when it should have simply been waiting idle in the pool)

per JuliaLang/MbedTLS.jl#145 (comment)

closes #214
closes #199
closes #220
closes JuliaWeb/GitHub.jl#106

*  - Encapsulate read|writebusy/sequence/count logic in new isbusy function.
 - Move close() on eof() || !isbusy() to new monitor_idle_connection function.
 - Make monitor_idle_connection() a noop for ::Connection{SSLContext}

* require Julia 0.6.3 #236 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants