Version
hyper 0.14.28
tokio 1.35.0
tokio-openssl 0.6.3
Description
Observing memory increase with the "IncompleteMessage" error.
Testing scenario:
server: on an HTTP/1 request, it simply closes the connection.
server.go:
package main

import (
    "crypto/tls"
    "fmt"
    "net/http"
    "os"
)

func handler(w http.ResponseWriter, r *http.Request) {
    // Hijack the connection to get access to the underlying TCP connection
    hj, ok := w.(http.Hijacker)
    if !ok {
        http.Error(w, "Hijacking not supported", http.StatusInternalServerError)
        return
    }
    conn, _, err := hj.Hijack()
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    defer conn.Close()
    // Close the connection immediately
    conn.Close()
    fmt.Println("Connection closed by server")
}

func main() {
    http.HandleFunc("/", handler)
    // Configure TLS with TLS 1.2
    tlsConfig := &tls.Config{
        MinVersion: tls.VersionTLS12,
    }
    server := &http.Server{
        Addr:      ":8080",
        Handler:   nil, // Default handler is used (Mux if not set)
        TLSConfig: tlsConfig,
    }
    // Start the HTTPS server with the specified TLS configuration
    err := server.ListenAndServeTLS("cert/server.crt", "cert/server.key")
    if err != nil {
        fmt.Println("Error starting server:", err)
        os.Exit(1)
    }
}
client: just one request, retried in a loop until success (effectively infinite, as the server never acknowledges the request): an HTTP PUT of 1MB of data to the server.
configs:
- http1 only
- pooling disabled for simplicity: max_idle per pool = 0
driving_client.rs
to add...
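Until I attach the actual file, here is a rough sketch of what the driving client looks like. This is not the exact code: my real client wraps tokio_openssl::SslStream in a custom MyStream type, whereas this sketch uses hyper-openssl for brevity, and names like payload are just for illustration.

// Sketch only: hyper 0.14 client, pooling disabled, 1MB PUT retried in a loop.
use hyper::{client::HttpConnector, Body, Client, Method, Request};
use hyper_openssl::HttpsConnector;
use openssl::ssl::{SslConnector, SslMethod, SslVerifyMode};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut http = HttpConnector::new();
    http.enforce_http(false); // allow https:// URIs

    let mut ssl = SslConnector::builder(SslMethod::tls())?;
    // Self-signed test cert: skip verification. Flipping this to
    // SslVerifyMode::PEER reproduces the Connect/SSL error path, which
    // showed no leak in my tests.
    ssl.set_verify(SslVerifyMode::NONE);

    let https = HttpsConnector::with_connector(http, ssl)?;

    // Disable pooling for simplicity; HTTP/1 is hyper 0.14's default here.
    let client = Client::builder()
        .pool_max_idle_per_host(0)
        .build::<_, Body>(https);

    let payload = vec![0u8; 1024 * 1024]; // 1MB body

    // One request, retried until success (effectively forever, since the
    // server always resets the connection).
    loop {
        let req = Request::builder()
            .method(Method::PUT)
            .uri("https://127.0.0.1:8080/")
            .body(Body::from(payload.clone()))?;
        match client.request(req).await {
            Ok(resp) => {
                println!("success: {}", resp.status());
                break;
            }
            Err(e) => eprintln!("request failed: {:?}", e),
        }
    }
    Ok(())
}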
The client always gets a connection reset from the server. This leads to two types of error responses in hyper:
(A) hyper::Error(BodyWrite, Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" })
(B) hyper::Error(IncompleteMessage)
where (A) does not seem to cause memory leakage, while (B) does.
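For reference, this is roughly how I bucket the two cases when counting them in the test client. It is a sketch assuming hyper 0.14's public error API; the classify helper is just for this write-up.

// Distinguish (A) write-side connection reset from (B) IncompleteMessage.
use std::error::Error as _;

fn classify(err: &hyper::Error) -> &'static str {
    if err.is_incomplete_message() {
        // (B): read side saw EOF while a response was still expected
        "B: IncompleteMessage"
    } else if err
        .source()
        .and_then(|src| src.downcast_ref::<std::io::Error>())
        .map_or(false, |io| io.kind() == std::io::ErrorKind::ConnectionReset)
    {
        // (A): write side got ECONNRESET (os error 104)
        "A: BodyWrite / ConnectionReset"
    } else {
        "other"
    }
}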
I observed that:
- in the high-bandwidth scenario, (A) dominates;
- in the low-bandwidth scenario, (B) dominates.
When I ran my tests (15 minutes each):
In the high-bandwidth case (around 5 MBps):
- memory was constant: no leakage
- Error A count: 5242
- Error B count: 1
- per hyper logs: 315 MB total flushed
In the low-bandwidth case (around 100 KBps):
- memory kept increasing, continuously and linearly; around 40 MB of leakage observed
- Error A count: 0
- Error B count: 5004
- per hyper logs: 80 MB total flushed
The timeseries graph of data flushed runs parallel to the memory increase; the leak seems to be about 50% of whatever was flushed (just trying to connect random dots...).
See the memory growth of case 1 vs. case 2:
It seems something is happening inside hyper and not in my driving code. I confirmed this by switching the failure to a connect error (by enabling SSL verification), so every attempt failed with hyper::Error(Connect, SSL(Error { code: ErrorCode(1), cause: Some(Ssl(ErrorStack([Error { code: 167772294, library: "SSL routines", function: "tls_post_process_server_certificate", reason: "certificate verify failed", file: "../ssl/statem/statem_clnt.c", line: 1883 }]))) })) and the client kept retrying: memory stayed constant, no leakage. The leak seems to occur only when some data has actually been flushed.
Note:
malloc_trim calls seem to clean up the leak to a great extent.
The above is what I reproduced locally on my laptop, where for just a 1MB request, around 100MB of leakage was observed within a few minutes (the graph keeps increasing). I have seen a similar scenario in production, where it led to around 6GB of memory leakage, which is the main concern.
See the graph above, where memory shot up to 6GB for our app (normally it consumes 1 to 1.4GB at most). On debugging, I found we had received 157310 instances of 503 errors from S3 (rate-limiting on their side). Each of our requests was around 28KB, and when I plotted a timeseries of the accumulated size of the errored blocks, the growth and curvature of that graph matched the memory graph exactly.
Note that the sudden 1GB dip in memory at 15:54:30 is due to a malloc_trim call. The run above was on EC2, where malloc_trim cleared only 1GB of the leaked 5GB, while on local runs on an Ubuntu laptop malloc_trim seems to clear all of the leakage caused by the errored blocks.
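For completeness, the malloc_trim above is glibc's, called periodically from the application via the libc crate. A minimal sketch (the 60s interval is arbitrary, and this is Linux/glibc only):

// Periodically return freed heap memory to the OS on glibc targets.
#[cfg(all(target_os = "linux", target_env = "gnu"))]
async fn malloc_trim_loop() {
    loop {
        tokio::time::sleep(std::time::Duration::from_secs(60)).await;
        unsafe {
            // pad = 0: release as much free memory back to the OS as possible
            libc::malloc_trim(0);
        }
    }
}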
I am still in the process of understanding the hyper workflow, so pardon me if any of the comments below are wrong:
Overall observation: when the server resets the connection while the hyper client has already flushed some data, there is a possibility of memory leakage, depending on a race around when/how hyper learns about the connection closure.
Error A is received by hyper during poll_write, which makes an AsyncWrite call to the network via tokio_openssl::SslStream::poll_write, where the OS reports the connection reset. This case does not seem to cause a memory/resource leak.
Error B is received by hyper during poll_read, which makes an AsyncRead call to the network via tokio_openssl::SslStream::poll_read, where a 0-byte read indicates the connection was closed while a message was still expected. This case does seem to cause a memory/resource leak. It also errors on shutdown with: error shutting down IO: Transport endpoint is not connected (os error 107)
The data buffered on the closed connection seems to leak somewhere in some scenarios (IncompleteMessage being one of them). I am still figuring out how it gets cleaned up on a closed connection.
As the errored blocks appear to accumulate in memory without bound, this issue can cause serious trouble: an application in an error scenario will keep consuming unbounded memory and may suddenly crash.
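For anyone trying to reproduce this, here is a simplified sketch of the kind of instrumentation I use to see which path each error takes. My real wrapper is the MyStream mentioned below, around tokio_openssl::SslStream; LoggingStream is just a stand-in name for this sketch.

use std::pin::Pin;
use std::task::{Context, Poll};
use tokio::io::{AsyncRead, AsyncWrite, ReadBuf};

// Logs where the connection closure is observed, then forwards to the inner IO.
struct LoggingStream<S> {
    inner: S,
}

impl<S: AsyncRead + Unpin> AsyncRead for LoggingStream<S> {
    fn poll_read(
        mut self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut ReadBuf<'_>,
    ) -> Poll<std::io::Result<()>> {
        let before = buf.filled().len();
        let res = Pin::new(&mut self.inner).poll_read(cx, buf);
        if let Poll::Ready(Ok(())) = &res {
            if buf.filled().len() == before {
                // 0 bytes read: peer closed while hyper still expected a response.
                // This is the path that surfaces as hyper::Error(IncompleteMessage).
                eprintln!("poll_read returned EOF");
            }
        }
        res
    }
}

impl<S: AsyncWrite + Unpin> AsyncWrite for LoggingStream<S> {
    fn poll_write(
        mut self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &[u8],
    ) -> Poll<std::io::Result<usize>> {
        let res = Pin::new(&mut self.inner).poll_write(cx, buf);
        if let Poll::Ready(Err(e)) = &res {
            // ECONNRESET here surfaces as hyper::Error(BodyWrite, ...).
            eprintln!("poll_write error: {}", e);
        }
        res
    }

    fn poll_flush(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<std::io::Result<()>> {
        Pin::new(&mut self.inner).poll_flush(cx)
    }

    fn poll_shutdown(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<std::io::Result<()>> {
        Pin::new(&mut self.inner).poll_shutdown(cx)
    }
}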
In the case of hyper::Error(IncompleteMessage), I was seeing the shutdown error: error shutting down IO: Transport endpoint is not connected (os error 107). So I tried disabling shutdown, as below:
impl AsyncWrite for MyStream {
    ...
    fn poll_shutdown(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Result<(), std::io::Error>> {
        // Instead of forwarding shutdown to the underlying IO, report success immediately.
        // let this = self.project();
        // this.io.poll_shutdown(cx)
        Poll::Ready(Ok(()))
    }
}
This magically made the memory graph horizontal: no leakage. I have no clue why this is happening.
Also, from your last comment, it seems like it has something to do with the IO you're using. Is it possible that the implementation of the IO in your code is causing the leakage?
We are using hyper v0.14.26 and noticed a memory leak with a similar pattern in our services. Just want to add a +1 and check whether there are any new updates related to this issue.