fixed ipc leak #277
Conversation
thanks @gnunicorn for debugging the issue!
ipc/src/select_both.rs
Outdated
self.flag = !self.flag;

match b.poll()? {
Am I right that this whole match can be replaced with a simple b.poll() call? :)
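For context, a sketch of the simplification being suggested (the excerpt doesn't show the full match arms, so identity arms are assumed here):

```rust
// before (hypothetical identity arms):
// match b.poll()? {
//     Async::Ready(Some(item)) => Ok(Async::Ready(Some(item))),
//     Async::Ready(None) => Ok(Async::Ready(None)),
//     Async::NotReady => Ok(Async::NotReady),
// }
// after: `b.poll()` already returns a value of exactly that shape
b.poll()
```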
🤦‍♂️ 👍
Just pointing out that this doesn't come into effect until Parity's Cargo.toml branch references are upgraded too, as they don't point to master (I suspect there is a branch-off whenever there is a release?).
@gnunicorn I know, we need to backport these fixes to that branch
hold on with merging this pr, tests are randomly failing. Most likely because …
Looks good (and @gnunicorn's bug report was amazing!) but I get randomly failing tests in ipc: sometimes all is green, but most of the time there's at least one failure. Can you take a look at that?
ipc/src/server.rs
Outdated
@@ -172,7 +173,7 @@ impl<M: Metadata, S: Middleware<M>> ServerBuilder<M, S> {
        })
    })
    .filter_map(|x| x)
-   .select(receiver.map_err(|e| {
+   .select_both(receiver.map_err(|e| {
I for one would appreciate a comment as to why we're using select_both here – doesn't seem super-obvious! ;)
So it seems like tests are failing because every request uses its own, new event loop, and the order of event loop initialization is not guaranteed. So sometimes tests fail because the server hasn't started yet, or it started and was already dropped (because nothing has ownership of the event loop).
Hold your beer - it seems this doesn't fix openethereum/parity-ethereum#8618. Please see openethereum/parity-ethereum#8618 (comment)
🍺
The ipc server tests are just terrible; they contain multiple bugs and should be rewritten.
ipc/src/server.rs
Outdated
@@ -323,7 +326,8 @@ mod tests {
        result,
        "{\"jsonrpc\":\"2.0\",\"result\":\"hello\",\"id\":1}",
        "Response does not exactly match the expected response",
    );
+   server.close();
Test issue no. 1: we were dropping the server before we even connected to it. Everything was working only because of the leak.
ipc/src/server.rs
Outdated
-let _server = builder.start(path).expect("Server must run with no issues");
-thread::sleep(::std::time::Duration::from_millis(50));
+let server = builder.start(path).expect("Server must run with no issues");
+thread::sleep(::std::time::Duration::from_millis(5000));
Test issue no. 2 (not solved): the server is not started synchronously, therefore we cannot assume that it's already available at the time of making the request. Fixing this issue properly is actually quite complex, so I added only this workaround.
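For what it's worth, a hedged alternative to a fixed sleep (not part of this PR): retry connecting to the socket until the listener is up or a deadline passes. `wait_for_server` is a hypothetical helper built on std's Unix sockets:

```rust
use std::os::unix::net::UnixStream;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical test helper: returns true once a connection to the IPC path
/// succeeds (i.e. the server's listener is accepting), false after `timeout`.
fn wait_for_server(path: &str, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    while Instant::now() < deadline {
        if UnixStream::connect(path).is_ok() {
            return true;
        }
        thread::sleep(Duration::from_millis(10));
    }
    false
}
```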
ipc/src/select_both.rs
Outdated
type Error = S1::Error;

fn poll(&mut self) -> Poll<Option<S1::Item>, S1::Error> {
    let (a, b) = if self.flag {
eating a virtual dispatch on every call to poll doesn't seem great to me.
(a macro poll_inner!(a, b) would probably end up being less verbose in the end)
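A rough sketch of that idea, assuming the Select-like struct from this PR (fields `stream1`, `stream2`, `flag`); `poll_inner!` is the reviewer's hypothetical name, not code from the PR:

```rust
// Sketch only: the polling logic is expanded once per ordering, so both
// streams keep their concrete types and no virtual dispatch is needed.
macro_rules! poll_inner {
    ($self_:ident, $a:ident, $b:ident) => {{
        match $self_.$a.poll()? {
            Async::Ready(Some(item)) => return Ok(Async::Ready(Some(item))),
            // either side finishing ends the merged stream
            Async::Ready(None) => return Ok(Async::Ready(None)),
            Async::NotReady => (),
        }
        // `$a` was not ready: restore its priority for the next poll
        $self_.flag = !$self_.flag;
        $self_.$b.poll()
    }};
}

// inside `fn poll(&mut self) -> Poll<Option<S1::Item>, S1::Error>`:
// if self.flag {
//     self.flag = false; // alternate priority between the two streams
//     poll_inner!(self, stream2, stream1)
// } else {
//     self.flag = true;
//     poll_inner!(self, stream1, stream2)
// }
```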
It is just a modified version of select from the futures library: https://docs.rs/futures/0.1.21/src/futures/stream/select.rs.html#11-15
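To make the thread easier to follow, here is a self-contained sketch (assuming futures 0.1) of such a modified select; the combinator was later renamed select_with_weak in this PR. Names and bodies are illustrative, not the exact PR code:

```rust
extern crate futures;
use futures::{Async, Poll, Stream};

/// Like `futures::stream::Select`, but finishes as soon as EITHER of the
/// two merged streams finishes (stock `Select` waits for both).
pub struct SelectBoth<S1, S2> {
    stream1: S1,
    stream2: S2,
    flag: bool, // which stream is polled first; alternated for fairness
}

pub fn select_both<S1, S2>(stream1: S1, stream2: S2) -> SelectBoth<S1, S2>
where
    S1: Stream,
    S2: Stream<Item = S1::Item, Error = S1::Error>,
{
    SelectBoth { stream1, stream2, flag: false }
}

impl<S1, S2> Stream for SelectBoth<S1, S2>
where
    S1: Stream,
    S2: Stream<Item = S1::Item, Error = S1::Error>,
{
    type Item = S1::Item;
    type Error = S1::Error;

    fn poll(&mut self) -> Poll<Option<S1::Item>, S1::Error> {
        // trait objects mean a virtual dispatch per poll, exactly as in the
        // stock `Select` (see the macro suggestion above for an alternative)
        let (a, b) = if self.flag {
            (&mut self.stream2 as &mut dyn Stream<Item = S1::Item, Error = S1::Error>,
             &mut self.stream1 as &mut dyn Stream<Item = S1::Item, Error = S1::Error>)
        } else {
            (&mut self.stream1 as &mut dyn Stream<Item = S1::Item, Error = S1::Error>,
             &mut self.stream2 as &mut dyn Stream<Item = S1::Item, Error = S1::Error>)
        };
        self.flag = !self.flag;

        match a.poll()? {
            Async::Ready(Some(item)) => return Ok(Async::Ready(Some(item))),
            // the key difference from stock `select`: one side ending ends
            // the whole merged stream, so the connection can be dropped
            Async::Ready(None) => return Ok(Async::Ready(None)),
            Async::NotReady => (),
        }

        // `a` was not ready: restore its priority for the next poll
        // (this is the "second negation" discussed below)
        self.flag = !self.flag;

        // per the review above, the trailing match collapses to a plain call
        b.poll()
    }
}
```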
ipc/src/select_both.rs
Outdated
    Async::NotReady => (),
};

self.flag = !self.flag;
is this second negation of self.flag intended?
yes
@@ -332,13 +339,15 @@ mod tests {
This test still occasionally fails here. Running for i in {1..20}; do cargo test; done it fails 2-3 times. Running it by itself (for i in {1..20}; do cargo test req_parallel; done) makes things worse: about a fourth of the runs fail.
LGTM. Minor grumbles.
EDIT: I can still get req_parallel to fail on Rust stable (nightly seems fine). :(
ipc/src/server.rs
Outdated
::logger::init_log();
let path = "/tmp/test-ipc-45000";
let server = run(path);
-thread::sleep(::std::time::Duration::from_millis(1000));
+thread::sleep(Duration::from_millis(100));
let (stop_signal, stop_receiver) = mpsc::channel(400);
How about we parametrize the number of threads/loops so we can do mpsc::channel(TEST_THRDS * TEST_ITERS)?
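A sketch of that suggestion; TEST_THRDS and TEST_ITERS are the reviewer's hypothetical names, and futures 0.1's sync::mpsc is assumed, as in the diff:

```rust
extern crate futures;
use futures::sync::mpsc;

const TEST_THRDS: usize = 4;   // client threads the test spawns
const TEST_ITERS: usize = 100; // requests each thread sends

fn main() {
    // capacity covers every possible in-flight stop message,
    // so no sender ever blocks on a full channel
    let (stop_signal, stop_receiver) = mpsc::channel::<()>(TEST_THRDS * TEST_ITERS);
    let _ = (stop_signal, stop_receiver);
}
```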
ipc/src/server.rs
Outdated
        }
    })
);
}
+thread::sleep(Duration::from_millis(100));
Just curious: why do we still need to nap?
This issue is super annoying. It now always works on my machine, but it looks like the server is not yet ready when the tests start.
I still get failures, but only using 1.26.1. No failures on nightly.
ipc/Cargo.toml
Outdated
@@ -20,7 +20,7 @@ env_logger = "0.5"
lazy_static = "1.0"

[target.'cfg(not(windows))'.dev-dependencies]
-tokio-uds = "0.1"
+tokio-uds = "0.2"
Updating the package helped me diagnose the test issues.
-.select(receiver.map_err(|e| {
+// we use `select_with_weak` here, instead of `select`, to close the stream
+// as soon as the ipc pipe is closed
+.select_with_weak(receiver.map_err(|e| {
It maintains the stream as long as the connection exists; the outgoing stream may be closed earlier.
@@ -187,6 +189,7 @@ impl<M: Metadata, S: Middleware<M>> ServerBuilder<M, S> {

        Ok(())
    });
+   start_signal.send(Ok(())).expect("Cannot fail since receiver never dropped before receiving");
start_signal was triggered before the server was fully set up. After fixing the order of initialisation, we no longer need thread::sleep in tests.
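A minimal sketch of that readiness handshake (std's mpsc assumed; illustrative, not the PR code):

```rust
use std::sync::mpsc;
use std::thread;

fn start_server() -> mpsc::Receiver<Result<(), String>> {
    let (start_signal, start_receiver) = mpsc::channel();
    thread::spawn(move || {
        // ... bind the listener and register it with the event loop FIRST ...
        // only after that is done do we report readiness:
        start_signal
            .send(Ok(()))
            .expect("Cannot fail since receiver never dropped before receiving");
        // ... run the event loop ...
    });
    start_receiver
}

fn main() {
    // tests can now block on the channel instead of sleeping
    start_server()
        .recv()
        .expect("server thread alive")
        .expect("server started");
}
```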
* fixed ipc connection leak, closes #275
* fixed indentation
* fixed broken pipe issue in tests
* empirical tests fixes
* fix tests
* fix tests
* fix tests
* move ipc start_signal.send after the incoming.for_each
* log ipc traces on travis
* keep writer in memory as long as possible
* select_with_weak
* remove redundant thread::sleep
* test session end
* fixed race condition in test_session_end
closes #275
closes openethereum/parity-ethereum#8774
openethereum/parity-ethereum#8618

The leak existed because we were merging the incoming message responses stream with the outgoing messages stream. If either of them was still alive, we were not dropping the connection. To fix the issue, we finish the merged stream as soon as one of the merged streams ends.
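For illustration, a self-contained futures 0.1 sketch of the leak mechanism with the stock select combinator (toy streams, not code from this repo):

```rust
extern crate futures;
use futures::{stream, Stream};

fn main() {
    // `responses` ends after three items; `outgoing` never ends, playing the
    // role of the outgoing-messages `receiver` from the server code
    let responses = stream::iter_ok::<_, ()>(vec![1, 2, 3]);
    let outgoing = stream::repeat::<i32, ()>(0);

    // stock `select` only finishes when BOTH sides finish, so this merged
    // stream never yields `None` and whatever drives it is never dropped;
    // that is the connection leak. `select_with_weak` ends it with `responses`.
    let merged = responses.select(outgoing);
    let first_eight: Vec<i32> = merged.wait().take(8).map(Result::unwrap).collect();
    println!("{:?}", first_eight); // e.g. [1, 0, 2, 0, 3, 0, 0, 0]
}
```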