
fixed ipc leak #277

Merged: debris merged 14 commits into master from fixed_ipc_leak on Jun 12, 2018

Conversation

@debris (Contributor) commented Jun 6, 2018

closes #275
closes openethereum/parity-ethereum#8774
openethereum/parity-ethereum#8618

The leak existed because we were merging the stream of incoming message responses with the stream of outgoing messages. As long as either of them was still alive, we were not dropping the connection.

To fix the issue, we finish the merged stream as soon as one of the underlying streams ends.
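For illustration, here is a minimal sketch of the idea in futures 0.1 style. The names (SelectWithWeak, strong, weak) follow the eventual shape of the fix, but the body is a simplified paraphrase, not the exact code merged in this PR: the merged stream ends as soon as the "strong" side (the connection) ends, while the "weak" side (outgoing messages) only contributes items.

```rust
extern crate futures; // futures 0.1

use futures::{Async, Poll, Stream};

/// Merges two streams, but finishes as soon as `strong` finishes;
/// `weak` running out of items does NOT end the merged stream.
pub struct SelectWithWeak<S1, S2> {
    strong: S1,
    weak: S2,
    weak_done: bool,
}

impl<S1, S2> Stream for SelectWithWeak<S1, S2>
where
    S1: Stream,
    S2: Stream<Item = S1::Item, Error = S1::Error>,
{
    type Item = S1::Item;
    type Error = S1::Error;

    fn poll(&mut self) -> Poll<Option<S1::Item>, S1::Error> {
        // Poll the strong side first; its end terminates the whole merged
        // stream, which is what lets the connection finally be dropped.
        match self.strong.poll()? {
            Async::Ready(Some(item)) => return Ok(Async::Ready(Some(item))),
            Async::Ready(None) => return Ok(Async::Ready(None)),
            Async::NotReady => (),
        }
        // The weak side only yields items; once it finishes we simply stop
        // polling it and keep waiting on the strong side.
        if !self.weak_done {
            match self.weak.poll()? {
                Async::Ready(Some(item)) => return Ok(Async::Ready(Some(item))),
                Async::Ready(None) => self.weak_done = true,
                Async::NotReady => (),
            }
        }
        Ok(Async::NotReady)
    }
}
```

The version that was actually merged also alternates which stream is polled first (the self.flag discussed in the review below), so that a busy stream cannot starve the other; that detail is omitted here for brevity.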

@debris (Contributor, Author) commented Jun 6, 2018

thanks @gnunicorn for debugging the issue!


self.flag = !self.flag;

match b.poll()? {
Contributor commented:

Am I right that this whole match can be replaced with a simple b.poll() call? :)

@debris (Contributor, Author) replied:

🤦‍♂️ 👍

@gnunicorn commented Jun 6, 2018

closes openethereum/parity-ethereum#8774
closes openethereum/parity-ethereum#8618

Just pointing out that this doesn't come into effect until Parity's Cargo.toml branch references are updated too, as they don't point to master (I suspect there is a branch cut whenever there is a release?).

@debris (Contributor, Author) commented Jun 6, 2018

@gnunicorn I know, we need to backport these fixes to that branch

@debris (Contributor, Author) commented Jun 6, 2018

Hold on with merging this PR; tests are randomly failing, most likely because the server is now released before the end of the test.

@dvdplm (Contributor) left a review comment:

Looks good (and gnunicorn's bug report was amazing!) but I get randomly failing tests in ipc: sometimes all is green, but most of the time there's at least one failure. Can you take a look at that?

@@ -172,7 +173,7 @@ impl<M: Metadata, S: Middleware<M>> ServerBuilder<M, S> {
})
})
.filter_map(|x| x)
-.select(receiver.map_err(|e| {
+.select_both(receiver.map_err(|e| {
Contributor commented:

I for one would appreciate a comment as to why we're using select_both here – doesn't seem super-obvious! ;)

@debris (Contributor, Author) commented Jun 6, 2018

It seems like the tests are failing because every request uses its own new event loop, and the order of event loop initialization is not guaranteed.

So sometimes tests fail because the server hasn't started yet, or it started and was already dropped (because nothing has ownership of the event loop).

@c0deright commented:
Hold your beer - it seems this doesn't fix openethereum/parity-ethereum#8618. Please see openethereum/parity-ethereum#8618 (comment)

@debris (Contributor, Author) commented Jun 6, 2018

🍺

@debris (Contributor, Author) left a review comment:

The ipc server tests are just terrible; they contain multiple bugs and should be rewritten.

@@ -323,7 +326,8 @@ mod tests {
    result,
    "{\"jsonrpc\":\"2.0\",\"result\":\"hello\",\"id\":1}",
    "Response does not exactly match the expected response",
-);
+);
+server.close();
@debris (Contributor, Author) commented Jun 6, 2018:

Test issue no. 1:

  • We were dropping the server before we even connected to it. Everything was working only because of the leak.

-let _server = builder.start(path).expect("Server must run with no issues");
-thread::sleep(::std::time::Duration::from_millis(50));
+let server = builder.start(path).expect("Server must run with no issues");
+thread::sleep(::std::time::Duration::from_millis(5000));
@debris (Contributor, Author) commented Jun 6, 2018:

Test issue no. 2 (not solved):

  • The server is not started synchronously, therefore we cannot assume that it's already available at the time of making the request. Fixing this properly is actually quite complex, so I added only this workaround.

type Error = S1::Error;

fn poll(&mut self) -> Poll<Option<S1::Item>, S1::Error> {
let (a, b) = if self.flag {
Contributor commented:

eating a virtual dispatch on every call to poll doesn't seem great to me.

Contributor commented:

(a macro poll_inner!(a, b) would probably end up being less verbose in the end)

@debris (Contributor, Author) replied:

It is just a modified version of select from the futures library: https://docs.rs/futures/0.1.21/src/futures/stream/select.rs.html#11-15

Async::NotReady => (),
};

self.flag = !self.flag;
Contributor commented:

is this second negation of self.flag intended?

@debris (Contributor, Author) replied:

yes
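For context on the flag mechanics: the upstream futures 0.1.21 Select combinator that this code is modified from handles fairness roughly like this (paraphrased from the source linked above, with explanatory comments added; not the exact code in this PR):

```rust
extern crate futures; // futures 0.1

use futures::{Async, Poll, Stream};

pub struct Select<S1, S2> {
    stream1: S1,
    stream2: S2,
    flag: bool, // which stream gets polled first on this call
}

impl<S1, S2> Stream for Select<S1, S2>
where
    S1: Stream,
    S2: Stream<Item = S1::Item, Error = S1::Error>,
{
    type Item = S1::Item;
    type Error = S1::Error;

    fn poll(&mut self) -> Poll<Option<S1::Item>, S1::Error> {
        let (a, b) = if self.flag {
            (&mut self.stream2 as &mut Stream<Item = S1::Item, Error = S1::Error>,
             &mut self.stream1 as &mut Stream<Item = S1::Item, Error = S1::Error>)
        } else {
            (&mut self.stream1 as &mut Stream<Item = S1::Item, Error = S1::Error>,
             &mut self.stream2 as &mut Stream<Item = S1::Item, Error = S1::Error>)
        };
        // First negation: by default the other stream goes first next time.
        self.flag = !self.flag;

        let a_done = match a.poll()? {
            Async::Ready(Some(item)) => return Ok(Async::Ready(Some(item))),
            Async::Ready(None) => true,
            Async::NotReady => false,
        };

        match b.poll()? {
            Async::Ready(Some(item)) => {
                // Second negation: we pulled an item off `b` while `a` was
                // not done, so flip the flag back and let `a` go first next
                // time. This is why the "double negation" is intended.
                if !a_done {
                    self.flag = !self.flag;
                }
                Ok(Async::Ready(Some(item)))
            }
            Async::Ready(None) if a_done => Ok(Async::Ready(None)),
            Async::Ready(None) | Async::NotReady => Ok(Async::NotReady),
        }
    }
}
```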

@@ -332,13 +339,15 @@ mod tests {

Contributor commented:

This test still occasionally fails here. Running for i in {1..20}; do cargo test; done it fails 2-3 times. Running it by itself (for i in {1..20}; do cargo test req_parallel; done) makes things worse: about a fourth of the runs fail.

dvdplm previously approved these changes Jun 11, 2018
@dvdplm (Contributor) left a review comment:

LGTM. Minor grumbles.

EDIT: I can still get req_parallel to fail on Rust stable (nightly seems fine). :(

::logger::init_log();
let path = "/tmp/test-ipc-45000";
let server = run(path);
-thread::sleep(::std::time::Duration::from_millis(1000));
+thread::sleep(Duration::from_millis(100));
let (stop_signal, stop_receiver) = mpsc::channel(400);
Contributor commented:

How about we parametrize the number of threads/loops so we can do mpsc::channel(TEST_THRDS * TEST_ITERS);?

}
})
);
}
thread::sleep(Duration::from_millis(100));
Contributor commented:

Just curious: why do we still need to nap?

@dvdplm dismissed their stale review June 11, 2018 12:52

Tests still failing on Rust stable

@debris (Contributor, Author) commented Jun 12, 2018

This issue is super annoying. It always works on my machine now, but it looks like the server is not yet ready when the tests start.

@dvdplm (Contributor) commented Jun 12, 2018

> it works on my machine always now

I still get failures, but only using 1.26.1. No failures on nightly.

ipc/Cargo.toml Outdated
@@ -20,7 +20,7 @@ env_logger = "0.5"
lazy_static = "1.0"

[target.'cfg(not(windows))'.dev-dependencies]
tokio-uds = "0.1"
tokio-uds = "0.2"
@debris (Contributor, Author) commented:

Updating the package helped me diagnose the test issues.

-.select(receiver.map_err(|e| {
+// we use `select_with_weak` here, instead of `select`, to close the stream
+// as soon as the ipc pipe is closed
+.select_with_weak(receiver.map_err(|e| {
@debris (Contributor, Author) commented:

It maintains the stream as long as the connection exists; the outgoing stream may be closed earlier.

@@ -187,6 +189,7 @@ impl<M: Metadata, S: Middleware<M>> ServerBuilder<M, S> {

Ok(())
});
+start_signal.send(Ok(())).expect("Cannot fail since receiver never dropped before receiving");
@debris (Contributor, Author) commented:

start_signal was triggered before the server was fully set up. After fixing the order of initialisation, we no longer need thread::sleep in the tests.
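A minimal sketch of this pattern (hypothetical names, and std channels instead of the futures channel the real code uses): the spawned thread performs all setup before sending the start signal, and the caller blocks on the signal instead of sleeping.

```rust
use std::io;
use std::sync::mpsc;
use std::thread;

fn start_server() -> io::Result<thread::JoinHandle<()>> {
    let (start_signal, start_receiver) = mpsc::channel();

    let handle = thread::spawn(move || {
        // 1. Perform ALL the setup clients depend on (e.g. bind the socket).
        let setup_result: io::Result<()> = Ok(()); // stand-in for real setup

        // 2. Only then report readiness. Sending the signal before setup
        //    completed was the race that forced tests to thread::sleep.
        start_signal
            .send(setup_result)
            .expect("Cannot fail since receiver never dropped before receiving");

        // 3. Run the event loop here.
    });

    // The caller blocks until the server reports it is ready.
    start_receiver
        .recv()
        .expect("start_signal is sent exactly once")?;
    Ok(handle)
}

fn main() {
    let _server = start_server().expect("server starts");
    // Clients may connect immediately; no thread::sleep needed.
}
```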

@debris debris merged commit ab2c608 into master Jun 12, 2018
@debris debris deleted the fixed_ipc_leak branch June 12, 2018 16:03
debris added a commit that referenced this pull request Jun 12, 2018
* fixed ipc connection leak, closes #275

* fixed indentation

* fixed broken pipe issue in tests

* empirical tests fixes

* fix tests

* fix tests

* fix tests

* move ipc start_signal.send after the incoming.for_each

* log ipc traces on travis

* keep writer in memory as long as possible

* select_with_weak

* remove redundant thread::sleep

* test session end

* fixed race condition in test_session_end
debris added a commit that referenced this pull request Jun 12, 2018 (same commit list as above)

debris added a commit that referenced this pull request Jun 18, 2018 (same commit list as above)

ascjones pushed a commit that referenced this pull request Jun 18, 2018 (same commit list as above)