Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor OpenSockets to provide cross OS compatibility #180

Merged
merged 26 commits into from
Sep 10, 2020

Conversation

remgodow
Copy link
Contributor

@remgodow remgodow commented Sep 1, 2020

Here is a proposition of refactoring OpenSockets struct, and get_open_sockets method.

First, a limitation of Windows, which made me consider such an approach - I could not find a method to get UDP "connections" therefore I could not create corresponding Connection object in get_open_sockets() for windows.
Looking through the code, I have found that the vector of connections is used for dns resolving, but the network_utilization creates this vector as well, based on sniffed packets. If we use this info, we may remove connections from get_open_sockets(), leaving us with sockets_to_procs hashmap.
Given the nature of WinAPI, I decided to use sysinfo and netstat2 packages for cargo. They are cross os, allowing us to have common implementation of get_open_sockets() for all OS'es.

Please note that this pull is nowhere production ready, I would like to consult if it is an approach to pursuit (most of which is in the first commit).

Refers #15

@remgodow remgodow changed the title Refactor OpenSockets to provice cross OS compatibility Refactor OpenSockets to provide cross OS compatibility Sep 1, 2020
@imsnif imsnif self-assigned this Sep 1, 2020
@imsnif
Copy link
Owner

imsnif commented Sep 2, 2020

Hey @remgodow - I think it would be best to start with the premise about the OpenSockets.
Just so we're on the same page. The OpenSockets is:

pub struct OpenSockets {
    sockets_to_procs: HashMap<LocalSocket, String>,
    connections: Vec<Connection>,
}

LocalSocket and Connection are:

pub struct LocalSocket {
    pub ip: IpAddr,
    pub port: u16,
    pub protocol: Protocol,
}

pub struct Connection {
    pub remote_socket: Socket,
    pub local_socket: LocalSocket,
}

The connections part of OpenSockets is used for dns resolution indeed, but the sockets_to_procs part is used to classify the local part of the tcp/udp connection to a process name (eg. 10.0.0.8:11332(tcp) is used by firefox). Unfortunately we can't get that information from sniffing packets, because they are not aware of the processes.

I don't know a lot about the winapi unfortunately, but do you think there may be some way to find out which process "holds" which local socket? If we have that information, we're good.

@remgodow
Copy link
Contributor Author

remgodow commented Sep 2, 2020

Using netstat2 package it is possible to get:

  • TCP connections (local/remote socket) with PID
  • UDP local sockets with PID - so no Connection object for UDP

Then I use sysinfo package to get process info and translate PID to process name.
Therefore there is no issue getting sockets_to_procs for Windows (and other systems, as the mentitioned packages are cross platform). That solves the Windows implementation.

However given the crossplatform nature of this implementation it is possible to simplify linux and mac implementations, as we could use this one instead of per os get_open_sockets(). And that is how I proposed in this draft.
common get_open_sockets()

But what about the connections part of OpenSockets? We can get it from network_utilization, giving us possibility to remove it from OpenSockets and to leave libpnet responsible for it.
connections from network_utilization

@imsnif
Copy link
Owner

imsnif commented Sep 2, 2020

Oh damn, now I get you. I should have looked at the code. :)

Yes, that sounds good. So (just to make sure I understand) the only functional difference would essentially be that we're doing the reverse dns resolution according to packets we get on the stream rather than the list of local sockets, right? If so, that doesn't sound bad at all.

When I was developing bandwhich originally, I used the netstat crate, but it was yanked mid development. So I'm happy to be able to use its successor. I like the idea of deferring the cross platform specifics to crates. Let's do it.

@remgodow
Copy link
Contributor Author

remgodow commented Sep 2, 2020

So (just to make sure I understand) the only functional difference would essentially be that we're doing the reverse dns resolution according to packets we get on the stream rather than the list of local sockets, right?

Exactly, it would require another approach to unittesting dns resolution, but it simplifies a lot in exchange.

@remgodow
Copy link
Contributor Author

remgodow commented Sep 2, 2020

@imsnif I cleaned up the refactorings and added windows build automation. Travis builds fail due to 3 tests related to DNS. I'm not sure if those are relevant after switching to dns based on sniffed packets. How would you suggest to approach this issue?

@imsnif
Copy link
Owner

imsnif commented Sep 3, 2020

Hey @remgodow - looks like this happens because we now resolve IPs a little later down the line. So the first snapshots are created without the resolution.

I played around with this a little: in some of the tests this is not an issue (the later snapshots include updates to the host names) and we just need to accept the new snapshots with cargo insta review.
In other tests, we need to add some more sleep events in order to create a few more snapshots to give the hostnames a chance to appear.

Let me know if you'd like to do this, or if you'd like me to update these and push to your branch: as you prefer.

@remgodow
Copy link
Contributor Author

remgodow commented Sep 3, 2020

It would be nice if you could take care of tests, I'm not yet familiar with snapshots concept in Rust.

@imsnif
Copy link
Owner

imsnif commented Sep 3, 2020

Sure, I'll do it later this evening. Would you mind if I also merge from main? It has the other PR (and some other snapshot fixes - will be nice not to have to merge them :) ).

@remgodow
Copy link
Contributor Author

remgodow commented Sep 3, 2020

Sure thing, thanks.

@imsnif
Copy link
Owner

imsnif commented Sep 3, 2020

Done.

So, I:

  1. Merged main.
  2. Made the adjustments to the snapshots (in the end I didn't have to change the tests, the snapshots are doing the right thing only a little later, it's fine).
  3. Fixed some styling issues that clippy was complaining about - so it will be green.

Let me know if I can give a hand with anything else.

@remgodow remgodow marked this pull request as ready for review September 4, 2020 16:21
@@ -182,7 +191,7 @@ pub fn get_input(

let network_frames = network_interfaces
.iter()
.filter(|iface| iface.is_up() && !iface.ips.is_empty())
.filter(|iface| !iface.ips.is_empty())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a regression, but I had to do that as on Windows libpnet does not read interface flags which causes iface.is_up() to fail.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a bug in libpnet? Are the interface flags there, or is it not a thing in windows?
For now, can we hide this behind conditional compilation? Have the old way in not(windows) and this one in windows?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, its a TODO in libpnet, I guess conditional is a way to go for now.

         all_ifaces.push(NetworkInterface {
                name: name_str,
                index: (*cursor).Index,
                mac: Some(mac),
                ips: ips,
                // flags: (*cursor).Type, // FIXME [windows]
                flags: 0,
            });


[dev-dependencies]
insta = "0.12.0"
pnet_base = "0.26.0"
cargo-insta = "0.11.0"
packet-builder = "0.5.0"
regex = "1"

[build-dependencies]
#[target.'cfg(target_os="windows")'.build-dependencies]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it correct, those crates should not be downloaded on unixes.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this is correct. They were not downloaded locally on my linux.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imsnif actually that's not the case: I had to uncomment this line to make sure that http_req doesn't get downloaded.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oi, I'm sorry about that! You are very right and I managed to miss this. :) I'm fixing this now and will push a patch. Thanks bringing this up (and catching it so quickly)!

@@ -82,6 +82,15 @@ fn sigwinch() -> (Box<OnSigWinch>, Box<SigCleanup>) {
(Box::new(on_winch), Box::new(cleanup))
}

#[cfg(any(target_os = "windows"))]
fn sigwinch() -> (Box<OnSigWinch>, Box<SigCleanup>) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better way to return sigwinch stub on Windows?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, are we not handling window size changes on windows? Or did I miss it?
If we're not doing this, isn't the experience a little wonky when you change the terminal window size? Something like: change size, see a broken interface which gets fixed after ~1 second?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows does not provide SIGWINCH, and its not that easy to get the console resize event. https://stackoverflow.com/questions/10856926/sigwinch-equivalent-on-windows

I've just checked and yes,

Something like: change size, see a broken interface which gets fixed after ~1 second?

that is how it behaves, however I noticed that only after you described that.
I think it is a cosmetical issue, and we could leave that for later.

@remgodow
Copy link
Contributor Author

remgodow commented Sep 4, 2020

Thanks, I believe it is nearly ready for merge. After that, there would be infra and packaging to take care of. I wonder if we could reuse Github actions from ripgrep.

@imsnif
Copy link
Owner

imsnif commented Sep 5, 2020

Hey @remgodow - just a heads up, I'll be away for most of the weekend so will take a look on Monday. Have a good one :)

Copy link
Owner

@imsnif imsnif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @remgodow, great work! I left some comments in the code.

Other than that, I tested this a little on my machine, and I have some concerns about sysinfo - which makes me have some concerns about netstat2 as well. :)

What I found: It seems to me like sysinfo is doing something wrong with process names. I tested this with a node app, specifically webtorrent-cli (https://github.com/webtorrent/webtorrent-cli). With this change, the process name is shown as "node" whereas with the version in main at the moment, it shows properly as "WebTorrent" (which is also what we get from cat /proc/<pid>/status).

I'm thinking there might be a few more of these, some of them not very trivial to find. I'm kind of leaning at the moment to add this behaviour to windows and keep the old linux/mac way of doing things. We can hide the dependencies behind conditional compilation in order not to bloat up the execuable. That way we also won't have to do any manual testing for those platforms. What do you think?

I know I previously indicated that deferring the platform specific behaviour to the crates would be my preference, but on second thought I'm afraid of unknowingly breaking more things that worked correctly before.

@@ -82,6 +82,15 @@ fn sigwinch() -> (Box<OnSigWinch>, Box<SigCleanup>) {
(Box::new(on_winch), Box::new(cleanup))
}

#[cfg(any(target_os = "windows"))]
fn sigwinch() -> (Box<OnSigWinch>, Box<SigCleanup>) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, are we not handling window size changes on windows? Or did I miss it?
If we're not doing this, isn't the experience a little wonky when you change the terminal window size? Something like: change size, see a broken interface which gets fixed after ~1 second?

@@ -182,7 +191,7 @@ pub fn get_input(

let network_frames = network_interfaces
.iter()
.filter(|iface| iface.is_up() && !iface.ips.is_empty())
.filter(|iface| !iface.ips.is_empty())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a bug in libpnet? Are the interface flags there, or is it not a thing in windows?
For now, can we hide this behind conditional compilation? Have the old way in not(windows) and this one in windows?


[dev-dependencies]
insta = "0.12.0"
pnet_base = "0.26.0"
cargo-insta = "0.11.0"
packet-builder = "0.5.0"
regex = "1"

[build-dependencies]
#[target.'cfg(target_os="windows")'.build-dependencies]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this is correct. They were not downloaded locally on my linux.

build.rs Outdated
pcaplib_file.write_all(pcaplib_bytes.as_slice()).unwrap();
pcaplib_file.flush().unwrap();

println!("cargo:rustc-link-search=native={}", out_dir);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, going over: https://doc.rust-lang.org/cargo/reference/build-scripts.html#cargorustc-link-searchkindpath
I understand this kind of thing is discouraged. Is there a way to install this on the system level, or is this something we'd want the user to do manually/have the package manager do it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a static library, therefore I think that the rust docs warning does not apply here. Unfortunately there is no such thing as package manager on windows (at least not when you think about apt-get install lib-dev for C libs and headers).
Another rust library for interacting with libpcap does the same but in its Github Action. I think that build.rs approach is better, as it allows any developer to build windows version locally.
The Windows user will need to install NPcap driver manually for bandwidth to work properly, but that is a standard prerequisite for anything which sniffs the network on Windows (like wireshark).

}

#[cfg(target_os = "windows")]
fn download_winpcap_sdk() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly, this happens every time we build bandwhich, right? Even if the file already exists. Not terrible, but might make development a little slow? What do you think about using the rerun-if-changed flag on the resulting Packet.lib file? That way if it got deleted we'll redownload it, but otherwise skip this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea, thanks. I thought that this flag was for C sources only, gotta check it out.


if let Ok(sockets_info) = sockets_info {
for si in sockets_info {
let mut procname = "";
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this a String::new()? I think this will make things a little clearer in the code forks below, and I think we'll end up with the same number of allocations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, that was more of a C/C# legacy than optimalization :)

@imsnif imsnif mentioned this pull request Sep 8, 2020
@remgodow
Copy link
Contributor Author

remgodow commented Sep 8, 2020

As for the process name regression on Linux, I have opened the enhancement request for sysinfo . I will wait a day or two, and either change it there or restore os speciffic implementation here.

@imsnif
Copy link
Owner

imsnif commented Sep 9, 2020

As for the process name regression on Linux, I have opened the enhancement request for sysinfo . I will wait a day or two, and either change it there or restore os speciffic implementation here.

I'll admit I'm a little hesitant about this, mostly because of the things we won't find. This is quite a big implementation change. I'm willing to give it a shot though. If we see there are more regressions we can hack the conditional compilation around it.

@remgodow
Copy link
Contributor Author

remgodow commented Sep 9, 2020

I've restored linux and mac implementations and adopted them to changes from this branch. Build fails on unused methods in mac implementation, we would need to remove them and corresponding tests or somehow tell clippy to ignore them. It would be nice if you could solve that. Besides the build error, I think the pull is ready to merge.

@imsnif
Copy link
Owner

imsnif commented Sep 9, 2020

That's great news, @remgodow ! Thank you very much for all your hard work on this. I'll solve the build issues, hopefully tomorrow - latest on Friday, and then give everything one last go over.

Copy link
Owner

@imsnif imsnif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the unused methods and tests in the mac implementation and all is green.
I went over this once again, and also ran some manual tests on the mac + linux version locally. This looks great and I'm merging it now. Let's continue the conversation here: #181

Once again, thanks for this great effort!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants