Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add exclude private URLs feature #7

Merged
merged 6 commits into from
Oct 17, 2020

Conversation

pawroman
Copy link
Member

@pawroman pawroman commented Oct 9, 2020

Currently, the excludes are based purely on IP address ranges and only work for websites (no support for mail addresses). The IPv6 support is implemented, to a degree possible with Rust standard library as of 1.47.0. See commit messages and code comments for more details.

Note: while IP address exclude works, it's possible that some URLs will resolve to link-local, private or loopback addresses. While this is not very likely, it's a possibility. To address this, we'd have to DNS-resolve all URIs and then run them through IP exclude rules. I have skipped this part because of extra effort involved.

Closes #5.

@pawroman
Copy link
Member Author

pawroman commented Oct 9, 2020

I quite honestly don't know why the build is failing. The tests run on my local machine just fine (Rust 1.47.0, Linux, x86_64).

@pawroman
Copy link
Member Author

pawroman commented Oct 9, 2020

The CLI options parser generated weird flag names for this:

Usage: target/debug/lychee [OPTIONS]

Positional arguments:
  inputs                 Input files

Optional arguments:
  -h, --help             show help
  -v, --verbose          Verbose program output
  -m, --max-redirects MAX-REDIRECTS
                         Maximum number of allowed redirects (default: 10)
  -t, --threads THREADS  Number of threads to utilize (defaults to  number of cores available to the system
  -u, --user-agent USER-AGENT
                         User agent (default: curl/7.71.1)
  -i, --insecure         Proceed for server connections considered insecure (invalid TLS) (default: false)
  -s, --scheme SCHEME    Only test links with given scheme (e.g. https)
  -e, --exclude EXCLUDE  Exclude URLs from checking (supports regex)
  -E, --exclude-private  Exclude private IP address ranges from checking
  --exclude-link-local   Exclude link-local IP address range from checking
  --exclude-loopback     Exclude loopback IP address range from checking
  -H, --headers HEADERS  Custom request headers
  -a, --accept ACCEPT    Comma-separated list of accepted status codes for valid links
  -T, --timeout TIMEOUT  Website timeout from connect to response finished (default: 20)
  -M, --method METHOD    Request method (default: get)

Any preferences as what to should the flags be?

I have a proposal:

  • Drop the short flags, leaving only long ones AND
  • Have a --exclude-all-private flag, mapped to -E, that would enable all of them (--exclude-private --exclude-link-local --exclude-loopback).

What do you think?

@pawroman
Copy link
Member Author

pawroman commented Oct 9, 2020

I quite honestly don't know why the build is failing. The tests run on my local machine just fine (Rust 1.47.0, Linux, x86_64).

I was able to work out that it uses Rust 1.46.0, as included in ubuntu-latest image: https://github.com/actions/virtual-environments -- see: https://github.com/actions/virtual-environments/blob/main/images/linux/Ubuntu1804-README.md

I have installed Rust 1.46.0 locally using rustup, and re-ran the tests, and they did fail indeed, with the same message.
Then I checked out the master branch (without my changes) and re-ran the tests, they failed.

Then I installed Rust 1.45.2 and re-ran the tests for master branch. It did not fail. Then I checked my branch, it did not fail either.

So, it's likely a problem with Rust 1.46.0... Some compiler bug perhaps? I found this one: rust-lang/rust#54540
I'm not sure what the next steps should be. Let me know what you think!

PS: For reference, here's how to install toolchains:

$ rustup toolchain install 1.45.2
$ rustup default 1.45.2

@pawroman
Copy link
Member Author

pawroman commented Oct 9, 2020

For the record: adding

#![type_length_limit = "7912782"]

At the very top of main.rs makes it work with Rust 1.46.0 -- but it feels a bit like an ugly hack 😞

@mre
Copy link
Member

mre commented Oct 15, 2020

@xiaochuanyu found a fix in #6. As you guessed, it's likely an issue with the 1.46.0 toolchain. Updating to the latest version seems to fix the issue.

return true;
}

// Note: in a pathological case, an IPv6 address can be IPv4-mapped
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow. That's both shocking and hilarious. 😆

@@ -285,12 +363,14 @@ mod test {
checker
}

fn website_url(s: &str) -> Uri {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@@ -23,6 +24,17 @@ impl Uri {
Uri::Mail(_address) => None,
}
}

pub fn host_ip(&self) -> Option<IpAddr> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nicely done

@mre
Copy link
Member

mre commented Oct 15, 2020

@pawroman can you reintegrate master? Merged the changes by @xiaochuanyu.
Apart from that it looks great!

This lets extract the host IP address, if defined for a website.
Mail addresses are not supported.
The exclusion is currently based on IP addresses, as specified by inputs.
We support IPv4 and IPv6, where possible using the current stable stdlib
(as of Rust 1.47.0).

Note that we could go one step further and resolve all URIs using DNS
and then exclude-filter the private IPs.
This makes the test code shorter and more readable.
@pawroman pawroman force-pushed the exclude-private-urls branch from 6bb9e59 to c043776 Compare October 16, 2020 11:59
@pawroman
Copy link
Member Author

@pawroman can you reintegrate master? Merged the changes by @xiaochuanyu.
Apart from that it looks great!

Thanks for the review. I have rebased against master, good to see that the CI is green now :)

Any thoughts on the CLI options? #7 (comment)

@mre
Copy link
Member

mre commented Oct 16, 2020

I have a proposal:

  • Drop the short flags, leaving only long ones AND
  • Have a --exclude-all-private flag, mapped to -E, that would enable all of them (--exclude-private --exclude-link-local --exclude-loopback).

Sounds very reasonable to me. 👍

@pawroman
Copy link
Member Author

Sounds very reasonable to me. +1

👍

Implemented, along with a test that runs the program against a test file with the flag.

This should now be ready to merge.

@mre
Copy link
Member

mre commented Oct 17, 2020

Just gave it a spin locally and it does what it says on the can. Great work and thanks for your contribution @pawroman. 😃

@mre mre merged commit dd73d6e into lycheeverse:master Oct 17, 2020
@pawroman pawroman deleted the exclude-private-urls branch October 18, 2020 09:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Exclude private URLs
2 participants