Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests to test_invalid_url for InvalidSchema #2222

Merged
merged 1 commit into from
Sep 12, 2014

Conversation

blueyed
Copy link
Contributor

@blueyed blueyed commented Sep 12, 2014

In b149be5 PreparedRequest was made to skip parse_url for e.g.
$HOST:$PORT, which results in MissingSchema not being raised for
localhost:3128.

@blueyed
Copy link
Contributor Author

blueyed commented Sep 12, 2014

I have created urllib3/urllib3#466 to possibly handle the "oddball schemes" in urllib3 directly.

EDIT: not their job.

@blueyed
Copy link
Contributor Author

blueyed commented Sep 12, 2014

@jvantuyl
Can you remember and elaborate on why you've made this change (kennethreitz@b149be5)?

@sigmavirus24
Copy link
Contributor

@blueyed I elaborated on this change in the issue you opened on shazow/urllib3. This allows for third-party developers to create adapters using the file protocol and has the side-effect of allowing data URLs as well even though requests will not handle them.

@jvantuyl
Copy link
Contributor

Basically, it makes requests extensible with other protocols. It doesn't add support, but it makes it at least possible. Two implementations (which I haven't tested in ages):

file protocol
data protocol

I originally used them for testing and ad-hoc usage of some CLI tools I built for deployment.

@sigmavirus24
Copy link
Contributor

Also I'm fairly certain it allowed @asmeurer to use the file protocol in conda/conda.

@jvantuyl
Copy link
Contributor

And, syntactically, "localhost:8000" does parse as a URL. It just uses the unknown scheme localhost. Interestingly, that's exactly the error you're getting (although as an AssertionError, which is probably bad form). Perhaps you could replace that assertion with a different error (say UnrecognizedScheme), check for it, then implement the fallback behavior?

@blueyed
Copy link
Contributor Author

blueyed commented Sep 12, 2014

I see.

So it's probably expected and sane to get a InvalidSchema: No connection adapters were found for 'localhost:5432' exception instead of MissingSchema: Invalid URL 'localhost:5432': No schema supplied. Perhaps you meant http://localhost? then.

I've thought about adding a regex that would let it fall through, but it doesn't change much in the end.

The expected outcome should get added as test however.

@jvantuyl

the error you're getting (although as an AssertionError, which is probably bad form). Perhaps you could replace that assertion with a different error (say UnrecognizedScheme), check for it, then implement the fallback behavior?

I do not understand this. It appears to be a proper exception already (InvalidSchema).
Where do you mean to add the check and fallback?

I came to this issue through pip and the handling of its proxy setting initially (IIRC, and unrelated).

@jvantuyl
Copy link
Contributor

You are correct. I was just responding to the raising of AssertionError with a made-up error. I forgot that such an exception existed. Looks like that's been noted as #2222.

@asmeurer
Copy link

@jvantuyl
Copy link
Contributor

My problems centered around urllib.util.url.parse_url doing some interesting splitting when certain characters were involved and then further mangling inside of requests.models.prepare_url. If you have the wrong characters in your URL, interesting things can happen (and all of them are pretty clearly incorrect). I suspect that those same issues wouldn't be any better for anybody else if someone has exciting and interesting file paths. :/

@blueyed blueyed force-pushed the fix-MissingScheme-for-hostport branch from 3921bac to c2486a0 Compare September 12, 2014 13:49
@blueyed
Copy link
Contributor Author

blueyed commented Sep 12, 2014

I've changed the PR to add more tests, which document/test the current behavior.

@blueyed blueyed changed the title TBD: MissingSchema not raised for localhost:port. Add tests to test_invalid_url for InvalidSchema Sep 12, 2014
This adds tests for the behavior introduced in b149be5, where
`PreparedRequest` was made to skip `parse_url` for e.g.
`localhost:3128/`.
@blueyed blueyed force-pushed the fix-MissingScheme-for-hostport branch from c2486a0 to d3566ee Compare September 12, 2014 14:11
@Lukasa
Copy link
Member

Lukasa commented Sep 12, 2014

Suits me. 🍰

Lukasa added a commit that referenced this pull request Sep 12, 2014
Add tests to test_invalid_url for `InvalidSchema`
@Lukasa Lukasa merged commit 1a44a99 into psf:master Sep 12, 2014
@@ -78,6 +79,12 @@ def test_entry_points(self):
def test_invalid_url(self):
with pytest.raises(MissingSchema):
requests.get('hiwpefhipowhefopw')
with pytest.raises(InvalidSchema):
requests.get('localhost:3128')
with pytest.raises(InvalidSchema):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test and the test below are exactly the same test case since urlparse parses them in the exact same way:

>>> urlparse.urlparse('localhost.localdomain:3128/')
ParseResult(scheme='localhost.localdomain', netloc='', path='3128/', params='', query='', fragment='')
>>> urlparse.urlparse('10.122.1.1:3128/')
ParseResult(scheme='10.122.1.1', netloc='', path='3128/', params='', query='', fragment='')

They should be consolidated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sigmavirus24
I would argue that this is an implementation detail, and the test should be independent of this, and it's better to have more tests than less.
It might happen that the test for skipping wouldn't catch IP addresses, but only domains.

For example, given an IP address and/or the dots in the "scheme", it would be possible to not use this as a scheme. These tests are meant to keep this behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are still functionally equivalent. They're not an implementation detail because as I mentioned, RFC 3986 has no way of identifying that the part before the : here is not a scheme. So any specification compliant implementation will do this, ergo it's a specification detail that makes these tests functionally equivalent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I've meant is that this tests the skipping code: if this was changed, e.g. by using a more sophisticated regex, the behavior might change and this additional test might catch it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't currently have a regular expression that does this. We have 3 options as I see them:

  1. continue to rely on urlparse from the standard library
  2. use urllib3's URL parser
  3. use rfc3986's URI parser

They all, to varying degrees, follow the specification and will have very similar, if not exactly the same behaviour. We are far more likely to rely on third party libraries that do things efficiently to regular expressions that we put together ourselves.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am referring to this code (https://github.com/kennethreitz/requests/blob/master/requests/models.py#L337-L342):

    if ':' in url and not url.lower().startswith('http'):
        self.url = url
        return

If this would get changed, an IP address might get handled different from a hostname.

Here btw urllib3s URL parser is being used further down.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, forgive me @blueyed. I was mistaking the discussion here for one of the other issues you've filed recently.

@asmeurer
Copy link

We haven't come across that issue in conda, but assumedly the fix is to first urllib.parse.quote the parts of the path.

blueyed added a commit to blueyed/requests that referenced this pull request Sep 12, 2014
blueyed added a commit to blueyed/requests that referenced this pull request Sep 12, 2014
sigmavirus24 added a commit that referenced this pull request Sep 12, 2014
Document skipping in PreparedRequest; followup to #2222
@blueyed blueyed deleted the fix-MissingScheme-for-hostport branch September 12, 2014 19:50
ContinuousFunction pushed a commit to ContinuousFunction/requests that referenced this pull request Nov 14, 2014
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants