Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix userinfo in case of some special characters #144

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion lib/URI.pm
Original file line number Diff line number Diff line change
Expand Up @@ -123,7 +123,7 @@ sub _fix_uric_escape_for_host_part {
return;
}

if ($_[0] =~ m{^((?:$URI::scheme_re:)?)//([^/?\#]+)(.*)$}os) {
if ($_[0] =~ m{^((?:$URI::scheme_re:)?)//((.*:.*@)?[^/?\#]+)(.*)$}os) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a discussion with @genio on IRC:

It would be nice to avoid (.*@) or (.*:.*@) as it's greedy and could be prone to bugs.

We could take inspiration from https://metacpan.org/dist/Mojolicious/source/lib/Mojo/URL.pm#L52-64

After gathering the host and port part, ^([^@]+@) should be fine.

Copy link
Author

@kastakhov kastakhov Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, but what if user wanna use '@' in password, using [^@]+@ will cause that the everything before first '@' symbol will be matched and password parsed incorrectly.

like the similar approach is already used in LWP/Protocol/http.pm.
And I already mentioned what issue it might cause in my PR.

For my standpoint (.*:.*@) is pretty lazy, but okay, I agree that it could be better.

@oalders, what you think about ([^:]+:.*@) for matching user info?

This will match everything except : as username cannot contain colon, then match : as separator, match everything after first colon as password or nothing as according to rfc3986#section-3.2.1 it might be empty, and finally match @ as second separator.

Copy link
Author

@kastakhov kastakhov Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, the regex which Mojo/URL.pm#L52-64 is used also cannot parse username/password if it contains any of /#? symbols :)
I know, it's really rarely usecase, but unfortunately I faced with it already two times :d

For instance, this regex with username and password which contain all special characters in unescaped form.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, but what if user wanna use '@' in password, using [^@]+@ will cause that the everything before first '@' symbol will be matched and password parsed incorrectly.

Then it should be URL-encoded as %40. I'm not sure if there's any way around that. Something has to serve as a stopping point. If the RE is too greedy, it might end up stopping with a @ inside of the path, like http://host/path/directory@foobar/file.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the RE is too greedy, it might end up stopping with a @ inside of the path, like http://host/path/directory@foobar/file.

Hm.
Now I see, it'll match this path incorrectly.

Copy link
Author

@kastakhov kastakhov Aug 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, actually I can revert this change as the most important part is in lib/URI/_server.pm#14.

There are another issue, if password contain multiple '@' symbols before any of '/#?' symbol URI parsing fails again :d

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are another issue, if password contain multiple '@' symbols before any of '/#?' symbol URI parsing fails again :d

If the password was inserted directly into the URL that way without percent-encoding, that's not really the URI module's fault. The password has to be encoded properly first. That's the point of escaping characters.

my $orig = $2;
my ($user, $host) = $orig =~ /^(.*@)?([^@]*)$/;
$user ||= '';
Expand Down
26 changes: 16 additions & 10 deletions lib/URI/_server.pm
Original file line number Diff line number Diff line change
Expand Up @@ -5,19 +5,25 @@ use warnings;

use parent 'URI::_generic';

use URI::Escape qw(uri_unescape);
use URI::Escape qw(uri_unescape uri_escape);

our $VERSION = '5.29';

sub _uric_escape {
my($class, $str) = @_;
if ($str =~ m,^((?:$URI::scheme_re:)?)//([^/?\#]*)(.*)$,os) {
my($scheme, $host, $rest) = ($1, $2, $3);
my $ui = $host =~ s/(.*@)// ? $1 : "";
my $port = $host =~ s/(:\d+)\z// ? $1 : "";
if (_host_escape($host)) {
$str = "$scheme//$ui$host$port$rest";
}
if ($str =~ m,^((?:$URI::scheme_re:)?)//(.*:.*@)?([^/?\#]*)(.*)$,os) {
my $scheme = $1;
my $ui = $2 || '';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have a more descriptive name for this variable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this variable was not your creation. :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, I have renamed to $userinfo.
I decided to rename in whole file as it might be a little confusing when we're using $ui or $userinfo.

my $host = $3;
my $rest = $4;
my $port = $host =~ s/(:\d+)\z// ? $1 : "";
if ($ui) {
# escape /?# symbols as they are used
# in subsequent regex for path parsing
$ui = uri_escape($ui, '/?#');
}
_host_escape($host);
$str = "$scheme//$ui$host$port$rest";
}
return $class->SUPER::_uric_escape($str);
}
Expand All @@ -26,8 +32,8 @@ sub _host_escape {
return if URI::HAS_RESERVED_SQUARE_BRACKETS and $_[0] !~ /[^$URI::uric]/;
return if !URI::HAS_RESERVED_SQUARE_BRACKETS and $_[0] !~ /[^$URI::uric4host]/;
eval {
require URI::_idna;
$_[0] = URI::_idna::encode($_[0]);
require URI::_idna;
$_[0] = URI::_idna::encode($_[0]);
};
return 0 if $@;
return 1;
Expand Down
50 changes: 49 additions & 1 deletion t/http.t
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
use strict;
use warnings;

use Test::More tests => 16;
use Test::More tests => 76;

use URI ();

Expand Down Expand Up @@ -48,3 +48,51 @@ $u = URI->new("http://%65%78%61%6d%70%6c%65%2e%63%6f%6d/%70%75%62/%61/%32%30%30%
is($u->canonical, "http://example.com/pub/a/2001/08/27/bjornstad.html");

ok($u->has_recognized_scheme);

my $username = 'u1!"#$%&\'()*+,-./;<=>?@[\]^_`{|}~';
my $exp_username = 'u1!%22%23$%&\'()*+,-.%2F;%3C=%3E%3F@%5B%5C%5D%5E_%60%7B%7C%7D~';
my $password = 'p1!"#$%&\'()*+,-./;<=>?@[\]^_`{|}~';
my $exp_password = 'p1!%22%23$%&\'()*+,-.%2F;%3C=%3E%3F@%5B%5C%5D%5E_%60%7B%7C%7D~';
my $path = 'path/to/page';
my $query = 'a=b&c=d';
my %host = (
'[::1]' => {
host => '::1',
port => 80,
},
'[::1]:8080' => {
host => '::1',
port => 8080,
},
'127.0.0.1' => {
host => '127.0.0.1',
port => 80,
},
'127.0.0.1:8080' => {
host => '127.0.0.1',
port => 8080,
},
'localhost' => {
host => 'localhost',
port => 80,
},
'localhost:8080' => {
host => 'localhost',
port => 8080,
},
);

foreach my $host (keys %host) {
my $uri = URI->new("http://${username}:${password}\@${host}/${path}?${query}");
is($uri->scheme, 'http');
is($uri->userinfo, "${exp_username}:${exp_password}");
is($uri->host, $host{$host}->{host});
is($uri->port, $host{$host}->{port});
is($uri->path, "/${path}");
is($uri->query, $query);
is($uri->authority, "${exp_username}:${exp_password}\@${host}");
is($uri->as_string, "http://${exp_username}:${exp_password}\@${host}/${path}?${query}");
is($uri->as_iri, "http://${exp_username}:${exp_password}\@${host}/${path}?${query}");
is($uri->canonical, "http://${exp_username}:${exp_password}\@${host}/${path}?${query}");
}