Skip to content

Commit

Permalink
Fix double escaping of query string in redirect
Browse files Browse the repository at this point in the history
  • Loading branch information
jhy committed Jan 22, 2017
1 parent 83f01fd commit 56a728d
Show file tree
Hide file tree
Showing 3 changed files with 21 additions and 1 deletion.
3 changes: 3 additions & 0 deletions CHANGES
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,9 @@ jsoup changelog
* Bugfix: Element.hasClass() and the ".classname" selector would not find the class attribute case-insensitively.
<https://github.com/jhy/jsoup/issues/814>

* Bugfix: In Jsoup.Connection, if a redirect contained a query string with %xx escapes, they would be double escaped
before the redirect was followed, leading to fetching an incorrect location.

*** Release 1.10.2 [2017-Jan-02]
* Improved startup time, particularly on Android, by reducing garbage generation and CPU execution time when loading
the HTML entity files. About 1.72x faster in this area.
Expand Down
2 changes: 1 addition & 1 deletion src/main/java/org/jsoup/helper/HttpConnection.java
Original file line number Diff line number Diff line change
Expand Up @@ -67,7 +67,7 @@ private static String encodeUrl(String url) {
private static URL encodeUrl(URL u) {
try {
// odd way to encode urls, but it works!
final URI uri = new URI(u.getProtocol(), u.getUserInfo(), u.getHost(), u.getPort(), u.getPath(), u.getQuery(), u.getRef());
final URI uri = new URI(u.toExternalForm());
return new URL(uri.toASCIIString());
} catch (Exception e) {
return u;
Expand Down
17 changes: 17 additions & 0 deletions src/test/java/org/jsoup/integration/UrlConnectTest.java
Original file line number Diff line number Diff line change
Expand Up @@ -796,4 +796,21 @@ public void run() {

assertTrue(body[0].length() > 0);
}

@Test public void handlesEscapedRedirectUrls() throws IOException {
String url = "http://www.altalex.com/documents/news/2016/12/06/questioni-civilistiche-conseguenti-alla-depenalizzazione";
// sends: Location:http://shop.wki.it/shared/sso/sso.aspx?sso=&url=http%3a%2f%2fwww.altalex.com%2fsession%2fset%2f%3freturnurl%3dhttp%253a%252f%252fwww.altalex.com%253a80%252fdocuments%252fnews%252f2016%252f12%252f06%252fquestioni-civilistiche-conseguenti-alla-depenalizzazione
// then to: http://www.altalex.com/session/set/?returnurl=http%3a%2f%2fwww.altalex.com%3a80%2fdocuments%2fnews%2f2016%2f12%2f06%2fquestioni-civilistiche-conseguenti-alla-depenalizzazione&sso=RDRG6T684G4AK2E7U591UGR923
// then : http://www.altalex.com:80/documents/news/2016/12/06/questioni-civilistiche-conseguenti-alla-depenalizzazione

// bug is that jsoup goes to
// GET /shared/sso/sso.aspx?sso=&url=http%253a%252f%252fwww.altalex.com%252fsession%252fset%252f%253freturnurl%253dhttp%25253a%25252f%25252fwww.altalex.com%25253a80%25252fdocuments%25252fnews%25252f2016%25252f12%25252f06%25252fquestioni-civilistiche-conseguenti-alla-depenalizzazione HTTP/1.1
// i.e. double escaped

Connection.Response res = Jsoup.connect(url)
.proxy("localhost", 8888)
.execute();
Document doc = res.parse();
assertEquals(200, res.statusCode());
}
}

2 comments on commit 56a728d

@Pmmlabs
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, I just ran into this problem!

@moneytoo
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also having issues because of this with multiple sites including nytimes links etc. so I'm hoping 1.10.3 will be released soon ;)

Please sign in to comment.