Double URLencoding since 1.15.4 due to #1873 "fix" #1936

Closed
fcolin-odigo opened this issue Apr 12, 2023 · 3 comments
Labels
duplicate This is a duplicate issue or root-cause of another issue

Comments

@fcolin-odigo

Commit 45ed002 fixes bug #1873 by automatically encoding the URL when HttpConnection.connect(String) or Jsoup.connect(String) is used.

This also encodes URLs that are already properly encoded, which breaks them. Example: I want to call Jsoup on https://example.com/api with a parameter callback whose value is https://callback.url:12345/call/me/back. This value must be encoded.

Before, I was able to properly handle parameter encoding on my side and do:

final var connection = Jsoup.connect("https://example.com/api?callback=https%3A%2F%2Fcallback.url%3A12345%2Fcall%2Fme%2Fback");

And everything worked fine.

Since 1.15.4, this code forces another layer of encoding on top of my encoding, and the URL called is:

https://example.com/api?callback=https%253A%252F%252Fcallback.url%253A12345%252Fcall%252Fme%252Fback%0A

Which obviously doesn’t work.
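
To make the effect concrete, here is a minimal sketch that reproduces the same double encoding with the JDK only. It does not call jsoup and is not jsoup's actual implementation; the class name `DoubleEncodingDemo` and the helper `encode` are made up for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class DoubleEncodingDemo {
    // Percent-encode a value for use as a query parameter (hypothetical helper).
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String callback = "https://callback.url:12345/call/me/back";

        // Encoding once gives the value I pass to Jsoup.connect(String).
        String once = encode(callback);

        // Encoding the already-encoded value turns every '%' into "%25".
        String twice = encode(once);

        System.out.println("https://example.com/api?callback=" + once);
        System.out.println("https://example.com/api?callback=" + twice);
    }
}
```

The second line printed has the same `%253A` / `%252F` pattern as the broken URL above: the `%` of each existing escape has itself been escaped.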

From my point of view, the #1874 "fix" should be reverted, as it is not the role of the underlying library (here Jsoup) to do some kind of magic with bad input parameters. Jsoup should properly reject URLs with unencoded parameters and let the user handle this correctly.

Also, this was a breaking change, and for semver’s sake, there should be no breaking changes in a "patch" version.

Workarounds:

  • Remove the proper URL encoding from URLs passed to Jsoup as a String (… seriously).
  • Use the URL methods (such as HttpConnection.connect(URL)), which don’t suffer from this double URL-encoding bug.
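
A rough sketch of the second workaround, assuming the URL string is already correctly encoded. Only the JDK part is runnable here; the jsoup call is left as a comment because it needs jsoup on the classpath. The class name `UrlObjectWorkaround` and the helper `toUrl` are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class UrlObjectWorkaround {
    // Build a java.net.URL from an already percent-encoded string.
    // java.net.URL neither encodes nor decodes; it keeps the text as given.
    static URL toUrl(String alreadyEncoded) {
        try {
            return URI.create(alreadyEncoded).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + alreadyEncoded, e);
        }
    }

    public static void main(String[] args) {
        URL url = toUrl("https://example.com/api?callback=https%3A%2F%2Fcallback.url%3A12345%2Fcall%2Fme%2Fback");

        // The query is still encoded exactly once.
        System.out.println(url.getQuery());

        // With jsoup available, this URL object would then be handed to
        // HttpConnection.connect(URL) instead of Jsoup.connect(String).
    }
}
```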
@promosrene

Hi,
I can confirm this error and would also agree that jsoup should never change the redirect location.

@rudolfgrauberger

@jhy It looks like this error has come back (it was already fixed in #839 for 1.10.3).

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.15.4</version>
</dependency>

try {
    String html = Jsoup
        .connect("https://www.transfermarkt.de/filip-kosti%C4%87/profil/spieler/161011")
        .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36 Edg/112.0.1722.39")
        .get()
        .html();
} catch (Exception e) {
    System.out.println(e.toString());
}

This results in an exception with the following output:

org.jsoup.HttpStatusException: HTTP error fetching URL. Status=404, URL=[https://www.transfermarkt.de/filip-kosti%25C4%2587/profil/spieler/161011]

Versions 1.15.2 and 1.15.3, on the other hand, still work as expected!
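
One quick way to confirm that the URL in the exception was encoded twice is to decode the path segment step by step. This is only a diagnostic sketch using the JDK's `URLDecoder`; the class name `DecodeCheck` and the helper `decode` are made up for illustration.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class DecodeCheck {
    // Undo one layer of percent-encoding (hypothetical helper).
    static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Path segment taken from the URL in the exception message.
        String fromError = "filip-kosti%25C4%2587";

        // One decode gives back the segment that was originally passed in.
        String once = decode(fromError);

        // A second decode is needed to reach the actual character.
        String twice = decode(once);

        System.out.println(once);
        System.out.println(twice);
    }
}
```

If a single decode does not yet yield the plain text, the segment carried two layers of encoding.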

@jhy
Owner

jhy commented Apr 24, 2023

This is fixed in #1914.

@jhy jhy closed this as completed Apr 24, 2023
@jhy jhy added the duplicate This is a duplicate issue or root-cause of another issue label Apr 24, 2023