SelectorParseException when calling Element#cssSelector() #1966

Closed
remi-sf opened this issue Jun 8, 2023 · 2 comments

remi-sf commented Jun 8, 2023

Hi,

My team has encountered this crash when blindly calling Element#cssSelector() on arbitrary elements.

The signature is:

org.jsoup.select.Selector$SelectorParseException: Could not parse query 'ul.sp-c-sport-flyout__inner.gs-u-mb\': unexpected token at '\'

	at org.jsoup.select.QueryParser.findElements(QueryParser.java:226)
	at org.jsoup.select.QueryParser.parse(QueryParser.java:74)
	at org.jsoup.select.QueryParser.parse(QueryParser.java:45)
	at org.jsoup.select.QueryParser.combinator(QueryParser.java:90)
	at org.jsoup.select.QueryParser.parse(QueryParser.java:60)
	at org.jsoup.select.QueryParser.parse(QueryParser.java:45)
	at org.jsoup.select.Selector.select(Selector.java:98)
	at org.jsoup.nodes.Element.select(Element.java:418)
	at org.jsoup.nodes.Element.cssSelector(Element.java:858)
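The quoted query in the message stops right at a backslash that does not appear in the HTML. This suggests the generated selector escaped the `+` as `\+`, but the selector parser then split the query at that `+` as if it were the adjacent-sibling combinator (note the `QueryParser.combinator` frame). The sketch below is hypothetical and is not jsoup's `QueryParser` code; `CombinatorScan` and `indexOfUnescapedCombinator` are illustrative names. It only shows the difference an escape-aware scan makes: it finds no combinator inside `gs-u-mb\+`. It is deliberately simplified, handling only the explicit `+`, `>` and `~` combinators (not the whitespace descendant combinator) and not tracking quoted strings.

```java
final class CombinatorScan {
    private CombinatorScan() {}

    /**
     * Returns the index of the first top-level '+', '>' or '~' in a selector,
     * or -1 if there is none. A backslash escapes the character after it, and
     * anything inside (...) or [...] is skipped, e.g. the '+' in ":nth-child(2n+1)".
     */
    static int indexOfUnescapedCombinator(String query) {
        int depth = 0;
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character, e.g. the '+' in "\+"
            } else if (c == '(' || c == '[') {
                depth++;
            } else if ((c == ')' || c == ']') && depth > 0) {
                depth--;
            } else if (depth == 0 && (c == '+' || c == '>' || c == '~')) {
                return i; // a real combinator at the top level
            }
        }
        return -1;
    }
}
```

A naive `query.indexOf('+')` on `ul.gs-u-mb\+.qa` returns the position of the escaped `+`, which is the kind of split the error message points to; the escape-aware scan returns -1 for the same string.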

To reproduce this, run the following test case:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

void test() {
    final String html = "<ul class=\"sp-c-sport-flyout__inner gs-u-mb+ gs-u-display-none@m qa-flyout-primary\"><li class=\"sp-c-sport-flyout__item \" role=\"presentation\"><a class=\"sp-c-sport-flyout__link qa-flyout-primary-item sp-nav-click-stat\" role=\"menuitem\" data-stat-name=\"primary-nav-v2-mobile\" data-stat-title=\"Home\" data-stat-link=\"/sport\" href=\"/sport\">Home</a></li></ul>";
    final Document document = Jsoup.parse(html);
    // Throws SelectorParseException in 1.15.4; the <ul> has the class "gs-u-mb+"
    document.getElementsByTag("ul").get(0).cssSelector();
}

The class gs-u-mb+ is causing the crash; removing it from the HTML avoids the crash. I suppose the + character is invalid in a CSS class name? In that case this might not really be a bug, and we'll just have to handle the runtime exception in our application.
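For context on the question above: HTML allows any non-whitespace characters in a class name, so `gs-u-mb+` is a valid class, but inside a CSS selector the `+` has to be escaped (`.gs-u-mb\+`). The following is a minimal sketch, assuming the CSSOM "serialize an identifier" rules (the algorithm behind the browser `CSS.escape()` function). `CssEscape` and `escapeIdentifier` are hypothetical names, not jsoup API, and this is not how jsoup itself escapes class names.

```java
final class CssEscape {
    private CssEscape() {}

    /** Escapes a string so it can be used as a CSS identifier, e.g. a class name. */
    static String escapeIdentifier(String ident) {
        int[] cps = ident.codePoints().toArray();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < cps.length; i++) {
            int c = cps[i];
            boolean digit = c >= '0' && c <= '9';
            if (c == 0) {
                out.append('\uFFFD'); // NULL becomes the replacement character
            } else if ((c >= 0x01 && c <= 0x1F) || c == 0x7F
                    || (i == 0 && digit)
                    || (i == 1 && digit && cps[0] == '-')) {
                // Escape as a hex code point plus a terminating space, e.g. "1" -> "\31 "
                out.append('\\').append(Integer.toHexString(c)).append(' ');
            } else if (i == 0 && c == '-' && cps.length == 1) {
                out.append("\\-"); // a lone "-" is not a valid identifier
            } else if (c >= 0x80 || c == '-' || c == '_' || digit
                    || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
                out.appendCodePoint(c); // safe to emit as-is
            } else {
                out.append('\\').appendCodePoint(c); // e.g. '+' -> "\+", '@' -> "\@"
            }
        }
        return out.toString();
    }
}
```

For example, escaping each class of the `ul` from the test case and joining them with `.` gives `ul.sp-c-sport-flyout__inner.gs-u-mb\+.gs-u-display-none\@m.qa-flyout-primary`, which a conforming selector parser reads as one compound selector rather than two selectors joined by `+`.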

The HTML comes from the web page in the attached archive:
Transfer news live & West Ham in Europa Conference League final - Live - BBC Sport.html.zip

(Reproduced with jsoup 1.15.4.)


erfansn commented Jan 27, 2024

I agree. I see a similar issue when parsing "td:first-child" in my test environment, but in production everything works fine!

jhy (Owner) commented Aug 27, 2024

Thanks; this was fixed along with #2146.

jhy closed this as completed Aug 27, 2024
jhy added the fixed label Aug 27, 2024