Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: Some selectors in :has are treated as plain values #45

Closed
2 tasks done
savetheclocktower opened this issue Jan 25, 2024 · 1 comment · Fixed by #46
Closed
2 tasks done

bug: Some selectors in :has are treated as plain values #45

savetheclocktower opened this issue Jan 25, 2024 · 1 comment · Fixed by #46
Labels
bug Something isn't working

Comments

@savetheclocktower
Copy link
Contributor

Did you check existing issues?

  • I have read all the tree-sitter docs if it relates to using the parser
  • I have searched the existing issues of tree-sitter-css

Tree-Sitter CLI Version, if relevant (output of tree-sitter --version)

No response

Describe the bug

Only certain kinds of selectors fail to be parsed within a :hasclass_selector and id_selector when they have tag names.

Steps To Reproduce/Bad Parse Tree

This parses correctly:

div.myclass:has(li) {}
(stylesheet [0, 0] - [1, 0]
  (rule_set [0, 0] - [0, 22]
    (selectors [0, 0] - [0, 19]
      (pseudo_class_selector [0, 0] - [0, 19]
        (class_selector [0, 0] - [0, 11]
          (tag_name [0, 0] - [0, 3])
          (class_name [0, 4] - [0, 11]))
        (class_name [0, 12] - [0, 15])
        (arguments [0, 15] - [0, 19]
          (tag_name [0, 16] - [0, 18]))))
    (block [0, 20] - [0, 22])))

This does not:

div.myclass:has(li.foo) {}
(stylesheet [0, 0] - [0, 30]
  (rule_set [0, 0] - [0, 30]
    (selectors [0, 0] - [0, 27]
      (pseudo_class_selector [0, 0] - [0, 27]
        (class_selector [0, 0] - [0, 11]
          (tag_name [0, 0] - [0, 3])
          (class_name [0, 4] - [0, 11]))
        (class_name [0, 12] - [0, 15])
        (arguments [0, 15] - [0, 27]
          (plain_value [0, 16] - [0, 26]))))
    (block [0, 28] - [0, 30])))

Here are some other examples that parse exactly as expected:

div.myclass:has(#foo) {}
div.myclass:has(.bar) {}
div.myclass:has(foo[bar]) {}
div.myclass:has(li ~ p) {}
div.myclass:has(li p) {}
div.myclass:has(p li.foo) {} /* (weirdly enough) */

And here are some which are interpreted as plain_value:

div.myclass:has(li#foo) {}
div.myclass:has(li.foo) {}
div.myclass:has(li.foo p) {}
div.myclass:has(p.bar li.foo) {}

Expected Behavior/Parse Tree

In each of these cases, the plain_value should instead be a selectors node. :has can accept selectors of arbitrary complexity, much like :not.

Repro

No response

@savetheclocktower
Copy link
Contributor Author

savetheclocktower commented Jan 26, 2024

So I think I understand the problem:

  • When the parser has just consumed the opening (, it sees li.foo ahead of it
  • It can interpret that as a series of selector-related tokens, or it can parse it as a plain_value
  • It chooses plain_value because that gives it the longest possible match
  • The other examples that are parsed correctly don't have this problem because none of them are valid plain_values
  • But li#foo and li.foo both are, because a plain value can be a URL, and both . and # are characters that occur in URLs

So this is a lexical precedence issue. I can think of a few solutions:

  • Define a different version of plain_value that excludes URLs (something like plain_value_without_url, but aliased to plain_value), then a different version of _value that lists plain_value_without_url instead of plain_value among its options, and then change pseudo_class_arguments to choose between _selector and my _value_without_url
  • Invert the problem by being more strict about where URLs are allowed as plain values: only inside url functions (which is how I fixed a similar problem in my tree-sitter-css fork). Hence plain_value excludes URLs by default, and only in one specific usage do you need plain_value_with_url instead
  • Put an external scanner in charge of parsing URLs (but something like li#foo might actually be a valid URL in some strange context; not sure)

But the simplest thing I can think of — use prec to encourage the parser to favor _selector over plain_value — is the one I just can't get working.

I could demote plain_value to a lower precedence, and this solves my problem…

    plain_value: _ => token(prec(-1, seq(
      repeat(choice(
        /[-_]/,
        /\/[^\*\s,;!{}()\[\]]/, // Slash not followed by a '*' (which would be a comment)
      )),
      /[a-zA-Z]/,
      repeat(choice(
        /[^/\s,;!{}()\[\]]/, // Not a slash, not a delimiter character
        /\/[^\*\s,;!{}()\[\]]/, // Slash not followed by a '*' (which would be a comment)
      )),
    ))),

…but breaks three other tests. I'd much rather boost the precedence of _selectors, but I can't seem to get that to have any effect.

I think I'm pretty close on this one and just need a nudge to find the right answer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant