Proposal - Update XPath to (at least) v2.0 #903
Comments
Per https://www.chromestatus.com/metrics/feature/popularity it does seem that about 1-2% of page views end up using XPath, so maybe it's worth considering, but I wouldn't really want to do anything here until #67 is fully settled, including tests. XPath has been a long neglected part of the platform, we should standardize what we have first before considering additions. |
@annevk thanks for pointing me to #67 ... I think I searched in the HTML repo and not here, otherwise bumping the XPath version might be part of #67 too, imho, as once there's agreement on settling it, there might be agreement on what should run underneath, right? If you feel like that's the case, feel free to close this issue, and I'll keep watching/following the other one.
It's been like a decade so I might remember wrongly, but I don't think XPath 2.0 is backwards compatible. That doesn't mean we couldn't do compatible extensions to 1.0, but I'm not sure what the appetite is for that. |
I'm really after having |
Chrome is not interested in this. The XML parts of our pipeline are in maintenance mode and we would love to eventually deprecate and remove them, or at least replace them with something that generates less security bugs. Increasing the capabilities of XML in the browser runs counter to that goal. |
if replaced, since work would need to be done regardless, what are the security implications of having |
Also worth mentioning that usage has increased in recent years, so removing it would indeed look like a breaking change ... we just started using XPath extremely successfully on many occasions, and having it fully removed would break many things, so I hope there's room for changes but no deprecation ... it's super powerful as a query language and it can provide things CSS might never have for perf or other reasons.
"I don't think XPath 2.0 is backwards compatible." This is not true, at least in the sense I would understand it, i.e. that an XPath 2.0 (or indeed 3.1, there is no XPath 2 :) ) processor will happily run an XPath 1.0 statement, and return the same nodes as an XPath 1.0 processor. |
"The XML parts of our pipeline are in maintenance mode and we would love to eventually deprecate and remove them, or at least replace them with something that generates less security bugs. Increasing the capabilities of XML in the browser runs counter to that goal." Assuming for a moment that increasing XML capabilities "generates [...] security bugs" (I am not convinced), this is a proposal for querying the HTML DOM with XPath, not XML. (Thanks to @gimsieke for pointing this out!) |
By "XML parts of our pipeline" I mean "everything implemented using libxml and libxslt". |
Deprecating/replacing libxml and libxslt would be a prerequisite of updating support to XPath v >1. So Chrome should be behind such a move, right? |
I can tell this is not going to be a productive conversation, as folks are intent on playing word games to try and pretend Chrome has a different stance than we do. As such, I won't be participating in this thread further. I think I've made our position clear. |
If supporting XPath >=2.0 would mean everyone needs a completely new implementation, then wouldn't it be less work overall to just continue to improve CSS selectors to support the missing features? @domenic I don't think you are being fair to @yamahito's point. Just because XPath has "X" in the name doesn't necessarily mean it needs to have anything to do with XML. |
Sorry to make you feel I am playing word games, and that my joke has made you throw your toys out of the pram, but there was a serious point here, which I don't think you've addressed. You and the OP sort of have the same problem: libxml and libxslt have not been updated to work with updated specifications for a very long time. If you want a productive suggestion, how about the Saxon-HE/C library as a potential alternative? |
I mean maybe Michael Kay would have some idea whereby this could be doable and he would find it reasonable, but I think this makes it difficult for some browsers. I personally would love if I saw XPath getting some love, so don't take my comment as negative. |
However, Saxon-HE/C is open source: you wouldn't have support for all features (e.g. schema awareness), but I don't think those would be missed for this purpose. Of course, there may be other reasons why it's not doable (licensing issues), and I'm not qualified to comment on implementation. I certainly don't want to talk for Chrome, despite aspersions to the contrary. I just want to point out that the underlying issue is the use of a library many years out of date, but that said library does not reflect on XPath as a technology. |
@domenic I am not sure that "folks" included me (but I guess so ...)
I honestly had the feeling there was no room for any conversation, after your first reply:
although this sentence is both not exactly what I've proposed and also scary, 'cause SVG, as far as I know, is still part of the XML namespace/pipeline, and announcing that anything XML is going to be deprecated and removed is concerning, imo. I also think it's clear that developers who know XPath and its potential are probably not using it daily due to its lack of improvements since 1999, so asking why, where, or what looks like a normal conversation to me, while "dropping the bomb and the mic" at the same time feels a bit "off", imho. If there's anything I've said that made you put me in the "folks that play word games" category, I apologize, 'cause even if I'm not sure where I gave you that feeling, it surely wasn't my intent. I hope that the idea of improving XPath to let Web developers fulfill any requirement not satisfied by what CSS currently offers will be considered at least by other vendors, especially after reading that XPath apparently has security implications while it's still a W3C recommendation ... it took much less to deprecate SQLite, and no security issue was obvious at that time; it's weird that something known as insecure has been kept in the platform for 20 years and never got a chance to be updated.
@annevk XPath 2 (and, more to the point these days, 3.1) are highly backward compatible with XPath 1. There are some differences. Example: in XPath 1, the string value of a sequence is the string value of the first item in the sequence; that was crazy and caused lots of bugs in people's XPath expressions. The XPath 2 and 3 specs include notes for people implementing XPath 2 and 3 on how to handle those cases. They are very small edge cases & many are unlikely to apply to Web browser usage anyway.
Possible implementation approaches include (1) make a standard API that includes the desired XPath version; this is badly needed in any case... (2) use a JavaScript-based implementation (see e.g. frameless.io), (3) write or reuse a C/Rust/C++ one, most likely starting with an XQuery implementation as that's an extension of XPath (XQuery 1 extends XPath 2, confusingly; XQuery 3.1 extends XPath 3.1).
Where XPath 1 was based on node lists, XPath 2 moved to being based on sequences; it's much more powerful for users, and a lot of things that were tricky became a lot clearer, but the underlying code is likely very different.
A CSS xpath('expr' [, version]) function would be super useful e.g. in the content property, as it can do string processing on text in the document - even if only in the "slow" profile of CSS.
@WebReflection the security issues in XPath are that there are functions (starting in XPath 1) that allow file access. The same security issues that XMLHttpRequest has apply. There are also common extensions in XPath implementations to allow extended file access, but those make no sense in a Web browser - see e.g. expath.org. In XPath 3 it's possible to write recursive functions, as with JavaScript, so you could create infinite loops, and an implementation needs to detect this. There's also the possibility - again as with JavaScript - of building up variables, e.g. with the string concatenation operator || like this:
Yes, CSS could be extended to be comparable - e.g. to be able to do string matching & processing on text content, date/time arithmetic, joins, union/intersection, and so forth. It'd be a lot of work, although just adding matches() and replace() would go a long way -
@liamquin thanks a lot for the clarification, and yes, that makes sense. However, if XPath 3 is more problematic than 2, in terms of possible footgun within the parsing and features, I think having v2 available in JS would already be a killer feature compared to 1, and since nobody wants new footguns in JS, upgrading to the least problematic version that provides |
Personally, as one of the authors of a free open source XPath 3.1 implementation (https://github.com/FontoXML/fontoxpath), I do not really see the point in shipping XPath 3.x or 2.0 in the browser. Rather, I would prefer to see a way to plug into the CSS engine to use XPath in CSS, so that we can do what @liamquin described, but in a more flexible way. There will be many performance concerns there, but those must be manageable in some way.
@DrRataplan unless you are thinking about exposing XPath through Also worth reminding that updating XPath, as proposed in here, has nothing to do with styling, as any live styling through XPath will make pages likely very slow, otherwise we would already have |
I do not think @domenic meant removal of XML API
— https://bugs.chromium.org/p/chromium/issues/detail?id=514995 That said, I have one example of the state of the XML APIs. Did you know that DOMParser parses text/xml more slowly than text/html? We have
There was a proposal, waits for #67, closed. jQuery popularized CSS selectors. Somehow there is not much XPath, XPath 2.0, XPath 3.0 activity on the web. It would be great if its proponents described how it helps them. Personally I use XPath to query text nodes and as
I do not think Web developers know and use
I would prefer Invisible XML approach
emulated with
( Each node knows its type, parses the underlying mini language and presents it as if it were nodes.
@sergeykish with XPath I can select even attribute nodes and/or text nodes, and this is gold for libraries based on template literals ... as example, this single query The rest of the functions are also well known, and there are cheatsheets that help with it too: |
Why 2.0? Why not the latest version? |
@sirinath apparently there's an agreement among XPath users that v1.0 is the right version to use and eventually new features should be implemented on top of v1.0, and to be honest, the only feature I really miss, and so do others, is the RegExp functionality, which together with current XPath 1.0 offer, would be already a huge upgrade in possibilities. As apparently nobody wants to touch this part of the Web anyhow, we should try to understand if bringing just that would be possible, or if we could just close this proposal as not accepted and move forward. |
I don't think that using Hacker News comments is good for determining that there is a consensus that XPath 1.0 is the right version to use/build on. If you wanted to do something like that you would need to do a survey of companies and hobbyists to see what stacks they are using and if they would use XPath 2.0/3.0/3.1 features if they were available on those stacks (including on web browsers, e.g. when testing via Selenium). Personally, I like the changes that XPath 2.0 made to the language, as it tidied up several things like not being able to do FWIW, I have recreated the XPath 1.0 grammar using the XPath 2.0 names and structures at https://rhdunn.github.io/xquery-intellij-plugin/specifications/XPath%201.0%20as%202.0%20EBNF%20Grammar.html. It is 47 EBNF symbols, compared to XPath 2.0's 82, 3.0's 108 and XPath 3.1's 126. That document also describes the grammar differences between XPath 1.0 and XPath 2.0. |
absolutely, and I haven't used the word consensus; I've just found interesting comments from various people actually using, and appreciating, what XPath brings to the table, and many said 2 or even 3 are too much to implement and possibly problematic, but a few said it should be relatively easy to add RegExp on top of the current implementation only, which is 1.0. As this issue mentions an upgrade to 2.0, that's the ideal dream/goal, but since vendors have already stated they don't think this would ever happen, that they have no interest, or that it's complicated, I'm just saying I personally miss RegExp, as I think that'd already be a huge step forward in scraping and querying possibilities.
Hey, this issue is trending in HN https://news.ycombinator.com/item?id=24959588 - probably a good idea to lock it for a bit to reduce the amount of noise. |
Also, knowing some of the people involved - I think discussion here isn't too great:
I am not sure why no one brought up the fact that XPath 3 implementations exist in userland (this seems to be the most popular one) but they are not popular. So XPath 3 does not add capabilities to the web platform since it's possible in userland and is not popular and does not fix the issues with the existing APIs since it can't replace it because of compatibility. If you want to engage (constructively) with Chrome on this - you need to look at their perspective and explain how an investment into XPath 3.1 aligns with their goals. For example - get someone to sponsor work that reduces the existing technical debt significantly while adding the API. TBH if I were Chrome I'd likely still not go for it because of their perspective. |
Before the window here closes, as someone who isn’t here because of HN :), I thought it might be helpful to add a data point. I’ve also used XPath for the exact same purpose as @WebReflection. It seems very suited to this. (And — far more niche — I’ve also employed it when processing WOFF metadata.) I found the existing implementation adequate for both tasks, but figured it might still be useful for implementers to know that the xpath-for-template-substitutions pattern isn’t a one-off. |
@bathos I work in test automation and xpath is very popular and pervasive in that space. It is much more popular than CSS selectors for automation:
I think that if you want to "prove" this it's pretty easy to ask grid vendors (like sauce labs) and I'm sure they'll be happy to help (I'm happy to ask them if this is called into question). Note that "popularity" isn't the reason Chrome is objecting to this. |
Maybe a short-term (or medium-term) way forward would be to standardize an API for calling JavaScript functions from XPath. People could then either use the existing regular expression engine in JavaScript directly from XPath, or could implement fn:replace(). I know Tom Hodgins (innovati) has experimented with XPath in CSS selectors, but the escaping you have to do makes it essentially untenable - https://codepen.io/tomhodgins/pen/KxOOzZ
@benjamingr thanks for the kind mention. The argument about popularity is like the story of a town separated from another town by a great ravine - people asked the mayor for a bridge and his response was, there's very little demand: hardly anyone swims across the rapids today, so we should not build a bridge. The install instructions for fontoxpath assume an experienced JavaScript developer; frameless.io requires registration and has usage restrictions. The others I know of are commercial products, so for sure there is demand. And as you say, out of the browser, XPath is widely used in specific areas - along with XSLT for that matter. Any spec with an X in its name has an uphill struggle in browserland these days, and it's not an easy change that's proposed.
@liamquin a JS hook into XPath 1.0 would surely open many possibilities, but it would still require an update to the current implementation, which is what Chrome would like to avoid.
@benjamingr there are userland libraries, but these are huge and slow compared to the current, native XPath 1.0, which is why we're not considering adopting them: we can use a bit of JS to crawl axes and apply RegExp, and although the dance is awkward and always a bit ad-hoc, we don't have license issues, bundling issues, foreign code to watch out for, etc. Of course something that can be written in JS will be written in JS, but at the same time, we all know that as soon as something is available natively, it's better for everyone, with no unnecessary bloat or slower perf. I would still like to understand whether there is any room for improvement or none at all, though, and in the latter case I think we should close this issue as "won't fix" and move forward.
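For readers unfamiliar with the "dance" described above, a minimal sketch of it might look like the following: let the native XPath 1.0 engine select candidate nodes, then apply a RegExp in JS. The helper name `xpathMatches` and its parameters are hypothetical and not part of any proposal in this thread; `doc` is assumed to be a DOM `Document` (or any object exposing the same `evaluate` method).

```javascript
// Hypothetical helper: XPath 1.0 has no regex support, so candidates are
// selected natively and the RegExp is applied in JS afterwards.
// Assumption: `doc` exposes the standard DOM `evaluate` method.
const ORDERED_NODE_SNAPSHOT_TYPE = 7; // value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE

function xpathMatches(doc, expression, pattern, context = doc) {
  const snapshot = doc.evaluate(
    expression,
    context,
    null, // no namespace resolver
    ORDERED_NODE_SNAPSHOT_TYPE,
    null // do not reuse an existing result object
  );
  const matched = [];
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    const node = snapshot.snapshotItem(i);
    // Reset lastIndex so global/sticky patterns behave the same for every node.
    pattern.lastIndex = 0;
    // textContent is defined for element, text, comment and attribute nodes.
    if (pattern.test(node.textContent)) matched.push(node);
  }
  return matched;
}
```

In a browser this could be called as `xpathMatches(document, "//text()", /\d{4}-\d{2}-\d{2}/)` to collect text nodes that contain a date-like string. A native `matches()` inside the expression would avoid materializing and re-checking every candidate in JS.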
While the latest recommendation is v3.1, most questions related to XPath seem to be about the missing regular expressions, introduced in v2.0, a recommendation that is nearly 10 years old.
However, all browsers support only XPath v1.0 from 1999.
Background
Widely adopted in 2007 by popular frameworks such as Dojo, Prototype, or Mootools, the XPath language is an extremely powerful tool to query and crawl the DOM along all its axes, hence superior to CSS, and already able to express proposed selectors such as :has(...) even in its version 1.
But this only scratches the surface of the operations XPath can do, as opposed to querying via CSS, checking surrounding DOM nodes via JS, and checking that results are valid (i.e. if (child.closest('container'))), plus there's no way to target text nodes or even comments.
Proposal
Provide at least the method matches(RegExp, flag) to the current XPath 1.0 (let's call it 1.1), or provide at least v2 of this old but gold standard to crawl any DOM tree. If it's still updated and useful for back-end crawlers, it's unclear why the first-class citizen JS should not benefit from its potential, which is way superior to CSS selectors and less error-prone, as filtering and complex searches can be done directly through document.evaluate.
Thanks in advance for considering this improvement, as I'm sure that once RegExp is in, the usage of XPath for complex SPA/PWA pages would flourish again in libraries, web components, and the Web in general.
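To make the Background claims concrete, here is a rough sketch of what XPath 1.0 can already express through `document.evaluate`. The element names in the expressions, the `EXAMPLES` table and the `xpathNodes` helper are all illustrative assumptions, not something taken from the issue; `doc` is assumed to be a DOM `Document`.

```javascript
// Illustrative XPath 1.0 expressions (element names are made up).
const EXAMPLES = {
  // Rough equivalent of the proposed CSS selector `li:has(a)`.
  hasDescendant: "//li[.//a]",
  // Text nodes, which CSS selectors cannot target.
  textNodes: "//text()[contains(., 'TODO')]",
  // Comment nodes, likewise unreachable from CSS.
  comments: "//comment()",
};

const ORDERED_NODE_ITERATOR_TYPE = 5; // value of XPathResult.ORDERED_NODE_ITERATOR_TYPE

// Hypothetical helper: lazily yields every node the expression selects.
function* xpathNodes(doc, expression, context = doc) {
  const result = doc.evaluate(
    expression,
    context,
    null, // no namespace resolver
    ORDERED_NODE_ITERATOR_TYPE,
    null
  );
  for (let node = result.iterateNext(); node; node = result.iterateNext()) {
    yield node;
  }
}
```

In a browser, `[...xpathNodes(document, EXAMPLES.comments)]` would collect every comment node. Iterator results become invalid if the document is mutated during iteration, so a snapshot result type is the safer choice when nodes are modified along the way. What none of these expressions can do in 1.0 is pattern matching beyond `contains()` and `starts-with()`, which is the gap the requested `matches()` would fill.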