[css-fonts] Drop bracket matching step from `@font-face` src: line parsing #6340

drott · 2021-06-02T16:09:12Z

For context, the original resolution from which the spec text for parsing the src: line of the @font-face descriptor was tracked in issue #633. This issue is related to #6328.

The current spec text says:

The value of the 'src!!descriptor' descriptor is parsed piecewise. First, opening and closing parentheses are matched. If these parentheses cannot be matched, the value is a parse error. Matching the parentheses has the effect of partitioning the value into multiple regions, each of which is either inside outermost parentheses or outside outermost parentheses. Once the value is partitioned, all the commas outside outermost parentheses are located. The value is then split at these comma locations.

Why do we need the bracket matching? In my quick analysis, at least WebKit and Blink do not do that but the top level step for parsing the src: descriptors value is to split this into a comma separated list of values, then parse these individually. I am not so familiar with reading Rust code, but it roughly looks like Gecko parses this as a comma separated list at the top level as well? (@jfkthame, could you comment?).

I suggest to drop this text and explain that the first segmentation step is separating by commas, then parsing each individual value separately, unless I am missing something on why this might be needed. To me it seems this would not conflict with any of the second level parsing of individual source values. (It does not look to me as commas would have any other meaning in this descriptor except separating sources, do they?)

This would also address the TODO issue in the text and explain that after comma separation, values that are not parsed correctly are dropped from the list and not considered as downloadable resources.

The text was updated successfully, but these errors were encountered:

svgeesus · 2021-06-02T17:31:38Z

Right, because after segmentation on commas the procedure is

Each item in the split value is then parsed against this grammar:

and the grammar includes parentheses. Also Fonts 3 didn't define parsing this way so I suspect implementations just extended what they already had for Fonts 3.

@litherum you added that section could you explain the reasoning?

svgeesus · 2021-06-02T18:02:22Z

Also this resolution:

RESOLVED: Parse the value of src throwing out invalid parts like media queries and not like selectors. aka You split on the commas and throw out the pieces you don't understand not the whole thing.

svgeesus · 2021-06-02T18:05:20Z

I suspect the split on parantheses came about because at one point we were considering syntax like

@font-face {
  font-family: heading-font;
  src: url(fancy-font.woff2) format("woff2;tables=CPAL,FVAR"),
        url(fallback-font.woff2) format("woff2"),
        url(fallback-fallback-font.woff) format("woff"),
        url(how-old-is-your-browser-font.ttf) format("ttf");
}

Note the commas inside the overloaded format specifier.

But we ended up not doing that, so segmenting first on balanced parentheses no longer has any value.

svgeesus · 2021-06-02T18:07:41Z

Agenda+ in case there are any good arguments in favor of the balanced-parens initial step. Otherwise, I'm fine dropping that part to improve internal spec consistency and also better align with actual implementations

jfkthame · 2021-06-02T18:35:47Z

If the first step is to split on commas, what happens if there's a URL that includes a comma?

svgeesus · 2021-06-02T18:42:27Z

comma is in the set sub-delims which also says

If data for a URI component would conflict with a reserved
character's purpose as a delimiter, then the conflicting data must be
percent-encoded before the URI is formed.

I guess that was never an issue for CSS Fonts 3, which didn't have the split-on-balanced-parens step?

jfkthame · 2021-06-02T20:35:57Z

I don't think @drott's original comment here:

In my quick analysis, at least WebKit and Blink do not do that but the top level step for parsing the src: descriptors value is to split this into a comma separated list of values, then parse these individually

can be quite accurate, given that an example like:

@font-face {
  font-family: test;
  src: url(data:font/ttf;base64,AAEA[...snipped...]QjUA);
}

successfully loads the base64-encoded font resource in all of Safari, Chrome and Firefox. If they were splitting the src: value on comma as a top-level step, surely that wouldn't work. They'd end up seeing two items, neither of them complete or parsable.

tabatkins · 2021-06-02T23:36:27Z

I'll note that I added an algo to Syntax for this exact case (for other reasons) that behaves correctly, so the spec can just invoke that: https://drafts.csswg.org/css-syntax/#parse-comma-separated-list-of-component-values

tabatkins · 2021-06-02T23:37:25Z

(CSS specifications can't specify text-level parsing details anyway; by the time they have a value that they know is part of a particular property, it's already been tokenized and had basic parsing done on it, so all you've got is tokens, not characters. The above algo works at the correct level.)

drott · 2021-06-03T13:34:09Z

Thanks @jfkthame and @tabatkins for observations and clarifications. Of course, Jonathan, you're right that the data url scheme you're giving as an example does work. And it's also true that the CSS parsing operates on a token stream, not on a character stream.

What I looked at during my analysis for WebKit and Chrome were these bits of code for where a font face src descriptor is parsed:
ConsumeFontFaceSrc Chrome and consumeFontFaceSrc in WebKit

In particular, the behavior of the parser here consuming and progressing to each next value comma-token by comma-token. But indeed while still inner component values such as url() can contain commas which are consumed in nested parsing functions called consumeFontFaceSrcLocal and consumeFontFaceURI. So my earlier description of first separating by commas is not quite accurate.

With that clarified, I am still convinced that the prose for separating by bracket matching should be dropped as that's definitely not what's happening or needed.

Because @tabatkins provides this very useful reference, as an additional improvement to the current text, we may want to refer to "5.3.10. Parse a comma-separated list of component values" as the algorithm for parsing this value, assuming that it's fully applicable: In that respect, are url(...) and local(...) defined as function tokens?

tabatkins · 2021-06-03T16:24:21Z

Yeah, per the Syntax spec, they definitely get parsed as function tokens. (Depending on its contents, url() might get parsed as a url-token, but for these purposes that's identical.)

A little more precisely, by the time this algo can get called, they're already processed into functions, a very slightly higher-level object that contains all the stuff between their parens, so a pass over the top-level component values won't see any commas inside of there unless you recurse manually.

litherum · 2021-06-09T06:54:19Z

The idea was just that I didn’t want “foo, bar(baz, quux)” to be broken into 3 pieces: “foo” “bar(baz” and “quux)”. If there’s logic elsewhere that already satisfies this desire, I’m happy to use it instead.

WebKit hasn’t implemented parsing support for this new src: descriptor grammar yet.

drott · 2021-06-09T13:46:41Z

Thanks for commenting, @litherum. To summarize, I suggest to improve this section by dropping

The value of the 'src!!descriptor' descriptor is parsed piecewise. First, opening and closing parentheses are matched. If these parentheses cannot be matched, the value is a parse error. Matching the parentheses has the effect of partitioning the value into multiple regions, each of which is either inside outermost parentheses or outside outermost parentheses.

Once the value is partitioned, all the commas outside outermost parentheses are located. The value is then split at these comma locations.

Each item in the split value is then parsed against this grammar:

and instead say:

The src: descriptor value must be parsed according to section "5.3.10. Parse a comma-separated list of component values". Then each component value is parsed according to this grammar:

I also suggest to address "ISSUE 4" and the definition of "thrown out" by saying:

If a component value is parsed correctly and is of a format and font technology that the UA supports, add it to the list of supported sources. If parsing a component value results in a parsing error or its format or technology are unsupported, do not add it to the list of supported sources.

If there are no supported entries at the end of this process, the value for the src: descriptor is a parse error.

…iptor see #6340

svgeesus · 2021-06-09T14:19:47Z

Thanks for the suggested wording, @drott I just added it.

svgeesus · 2021-06-09T14:23:09Z

This issue is still on the agenda for today's CSS call, but unless there are problems with the suggested wording then it looks like we can just close the issue as resolved?

It would be good to have some known-invalid-component src parsing tests in WPT.

drott · 2021-06-09T15:21:06Z

Yes, I support resolving and closing with this change. Unfortunately I have a conflict and can't attend today's call, but at least @lilles said they'd be there from our side.

I plan to add tests in parallel to implementing the supports syntax and mechanism in Chrome. I'll add tests for known-invalid-components as you suggest.

svgeesus · 2021-06-09T16:27:02Z

@jfkthame okay to close this?

svgeesus · 2021-06-09T16:28:22Z

the updated section of the spec

jfkthame · 2021-06-09T16:38:27Z

Yes, sounds OK to me.

tabatkins · 2021-06-09T17:02:25Z

+1 to committed change, but @svgeesus INCONSISTENT TABS AND SPACE INDENTATION???

You could also just link the algorithm directly with [=parse a comma-separated list of component values=], but that's a nit.

svgeesus · 2021-06-10T17:12:04Z

Linking to the algorithm directly might well have been better, yes.

I'm aware that Tabs vs. Spaces is one of the GREAT HOLY WARS OF OUR TIME and I am really not interested in signing up as a combatant.

Adapt font-face src: parsing behavior to more leniently accept a src: line that has parse errors in an individual component. Part of Interop 2023 in the font feature detection and palettes category [1]. Also return serialized values in lowercase to match CSSOM spec. After CSS WG resolution [1] and updated spec text [2] individually failing src: line components for incorrect format() or tech(), or other parse failures per component should not lead to dropping the whole src: line, but only reject the individual component. Implement checks for component sanity and on failure, skip to next comma or end. [1] https://wpt.fyi/results/css/css-fonts/parsing?q=label%3Ainterop-2023-fonts [1] w3c/csswg-drafts#6340 [2] https://drafts.csswg.org/css-fonts-4/#font-face-src-parsing Fixed: 1398933, 857226 Change-Id: Ib94f745fb8b453e21fd981973c29dccb4b722439

@font-face

Adapt font-face src: parsing behavior to more leniently accept a src: line that has parse errors in an individual component. Part of Interop 2023 in the font feature detection and palettes category [1]. Also return serialized values in lowercase to match CSSOM spec. After CSS WG resolution [1] and updated spec text [2] individually failing src: line components for incorrect format() or tech(), or other parse failures per component should not lead to dropping the whole src: line, but only reject the individual component. Implement checks for component sanity and on failure, skip to next comma or end. Test updates: * fast/css/font-face-src-parsing.html updated to account for more cases being allowed. For example, trailing commas in a src: line are not a full parse failure. * fast/css/font-face-unquoted-local.html removed. The test is too restrictive after the behavior change, and relatively trivial and already covered in WPT tests, for example font-format-src-parsing.html. * fast/css/fontfaceset-multiple-families.html updated to account for a @font-face with format("svg") being rejected early and thus not becoming part of the FontFaceSet of the page. * external/wpt/css/css-fonts/format-specifiers-variations.html disabled until review in web-platform-tests/wpt#40666 is complete. Test seems to be expecting an incorrect error type, NetworkError vs. SyntaxError. [1] https://wpt.fyi/results/css/css-fonts/parsing?q=label%3Ainterop-2023-fonts [1] w3c/csswg-drafts#6340 [2] https://drafts.csswg.org/css-fonts-4/#font-face-src-parsing Fixed: 1398933, 857226 Change-Id: Ib94f745fb8b453e21fd981973c29dccb4b722439 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4624052 Reviewed-by: Rune Lillesveen <futhark@chromium.org> Commit-Queue: Dominik Röttsches <drott@chromium.org> Cr-Commit-Position: refs/heads/main@{#1160567}

svgeesus added the css-fonts-4 Current Work label Jun 2, 2021

mozilla-apprentice mentioned this issue Jun 2, 2021

[css-fonts] Drop bracket matching step from @font-face src: line parsing mozilla/wg-decisions#551

Closed

svgeesus self-assigned this Jun 2, 2021

svgeesus added the Agenda+ label Jun 2, 2021

svgeesus added a commit that referenced this issue Jun 9, 2021

[css-fonts-4] Reference syntax spec for parsing comma separated descr…

60dd3ff

…iptor see #6340

svgeesus added the Needs Testcase (WPT) label Jun 9, 2021

svgeesus closed this as completed Jun 9, 2021

svgeesus added Closed Accepted by Editor Discretion Commenter Satisfied Commenter has indicated satisfaction with the resolution / edits. and removed Agenda+ labels Jun 9, 2021

This was referenced Aug 9, 2021

Review Request: CSS Fonts src: descriptor syntax for client side font selection w3ctag/design-reviews#666

Closed

supports(<font-technology>)extended syntax for @font-face mozilla/standards-positions#563

Closed

chromium-wpt-export-bot mentioned this issue Jun 19, 2023

Treat component failures leniently in @font-face src: parsing web-platform-tests/wpt#40620

Closed

drott mentioned this issue Jun 21, 2023

[css-fonts] Update expected error in @font-face format test web-platform-tests/wpt#40666

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[css-fonts] Drop bracket matching step from `@font-face` src: line parsing #6340

[css-fonts] Drop bracket matching step from `@font-face` src: line parsing #6340

drott commented Jun 2, 2021

svgeesus commented Jun 2, 2021

svgeesus commented Jun 2, 2021

svgeesus commented Jun 2, 2021

svgeesus commented Jun 2, 2021

jfkthame commented Jun 2, 2021

svgeesus commented Jun 2, 2021

jfkthame commented Jun 2, 2021

tabatkins commented Jun 2, 2021

tabatkins commented Jun 2, 2021 •

edited

Loading

drott commented Jun 3, 2021

tabatkins commented Jun 3, 2021 •

edited

Loading

litherum commented Jun 9, 2021 •

edited

Loading

drott commented Jun 9, 2021

svgeesus commented Jun 9, 2021

svgeesus commented Jun 9, 2021

drott commented Jun 9, 2021 •

edited

Loading

svgeesus commented Jun 9, 2021

svgeesus commented Jun 9, 2021

jfkthame commented Jun 9, 2021

tabatkins commented Jun 9, 2021

svgeesus commented Jun 10, 2021

[css-fonts] Drop bracket matching step from @font-face src: line parsing #6340

[css-fonts] Drop bracket matching step from @font-face src: line parsing #6340

Comments

drott commented Jun 2, 2021

svgeesus commented Jun 2, 2021

svgeesus commented Jun 2, 2021

svgeesus commented Jun 2, 2021

svgeesus commented Jun 2, 2021

jfkthame commented Jun 2, 2021

svgeesus commented Jun 2, 2021

jfkthame commented Jun 2, 2021

tabatkins commented Jun 2, 2021

tabatkins commented Jun 2, 2021 • edited Loading

drott commented Jun 3, 2021

tabatkins commented Jun 3, 2021 • edited Loading

litherum commented Jun 9, 2021 • edited Loading

drott commented Jun 9, 2021

svgeesus commented Jun 9, 2021

svgeesus commented Jun 9, 2021

drott commented Jun 9, 2021 • edited Loading

svgeesus commented Jun 9, 2021

svgeesus commented Jun 9, 2021

jfkthame commented Jun 9, 2021

tabatkins commented Jun 9, 2021

svgeesus commented Jun 10, 2021

[css-fonts] Drop bracket matching step from `@font-face` src: line parsing #6340

[css-fonts] Drop bracket matching step from `@font-face` src: line parsing #6340

tabatkins commented Jun 2, 2021 •

edited

Loading

tabatkins commented Jun 3, 2021 •

edited

Loading

litherum commented Jun 9, 2021 •

edited

Loading

drott commented Jun 9, 2021 •

edited

Loading