Multiple-char operators in the Operator Dictionary #143
My proposal would be
|
I stand corrected, multi-char seems to be implemented in Gecko: https://searchfox.org/mozilla-central/source/layout/mathml/mathfont.properties#72 Spacing seems to work but not stretching. Currently, it uses a hash table of strings (see https://bugzilla.mozilla.org/show_bug.cgi?id=1336437) while WebKit uses a sorted table of Unicode code points. cc @emilio |
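To make the difference between the two lookup strategies concrete, here is a minimal sketch in Python. The entries are hypothetical samples chosen for illustration, not the actual Gecko or WebKit tables, and the function names are invented for this example.

```python
from bisect import bisect_left

# Hypothetical sample entries, not the real dictionary contents.
# A sorted table of code points can only describe single-character operators.
SORTED_CODE_POINTS = sorted(ord(c) for c in "+-=|<>")

# A hash table keyed by string can hold multi-character operators as well.
STRING_TABLE = {"+", "-", "=", "|", "<", ">", "&&", "||", "!="}


def in_sorted_table(code_point):
    """Binary search over the sorted code point table."""
    i = bisect_left(SORTED_CODE_POINTS, code_point)
    return i < len(SORTED_CODE_POINTS) and SORTED_CODE_POINTS[i] == code_point


def in_string_table(text):
    """Hash lookup keyed by the whole operator string."""
    return text in STRING_TABLE
```

With a sorted code point table, an operator such as "&&" has no key to search for, which is why multi-char entries would need either a string-keyed structure or a separate table.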
There are really two flavours of these. Some are duplicated ASCII characters. Others combine a base with a combining character, such as the combining negation slash or a variation selector; those are harder to get rid of, because Unicode as a rule would be reluctant to add new pre-composed characters that are equivalent to a combination with a negation slash. The ones that do have pre-composed negations are an arbitrary list based (mostly) on legacy font encodings.
Stretching of multiple-character operators is likely to be difficult (pretty much impossible in TeX as well), so you could probably say explicitly that it isn't supported in core (and we could make all multiple-character entries have stretchy set to false?).
If supporting multiple-character entries for spacing is likely to be problematic in core, then it would be easy enough (I think) to extract a table for the core spec without them, and extract something for the full spec that says something or adds them back. But if spacing is OK and just the stretchy property is difficult, as I say, I think we could just make them all stretchy=false even in full. |
Note that if you use the entities, the multiple-character nature is hidden. If you look at greater than, not greater than, much greater than, not much greater than, then
>, ≯, ⪢, ≫̸ look like four similar inputs, but the fact that one negation is pre-composed and one made up of a base and combining character is the sort of low level Unicode details that in an ideal world authors would not need to know about. |
This seems reasonable and people will probably expect the multi char to be supported given the past. I'll soon try to implement this since at least for Full it will be in the specification, so WebKit would need to support it anyway. |
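The point above about pre-composed versus combining negations can be checked with Python's standard `unicodedata` module. This is only an illustrative sketch: the variable and helper names are invented here, and it looks at just two of the characters from the greater-than example.

```python
import unicodedata

# "Not greater than" exists as a single pre-composed code point, U+226F.
not_greater = "\u226F"
# "Not much greater than" is written here as U+226B followed by U+0338
# (COMBINING LONG SOLIDUS OVERLAY); there is no pre-composed form.
not_much_greater = "\u226B\u0338"


def code_points(text):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Canonical decomposition splits the pre-composed negation into base + slash.
decomposed = unicodedata.normalize("NFD", not_greater)
# Canonical composition cannot merge the second pair into one code point.
composed = unicodedata.normalize("NFC", not_much_greater)

print(code_points(not_greater))  # one code point
print(code_points(decomposed))   # base plus combining slash
print(code_points(composed))     # still two code points
```

So two inputs that look alike to an author reach the operator dictionary as strings of different lengths, and normalization alone does not remove the multi-character case.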
My 2 cents:
|
We now support enough of multi-char in chromium that the test passes: |
AFAIK, while Unicode has a policy against encoding new pre-composed characters, combining marks that overstrike their bases are exempted from this (but they will not be made canonically equivalent to the decomposed form). |
So I didn't comment here, but two weeks ago we agreed to keep multi-char support. Rob already fixed our Chromium branch and there is https://bugs.webkit.org/show_bug.cgi?id=124828 in WebKit. |
Consensus from previous meetings:
|
@NSoiffer @davidcarlisle I still see a lot of multiple-char entries with symmetric/stretchy (and fence). Can we remove these properties? |
Yes, I agree we shouldn't imply these stretch. Neil, do you have pending changes, or should I do that? |
Yes, I have some changes pending. I'll remove any stretchy properties from
them. Since symmetric only applies to stretchy chars, I'll make sure those
go too.
Removing "fence" though doesn't make sense unless you are saying you want
to remove that property from MathML ("separator" would then go also). That
would be something to raise in its own issue and something to discuss on a
call.
|
Thanks.
Yes, that's why I put that one in parentheses. fence/separator don't have any use for layout, so implementers can just ignore them for now anyway, which is probably what we will do in Chromium. The question of whether this will be used for browsers' accessibility tree is still open, but I'm not aware of any use or plan to use it (they are exposed by WebKit on iOS/macOS, but I'm not sure if VoiceOver handles them). It seems there are not many operators with these properties, so there is also the option of handling them separately if they turn out to be necessary. |
I opened #209 for the separate fence/separator discussion. The following entries seem still weird to me, can the spacing be tweaked so that they can be moved to another pre-existing category?
I still see a lot of repeated ASCII characters and I'm not sure how relevant these entries are. I would rather see them in preformatted text, not math layout... |
Multichar entries are now handled by https://mathml-refresh.github.io/mathml-core/#operator-dictionary-compact For the record, the current estimated size is 770*2 = 1540 bytes. The cost of supporting multi-char entries is quite significant: (154+49)*2 = 406 bytes, so 26% of the dictionary size. If some entries are not essential, it would be very good to try and simplify things. For example, restricting to 2-char strings would avoid the extra character necessary for null-terminated strings. And it looks like ASCII forms are not important at all; they should be replaced with the proper Unicode code point (or people should use preformatted text rather than math formulas). I wonder whether we could just restrict to negated XXXX-00338 entries? |
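The size estimate above can be reproduced with a few lines of Python. The numbers come straight from the comment; the variable names, and the reading of the factor 2 as bytes per stored unit, are assumptions made for this sketch.

```python
# Figures quoted in the comment above.
UNIT_BYTES = 2               # assumed meaning of the "*2" factor
total_units = 770            # whole compact dictionary
multi_char_units = 154 + 49  # portion attributed to multi-char entries

total_bytes = total_units * UNIT_BYTES
multi_char_bytes = multi_char_units * UNIT_BYTES
share = multi_char_bytes / total_bytes

print(total_bytes, multi_char_bytes, f"{share:.0%}")  # prints: 1540 406 26%
```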
* "|||" does not seem to be used as a programming language operator. * For (stretchy) fences, U+2980 is more appropriate than "|||" w3c/mathml#143 w3c/mathml#176
It seems to be used as a punctuation sign rather than an operator. The ellipsis character … U+2026 seems more appropriate for that purpose. w3c/mathml#143 w3c/mathml#176
For this point, I opened |
This is now a table of 2-char ASCII operators (38 bytes): Operators_2_ascii_chars. The text has been changed to handle the case of a 2-char operator whose second character is either U+0338 COMBINING LONG SOLIDUS OVERLAY or U+20D2 COMBINING LONG VERTICAL LINE OVERLAY. I'm not sure if there is an easy way in browsers to check for combining characters, and only these two seemed important per yesterday's discussion. But we can change that later if more single char + combining are needed. The two surrogate pairs for Arabic operators are also handled specially. I'm closing this as the tests are already written; they just need to be regenerated. |
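The classification described in the closing comment could be sketched roughly as follows. This is not the specification's algorithm: the function name, the category labels, and the sample 2-char ASCII entries are all hypothetical, and only the two combining overlays named in the comment are recognized.

```python
# The two combining characters singled out in the comment above.
COMBINING_OVERLAYS = {"\u0338", "\u20D2"}

# Hypothetical sample of a 2-char ASCII operator table, not the spec's list.
TWO_ASCII_OPERATORS = {"!=", "&&", "||", "<=", ">=", "->"}


def classify_operator(text):
    """Return a rough category for an operator's text content."""
    if len(text) == 1:
        return "single"
    if len(text) == 2 and text[1] in COMBINING_OVERLAYS:
        return "base+overlay"
    if len(text) == 2 and text.isascii() and text in TWO_ASCII_OPERATORS:
        return "two-ascii"
    return "unknown"
```

Python strings are indexed by code point, so a character outside the Basic Multilingual Plane counts as length 1 here. An implementation working on UTF-16 code units would see such a character as a surrogate pair, which is presumably why the comment mentions handling the Arabic operators specially.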
cc @rwlbuis
I'm not aware when this was decided, but the operator dictionary contains the following entries with multiple characters:
Currently this is not supported in browsers ( https://bugs.webkit.org/show_bug.cgi?id=124828 ).
We have some tests to check that operators render the same with and without the operator properties from the dictionary specified explicitly.
Probably they will need to be handled in a separate table, which will make the code a bit more complex/larger. Multiple vertical bars are even stretchy but OpenType only provides per-glyph stretching, which means we will have to add more spec description/test/implementation if we really want to support multi-char stretching.
So I wonder how important all of these are? It seems many of them are just equivalent to a single Unicode character (or are waiting for such a character to be introduced in Unicode). Can't people just use explicit lspace/rspace or the equivalent Unicode code point? At least there are already code points for double/triple vertical bars, which are stretchy and supported by OpenType fonts / browsers.