Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update \operatorname to work more like in LaTeX. (mathjax/MathJax#2830) #788

Merged
merged 2 commits into from
Apr 18, 2022

Conversation

dpvc
Copy link
Member

@dpvc dpvc commented Feb 23, 2022

This PR improves the implementation of \operatorname (and \DeclareMathOperator) to work more like in true LaTeX. In the past, the macro replaced - and * by \text{-} and \text{*} in order to handle what LaTeX dues using \catcode. Here, we take advantage of features added to ParseMethods.variable to get \mathrm{} to produce multi-character mi elements to get \operatorname to do the same. The multiLetterIdentifiers environment value has been replaced by a regular expression for the characters to use (and EnvProp is extended to allow that). Then we make a new map that matches - and * and checks whether we are in an \operatorname, and if so, parses the characters like variables, otherwise parses them as usual.

The \operatorname macro is rewritten to parse the arguments itself, rather than using a replacement macro internally, so that we can set the multiLetterIdentifiers regular expression and the operatorLetters environment property. This also means we can handle the following \limits directly rather than needing the extra \SkipLimits macro to do that.

Finally, \DeclareMathOperator is set up to define the macro in terms of \operatorname rather than \mathop.

Resolves issue mathjax/MathJax#2830

@dpvc dpvc requested a review from zorkow February 23, 2022 17:49
@dpvc
Copy link
Member Author

dpvc commented Feb 23, 2022

PS, this PR is to the html-in-tex branch because it needs to fixes to the TeX parser that allows returning false to continue looking through the symbol maps. I could try to factor that out into a separate PR, but that would require rebasing the html-in-mml and html-in-tex branches, and I didn't want to do that in case you have already been commenting on them.

@dpvc dpvc added this to the 3.2.1 milestone Apr 6, 2022
@zorkow
Copy link
Member

zorkow commented Apr 8, 2022

PS, this PR is to the html-in-tex branch because it needs to fixes to the TeX parser that allows returning false to continue looking through the symbol maps. I could try to factor that out into a separate PR, but that would require rebasing the html-in-mml and html-in-tex branches, and I didn't want to do that in case you have already been commenting on them.

No worries, I think it is easier to review like that.

Copy link
Member

@zorkow zorkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good, but I have one major issue with the PR:
The extension of the EnvProp type to include a non-primitive type. Can't we pass the multiLetterIdentfiers as a string from which the RegExp is created? In addition, can we spell out the regular expression. I find the i modifier very dodgy as it is implementation dependent what it does on non-ASCII characters representing letters. As an example

         'a𝐚A𝐀'.match(/[a𝐚]+/)

vs

         'a𝐚A𝐀'.match(/[a𝐚]+/i)

The latter leads to very odd behaviour.

c = parser.string.substr(parser.i - 1).match(/^[a-z]+/i)[0];
const env = parser.stack.env;
if (env.multiLetterIdentifiers && env.font !== '') {
c = parser.string.substr(parser.i - 1).match(env.multiLetterIdentifiers as RegExp)[0];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multiLetterIdentifiers could be a non-empty string.

@@ -28,7 +28,7 @@ import TexError from './TexError.js';
import StackItemFactory from './StackItemFactory.js';

// Union types for abbreviation.
export type EnvProp = string | number | boolean;
export type EnvProp = string | number | boolean | RegExp;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that. I'd like to keep that to primitive types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than aesthetic reasons, is there a practical advantage to that? I'm not sure why the environment properties can't be any type of data that is needed.

@dpvc
Copy link
Member Author

dpvc commented Apr 8, 2022

Can't we pass the multiLetterIdentfiers as a string from which the RegExp is created?

We can, but that means the RegExp has to be re-compiled every time it is used, whereas with a constant RegExp, it is only compiled once, so that is more efficient. That was the main reason to do it this way.

In addition, can we spell out the regular expression.

I'm not sure what that means.

I find the i modifier very dodgy as it is implementation dependent what it does on non-ASCII characters representing letters

But the examples you give aren't the RegExp being used. That is just /^[a-z]+/i, which doesn't have those issues, as it only specifies Ascii characters. In any case, if you want to handle Unicode better, you would want to use u in addition to i. I can add the u if you like.

@zorkow
Copy link
Member

zorkow commented Apr 16, 2022

As per f2f I am happy with extending the type by RegExp.

In addition, can we spell out the regular expression.

I'm not sure what that means.

For the record: That just means I prefer /a-zA-Z/ over /a-z/,i.
I generally prefer to avoid i as it makes the meaning of regexps unclear.
If u is now consistently implemented everywhere, that is helpful, although I really believe there should not even be a need for u.

@dpvc
Copy link
Member Author

dpvc commented Apr 18, 2022

Thank for the clarification. I'm adjusting the RegExp.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants