-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Advanced macro support and magic \dots #794
Conversation
@@ -75,14 +75,14 @@ const tokenRegex = new RegExp( | |||
"([ \r\n\t]+)|" + // whitespace | |||
"([!-\\[\\]-\u2027\u202A-\uD7FF\uF900-\uFFFF]" + // single codepoint | |||
"|[\uD800-\uDBFF][\uDC00-\uDFFF]" + // surrogate pair | |||
"|\\\\(?:[a-zA-Z]+|[^\uD800-\uDFFF])" + // function name | |||
"|\\\\(?:[a-zA-Z@]+|[^\uD800-\uDFFF])" + // function name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be dangerous. @
is not actually a valid character for macro nams in default TeX. It only works after \makeatletter
, and is intended for internal macros that users don't type.
Overall it is probably safe to include @
by default, or wait for complaints before trying a complete fix; the \@
macro is rare in math.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to have at least one negative reaction -- I was curious what others would think. I see three alternatives:
- Add a
Lexer
option for whether to include@
as a letter. Then we could actually support\makeatletter
and\makeatother
(presumably this would need to be carried in theSettings
object...). - Have the
cdots
macro return manually parsedToken
s instead of strings. Then we don't need to touch theLexer
. - Rename
\@cdots
to something else like\cdotsINTERNAL
or\latexcdots
.
Thoughts/preferences?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incidentally, the regex change won't affect @{...}
support in tabular
(the only use I know of @
as an active char), nor will it affect the most typical usage of \@
, namely, \@.
(not that KaTeX currently supports \@
). What it would break is if you used \@
immediately followed by a letter, which ... I can't imagine doing. So the other option is to leave this change as is for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is where I came down and for a similar reason: keep @
as is in this commit.
// \let\DOTSX\relax | ||
defineMacro("\\DOTSI", "\\relax"); | ||
defineMacro("\\DOTSB", "\\relax"); | ||
defineMacro("\\DOTSX", "\\relax"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
????
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
amsmath defines these "null macros" for other macros to indicate the behavior of \dots
preceding them. Indeed, I should have added use of these macros to \iff
, \implies
, and \impliedby
. See new commit.
I'm going to try to get this reviewed over the next couple of days. |
Thanks @kevinbarabash! I just rebased to fix a conflict with the CJK tweak on |
|
||
it("should consume spaces after macro", function() { | ||
compareParseTree("\\text{\\foo }", "\\text{x}", {"\\foo": "x"}); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice tests. I like how these call out differences in behavior between math mode and text mode.
compareParseTree("\\text{\\foo 1 2}", "\\text{12end}", {"\\foo": "#1#2end"}); | ||
compareParseTree("\\text{\\foo {1} {2}}", "\\text{12end}", {"\\foo": "#1#2end"}); | ||
}); | ||
|
||
it("should allow for multiple expansion", function() { | ||
compareParseTree("1\\foo2", "1aa2", { | ||
"\\foo": "\\bar\\bar", | ||
"\\bar": "a", | ||
}); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary for this PR, but it might be good to have some tests that verify things about multiple expansions involving arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For fun, I added such a test in 56d8900
src/MacroExpander.js
Outdated
* Recursively expand first token, then return first non-expandable token. | ||
* Return the next unexpanded token without removing anything from the | ||
* stack. Similar in behavior to TeX's `\futurelet`. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the term unexpanded token
a little confusing, because it's possible that future
gets called sometime after expandOnce
gets called in which case there may be tokens on the stack that were the result of expanding a token. Maybe stating this in the following way:
Returns the topmost token on the stack, without expanding it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice rewrite. Implemented.
src/MacroExpander.js
Outdated
|
||
/** | ||
* Expand next token only once, and leave it on the stack. | ||
* Returns the token or its expansion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment could be a bit clearer, maybe:
Expand the next token only once if possible.
If the token is expanded, the resulting tokens will be push
on to the stack in reverse order and will be return in as an
array, also in reverse order.
If not, the next token will be return without removing it from
the stack.
As a result, as long as there are tokens on the stack, the next
token is at the top of the stack.
Feel free to wordsmith this more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revised accordingly.
src/MacroExpander.js
Outdated
expandOnce() { | ||
const topToken = this.popToken(); | ||
const name = topToken.text; | ||
const macro = (name.charAt(0) === "\\"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
macro
=> isMacro
to more clearly indicate that it's a boolean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
let expansion = this.macros[name]; | ||
if (typeof expansion === "function") { | ||
expansion = expansion.call(this); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If expansion
is a function then it must always return a string
with valid TeX
code which can contain argument placeholders, is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, it could also return an Array
that's a reverse-ordered list of tokens. This was functionality from before that I'm preserving, though I'm not sure we'd want to use it exactly that way. (In particular, the reverse-order aspect is weird.) But it is a nice way for a macro to be able to force strange parsing.
src/MacroExpander.js
Outdated
if (!(macro && this.macros.hasOwnProperty(name))) { | ||
// Fully expanded | ||
this.stack.push(topToken); | ||
return topToken; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little weird to pop topToken
at the start just to push it back on. I can't think of demonstrably better way to do this though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, unfortunately on both counts. If I didn't have to consumeSpaces
, I could use future()
and then popToken
after the fully expanded case. Maybe there's another place to put the consumeSpaces
... but at least this works.
src/MacroExpander.js
Outdated
/** | ||
* Expand the next token once (ignoring initial spaces like `get`), | ||
* without removing anything from the stack, and return the top token | ||
* on the stack. Similar in behavior to TeX's `\expandafter\futurelet`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does it ignore initial spaces if it calls expandOnce
right away which only ignores trailing spaces?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That comment was out-of-date. Updated.
// expandOnce returns Token if and only if it's fully expanded. | ||
if (expanded instanceof Token) { | ||
// \relax stops the expansion, but shouldn't get returned (a | ||
// null return value couldn't get implemented as a function). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what a null return value couldn't get implemented as a function
means. Could you give an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functions (as defined by defineFunction
) return a ParseNode
or an object that gets converted into a ParseNode
by the parser. It's not possible for a function to return null
that would just disappear in the final parse tree. Hence it would be impossible to implement a \relax
that expands to nothing (which is different to how {}
expands).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if we return null
the parser would think that it's an object an convert it to a ParseNode
. I wonder when would we ever want to write a function that returns a null
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Long-term, I vaguely hope we can transition functions over to macros, as that's what TeX really does. So maybe not too critical... though certainly could be done by tweaking group parsing.
src/macros.js
Outdated
@@ -3,6 +3,9 @@ | |||
* This can be used to define some commands in terms of others. | |||
*/ | |||
|
|||
import symbols from "./symbols"; | |||
import utils from "./utils"; | |||
|
|||
// This function might one day accept additional argument and do more things. | |||
function defineMacro(name, body) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we get flow in place we'll be able to type this as:
function defineMacro(name: string, body: string | () => string) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
Sorry for the delay. Finally went through and implemented all the comments. Thanks for the review!! Incidentally, this PR also fixes a bug in master with |
// This function might one day accept additional argument and do more things. | ||
function defineMacro(name, body) { | ||
function defineMacro(name: string, body: string | () => string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! Mind adding // @flow
at the top of this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, indeed! Clearly still a flow noob...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me too, for all that's worth. ;)
This is a re-implementation of #599 (
\dots
support) but extending the MacroExpander as necessary to build it as a macro instead of a function, following @gagern's outline.Along the way, we get some nice new macro features:
defineMacro
) can now be JavaScript functions, withthis
bound to theMacroExpander
. (Note the slightly different interface from functions, which get a complex context object -- up for discussion!) This lets us extend theMacroExpander
in more powerful ways, and use the following features:future()
returns the next token without expansion (similar to\futurelet
but without the "let" aspect of assigning to a variation)expandAfterFuture()
returns the next token after one level of expansion (equivalent to a careful use of\expandafter
and\futurelet
)\relax
is now implemented! It stops expansion, but doesn't actually get returned from theMacroExpander
.We also get fixes to old KaTeX bugs (with new tests to confirm, some of which used to fail):
\text{\textellipsis !}
used to include a space in the middle. The trouble is that we were only eating spaces when in math mode, but in text mode we need to be more careful to eat the spaces that make up part of the command name. This is now consumed during macro expansion, even though it's not technically a macro expansion (\textellipsis
is an unexpandable character).\text{\foo }
where\foo
is defined to something -- we got a space, but shouldn't have.\def\foo#1{(#1)}\foo\bar
should expand to(\bar)
and then expand\bar
, whereas the old macro expander would first expand\bar
and then use its tokens for arguments. For example, this didn't work if\def\bar{}
or\def\bar{ }
(especially the latter because of bad space ignoring). Also, if\foo
is a multiargument function,\bar
should constitute one argument instead of one per token. (I did a bunch of testing in LaTeX to confirm this behavior.)A few other miscellaneous changes:
Lexer.js
optionally exportsToken
, so that I can doinstanceof
tests.consumeSpaces()
irreversibly consumes space tokens. This is used to fix the first bug listed above.nextToken()
got split intoexpandOnce()
andexpandNextToken()
. So, e.g.,expandAfterFuture()
is equivalent toexpandOnce()
and thenfuture()
. (This makes for some awkward diffs -- sorry!)\cdots@
, because amsmath defines a more complex\cdots
in terms of this symbol.@
is now considered a valid character for commands, as if\makeatletter
is in effect. I don't think we currently define\@
but if we did this change would be annoying.\dots
,\cdots
, etc. correctly.Here's a texcmp confirming I got at least the tested cases right: