-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Accents #2404
Accents #2404
Conversation
… standalone char and for use in math
…g) chars for accents, leveraging data from Util::Unicode module
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some perl-oriented comments.
Would need to study accents a bit more before I can say anything substantive on the TeX emulation side of affairs.
if (my $entry = unicode_accent($char)) { | ||
applyAccent($stomach, $letter, $$entry{combiner}, $$entry{standalone}, | ||
Invocation(T_CS('\accent'), $num, $letter)); } | ||
else { # Unknown accent ? Really should OVERLAY it on top of $letter??? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a small test for using an unknown accent? Might be useful to keep checking as things move forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, put in a simplistic handling for this case, along w/CSS and an extra line in test.
lib/LaTeXML/Util/Unicode.pm
Outdated
@@ -24,6 +23,73 @@ sub UTF { | |||
my ($code) = @_; | |||
return pack('U', $code); } | |||
|
|||
my $NBSP = UTF(0xA0); | |||
sub NBSP { return $NBSP; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this attempting to guard the lexical $NBSP
from some sort of external redefinition?
Surprised it wasn't introduced as a package-level our $NSBP;
declaration. That can be added to @EXPORT
too, if desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, switched over to cleaner \N{NBSP}
approach (along with use charnames ':full';
for older perls)
foreach my $entry (@accent_data) { | ||
$accent_data{ $$entry{standalone} } = $entry; | ||
$accent_data{ $$entry{combiner} } = $entry; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this is rather dangerous. Could you change the name of either @accent_data
or %accent_data
?
I was thrown for a bit not knowing what happened, seeing the hash indexing syntax in $accent_data{$char}
.
Such a setup is bound to generate a subtle bug one day, when someone uses the wrong delimiters in haste:
my @a = ('x','y','z');
my %a = ('1'=>'a', '2'=>'b', '3'=>'c');
print $a[2]; # z
print $a{2}; # b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the hash variant can be kept, and you can even fully expand the loop that adds standalone
and combiner
in code, to avoid looping the setters every time latexml is ran...
Simplest is to run it once, print with Dumper, then copy the result back into the file. And on second read - aren't standalone
and combiner
already set in the explicit hashes? Maybe the loop is a leftover? Except that the loop sets them as equal, while the hash often has the values as different. Hm, I wish I understood a little more...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed a bit to make it less scary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the updates, the PR looks ready to merge. ✅
I tried another two alternatives for the "unknown accent" markup, visible in this codepen. Not a focus at the moment, but something to mull over in the background. <ruby>
and display: inline-grid;
are curious tools.
* Add Unicode data and accessor for accents, along with combining char, standalone char and for use in math * Update OT1 FontMap to use appropriate 'standalone' chars for accents * Use more consistent model of unicode combining and standalone (spacing) chars for accents, leveraging data from Util::Unicode module * safer lookup * Use new Util::Unicode data to get 'unwrapped' char for over/under operand token * Update use of DefAccent to be consistent with Util::Unicode's better choices * Make keywords avoid clumsy font recoding * Updated tests for better tracking of font encoding * Simplistic handling of \accent (as overlay) when a non-accent is used; add a testcase * Cleaner naming conventions; use \N{NBSP} instead of a var. * HTML/MathML tests no longer need javascript polyfill
This PR is another round in improving the fidelity of lower-level TeX and plain.tex. It deals with accents. Accents in TeX fonts basically show an accented space which gets overlaid on top of a character to be accented (potentially with some positioning tweaks). To deal with Unicode and MathML, we need to distinguish 3 kinds of thing:
So, we've set up a table within
LaTeXML::Util::Unicode
to store this info, and use it wherever accenting operations are needed.