LaTeX to Unicode formatter should not replace \% with % #8490
Comments
Well, technically this is the correct behavior: it converts everything to Unicode. What you probably want is to use the LaTeXCleanup formatter as well, which respects those things.
Technically correct but practically wrong. LaTeXCleanup will fix the issue with
Thinking about this a little, the way forward might indeed be to transform the $ sign to \$ when using the LaTeX Cleanup action. The code that would need to be changed is here: https://github.com/JabRef/jabref/blob/main/src/main/java/org/jabref/logic/formatter/bibtexfields/LatexCleanupFormatter.java
LaTeXtoUnicode: The LaTeXtoUnicode converter assumes the bibliographic data is formatted in LaTeX syntax. In LaTeX syntax, writing the percent sign requires a backslash in front (\%). Hence, the removal of a simple backslash From this we can see that:
LaTeX Cleanup: Furthermore, the "LaTeX Cleanup" turns out to be something of a Frankenstein. https://docs.jabref.org/finding-sorting-and-cleaning-entries/saveactions#latex-cleanup. The name is misleading. It does not only clean up redundant LaTeX code or special characters. It actually mostly does the opposite: it makes bibliographies ready to be used with LaTeX (by removing characters, though). I would recommend a name change, or at least a link to the documentation page for this command within JabRef. E.g. something to Examples:
Interestingly, I just did a test. Running the Jason, maybe you still had your LaTeXtoUnicode cleanup running before or after you used the LaTeX Cleanup action? Maybe you actually had math-mode stuff in the library? Fun fact: searching on Google Scholar for % or $ yields 0 results. After having written all this, I am still of the opinion that the way forward would be to change the UnicodeToLaTeX: Doing a similar test for
Since this is the case, I assume you should have no problems anymore. Closing this. If you still have problems, feel free to open again and report them.
@JasonGross The next release of JabRef will contain a separate cleanup action that escapes $ signs. Please do not use it lightly. Use with care. JabRef is not able to know whether dollar signs were present to A) start math mode or B) simply render a $ sign. Using this cleanup action will require a double check by users, unless you want to challenge your "luck".
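The caveat above is easy to see in a small sketch. The following Java snippet is only an illustration under stated assumptions, not JabRef's actual cleanup action; the names `DollarEscaper` and `escapeDollars` are hypothetical. It puts a backslash in front of every `$` that is not already escaped, and it has no way to tell a literal dollar sign from a math-mode delimiter.

```java
final class DollarEscaper {
    private DollarEscaper() {
    }

    /**
     * Hypothetical helper: inserts a backslash before every '$' that is not
     * already escaped. It cannot distinguish math-mode delimiters from
     * literal dollar signs, so "$x^2$" is escaped as well.
     */
    static String escapeDollars(String value) {
        StringBuilder out = new StringBuilder(value.length() + 8);
        boolean escaped = false; // true if the previous char was an unpaired backslash
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '$' && !escaped) {
                out.append('\\');
            }
            out.append(c);
            // A backslash escapes the next char only if it is not itself escaped.
            escaped = (c == '\\') && !escaped;
        }
        return out.toString();
    }
}
```

With this sketch, a field containing `costs $5` becomes `costs \$5`, an already escaped `\$5` is left alone, and a math expression such as `$x^2$` is turned into `\$x^2\$`, which is why the result needs a manual double check.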
I am still interested in a cleanup action that converts LaTeX to mixed LaTeX and Unicode, i.e., the result should be valid LaTeX code and display the same, but anything that could be replaced by a non-special Unicode character is. As I've said above, the current behavior of LaTeX to Unicode is useless because it generates invalid bibliographic files. Should I open a new issue for this, or reopen this one?
I would propose trying to fix this via an integrity check. #8712
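As a rough idea of what such a check could look for, here is a minimal Java sketch. It is an assumption for illustration only and does not reflect how JabRef's integrity checks or #8712 are implemented; `UnescapedPercentCheck` and `findUnescapedPercent` are hypothetical names. It reports the positions of `%` characters that are not preceded by an unpaired backslash, since those would start a comment in LaTeX.

```java
import java.util.ArrayList;
import java.util.List;

final class UnescapedPercentCheck {
    private UnescapedPercentCheck() {
    }

    /** Hypothetical check: returns the indices of every unescaped '%' in a field value. */
    static List<Integer> findUnescapedPercent(String value) {
        List<Integer> positions = new ArrayList<>();
        boolean escaped = false; // true if the previous char was an unpaired backslash
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '%' && !escaped) {
                positions.add(i);
            }
            escaped = (c == '\\') && !escaped;
        }
        return positions;
    }
}
```

A non-empty result would be the cue for a warning on that field; the same loop could be extended to other special characters.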
The problem is, somebody would need to do the mapping from LaTeX to "Unicode aware LaTeX" or, since we are at it, from Unicode to "LaTeX aware Unicode", which is a lot of work. The Comprehensive LaTeX Symbol List lists
A conversion (e.g. via cleanup actions) is non-trivial.
That would be great! However, even better would be a version of LaTeX to Unicode that lets the user explicitly deactivate any subset of the mapping that they'd like. The default exclusion list would just include special/control characters like
This is a red herring. If the symbol is not available in the font, it doesn't matter whether it comes from a Unicode character or not. If the symbol is available via command and you're using a Unicode-aware TeX engine, I expect it to be available by Unicode character too.
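The request for a conversion with a user-controlled exclusion list can be sketched as follows. This Java snippet is a minimal sketch, not JabRef's formatter: the class `SelectiveLatexToUnicode`, the tiny mapping table, and the default exclusion set are all assumptions made for illustration, and a real converter would need a far larger mapping and proper LaTeX parsing.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class SelectiveLatexToUnicode {
    // Tiny illustrative table (assumption); a real mapping would be much larger.
    private static final Map<String, String> MAPPING = new LinkedHashMap<>();

    static {
        MAPPING.put("\\\"{o}", "\u00f6"); // o with diaeresis
        MAPPING.put("\\'{e}", "\u00e9");  // e with acute accent
        MAPPING.put("\\alpha", "\u03b1"); // Greek small letter alpha
        MAPPING.put("\\%", "%");
        MAPPING.put("\\$", "$");
        MAPPING.put("\\&", "&");
        MAPPING.put("\\#", "#");
        MAPPING.put("\\_", "_");
    }

    /** Assumed default: keep escapes of characters that are special in LaTeX. */
    static final Set<String> DEFAULT_EXCLUDED = Set.of("\\%", "\\$", "\\&", "\\#", "\\_");

    private SelectiveLatexToUnicode() {
    }

    /** Converts known commands to Unicode, leaving excluded commands untouched. */
    static String convert(String input, Set<String> excluded) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            String command = findCommandAt(input, i);
            if (command == null) {
                out.append(input.charAt(i));
                i++;
            } else {
                out.append(excluded.contains(command) ? command : MAPPING.get(command));
                i += command.length();
            }
        }
        return out.toString();
    }

    /** Returns the mapped command starting at pos, or null if there is none. */
    private static String findCommandAt(String input, int pos) {
        for (String command : MAPPING.keySet()) {
            if (!input.startsWith(command, pos)) {
                continue;
            }
            int end = pos + command.length();
            boolean endsWithLetter = Character.isLetter(command.charAt(command.length() - 1));
            boolean followedByLetter = end < input.length() && Character.isLetter(input.charAt(end));
            if (endsWithLetter && followedByLetter) {
                continue; // e.g. \alphabet must not be read as \alpha
            }
            return command;
        }
        return null;
    }
}
```

With the assumed default exclusions, `10\% of M\"{o}bius` keeps its `\%` while the umlaut is converted, so the result stays valid LaTeX. Passing an empty set reproduces the "convert everything" behavior discussed in this issue.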
Ok, I may finally understand why this might be useful. Suppose you want to bring really old databases up to date and transform them to Unicode, not for the sake of exporting the database to LibreOffice/OpenOffice or Microsoft Office (these would be fine with "pure" Unicode, I think), but because you still want to export them to a (La)TeX engine that can read Unicode. Then you would only need to do ONE conversion (with some excluded terms), instead of TWO conversions + integrity check. You would not need to check all entries via "integrity check", because the terms you excluded were already working fine with LaTeX before the conversion. Suggestion to change the name of this issue to: "Add cleanup action for "LaTeX to LaTeX aware Unicode"". Have you tried what Christoph suggested, by the way? Using "LaTeX cleanup"? Have you run into problems with it? It does:
Yes! (Though more often it's "I copy-pasted from Google Scholar or some internet-provided .bib file" than "I had a really old database".)
Name changed, please reopen issue.
I have not tried it yet. I'll try it the next time I'm manipulating databases.
@JasonGross let's rename this issue back to "LaTeX to Unicode formatter should not replace \% with %" again, then close this issue and open a new issue with a well-explained first post, understandable for people who have no clue about these issues, listing:
E.g., you can copy and paste the following text:
Problem:
Desired Solution:
Example workflow:
"Special Symbols" that would need to be excluded from conversion:
Additional Information
JabRef version
5.5 (latest release)
Operating system
Windows
Details on version and operating system
Windows 10
Checked with the latest development build
Steps to reproduce the behaviour
abstract = {10%}
Since % is a comment character in LaTeX, this change is incorrect. More generally, escaped special characters in LaTeX should not be unescaped when converting to Unicode (or at least the general "convert to Unicode" should not have this behavior).
Appendix