Snuggletex Integration #646

Zylence · 2023-05-17T15:16:19Z

Snuggletex

Snuggletex is a library from the University of Edinburgh for converting LaTeX to XML, but it can be used for LaTeX parsing as well. It is extensible, easy to use, and powerful, while having almost no external dependencies.
In the future, it could become our main LaTeX parser for integrity checks.

What it does do

^
_
$
commands: \command[opt]{arg}{...}
verbs: \verb(*)...
environments: \begin{env}[opt]{arg}{...}...\end{env}
user-defined commands

Takeaways

Some commands may be missing; for example, I found \text{} to be absent.
To check which commands are supported by default, refer to CorePackageDefinitions.java.
Thankfully, enough of the package is exposed to inject new commands, for example:
engine.getPackages().get(0).addComplexCommandOneArg("text", false, ALL_MODES, LR, StyleDeclarationInterpretation.NORMALSIZE, null, TextFlowContext.ALLOW_INLINE);

I have not checked if this is the correct way to represent the text command, but now it parses it correctly.

What it does not do

&
#
%

What we could do

Use the tokens provided by snuggletex to implement our own parser on top

Or
Keep our integrity checks for & and # and implement % like we used to
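
For illustration only, a hand-written scan for unescaped special characters could look roughly like the sketch below. The class and method names are hypothetical and are not taken from JabRef or snuggletex. The sketch also treats # as a plain character and ignores JabRef's BibTeX string concatenation logic.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of JabRef or snuggletex. */
final class SpecialCharScan {

    /** Returns the indices of every '&', '#', or '%' that is not escaped by a preceding backslash. */
    static List<Integer> unescapedPositions(String field) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character, so "\&" and "\\" are each consumed as a pair
            } else if (c == '&' || c == '#' || c == '%') {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(unescapedPositions("Proceedings \\& Reports"));
        System.out.println(unescapedPositions("Tom & Jerry, 50% off"));
    }
}
```

Skipping the character after each backslash means an escaped backslash followed by an ampersand is still reported, which a simple look-behind regex would miss.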

What I would like to do

I am really fascinated by this library: it is clean, well documented, thoughtfully built, and extensible. I'd really like to do more with it. If you do not mind, I'd like to port our integrity checks to snuggletex rather than writing them as we used to.

Mandatory checks

Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
Tests created for changes (if applicable)
Manually tested changed features in running JabRef (always required)
Screenshots added in PR description (for UI changes)
Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

koppor · 2023-06-12T22:25:23Z

In JabRef's localization, we keep the English keys equal to the English full text. In this way, we can have readable code. Details at https://devdocs.jabref.org/code-howtos/localization.html.

Regarding the SnuggleTeX errors, it would be OK to show "LaTeX cannot be parsed" and have the detail message in English only. These are all very technical terms, and I think no JabRef translator should be bothered to translate them.

koppor · 2023-06-12T22:27:38Z

To work around the method-too-large exception, I am trying to include our custom JDK build.

koppor

Code looks pretty readable.

Please add test cases - and then this should be good to go for a review at the jabref repo

CHANGELOG.md

src/main/java/org/jabref/logic/integrity/LatexIntegrityChecker.java

koppor · 2023-06-12T22:40:04Z

Thank you for having worked on this. Sorry for the late reply. I hope you will continue with this!

Some commands may be missing; for example, I found \text{} to be absent.

\text appears in math mode only, not outside it. Maybe it is available inside math mode (e.g., $\text{test}$).

I have not checked if this is the correct way to represent the text command, but now it parses it correctly.

Which concrete case did you have?

What it does not do

Please spell out what you mean.

Ah, I understand: It does not check for ampersands, wrong BibTeX string concatenation, and percent signs.

We already discussed BibTeX string concatenation. It is a difficult case, as this is JabRef custom logic, too.

What we could do

Use the tokens provided by snuggletex to implement our own parser on top
Or

Could be a fun exercise? With the possible outcome that the code is harder to read than our existing code?

What I would like to do

I am really fascinated by this library: it is clean, well documented, thoughtfully built, and extensible. I'd really like to do more with it. If you do not mind, I'd like to port our integrity checks to snuggletex rather than writing them as we used to.

Suggestion: Finish this PR with the current functionality. Then create a new branch (based on this branch) to rewrite the other checks. In this way, you can work in parallel. If the updated code turns out to be good, we can keep working towards including it. If it turns out not to be that maintainable, the issue JabRef#8712 is still fixed.

Zylence · 2023-06-13T22:02:13Z

In JabRef's localization, we keep the English keys equal to the English full text. In this way, we can have readable code. Details at https://devdocs.jabref.org/code-howtos/localization.html.

Regarding the SnuggleTeX errors, it would be OK to show "LaTeX cannot be parsed" and have the detail message in English only. These are all very technical terms, and I think no JabRef translator should be bothered to translate them.

Okay, just to make sure I understand you correctly, a message would for example look like this:

LaTeX cannot be parsed: Nothing following \

And we would only translate this part: LaTeX cannot be parsed.

So having Nothing following \ inside the properties file is not necessary?

Consequently we would only need one entry inside the properties file:
LaTeX\ cannot\ be\ parsed:=LaTeX cannot be parsed: %0

and "feed it" the internal error messages.

Zylence · 2023-06-13T22:02:36Z

To work around the method-too-large exception, I am trying to include our custom JDK build.

Sounds great, I am looking forward to trying it out :^)

Zylence · 2023-06-13T22:03:29Z

Thank you for having worked on this. Sorry for the late reply. I hope you will continue with this!

No worries, I was very busy anyway. I would be happy to continue :^) .

\text appears in math mode only, not outside it. Maybe it is available inside math mode (e.g., $\text{test}$).

It really is not handled. I looked through the library with text search and tried it out. \text is an undefined command according to snuggletex.

Ah, I understand: It does not check for ampersands, wrong bibtex string concatenation, and percent signs.

Exactly, could have made this clearer, sorry for the confusion.

Could be a fun exercise? With the possible outcome that the code is harder to read than our existing code?

Just guessing, I'd say it won't make much of a difference. Perhaps we should just be happy with what we already have. I will do a case study regardless, so we can see the possible outcome before deciding.

Suggestion: Finish this PR with the current functionality. Then create a new branch (based on this branch) to rewrite the other checks. In this way, you can work in parallel. If the updated code turns out to be good, we can keep working towards including it. If it turns out not to be that maintainable, the issue https://github.com/JabRef/jabref/issues/8712 is still fixed.

Sounds good to me, I'll do it. :)

koppor · 2023-06-14T07:49:06Z

Okay, just to make sure I understand you correctly, a message would for example look like this:

LaTeX cannot be parsed: Nothing following \

And we would only translate this part: LaTeX cannot be parsed.

So having Nothing following \ inside the properties file is not necessary?

Yes. We should move forward in that way. Later, we should collect (somehow) the returned errors and translate them. We have telemetry infrastructure for that, but it is currently not working and needs some updates...

Consequently we would only need one entry inside the properties file:
LaTeX\ cannot\ be\ parsed:=LaTeX cannot be parsed: %0

The %0 needs to be on the left side, too. Key and value really need to be the same 😅

and "feed it" the internal error messages.

Yes!
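
To make the agreed scheme concrete, here is a minimal sketch of a key that equals its English value, including the %0 placeholder, and is fed with the internal error message. The class LocalizationSketch and its lang method are hypothetical stand-ins for illustration and are not JabRef's actual localization code.

```java
import java.util.Map;

/** Hypothetical stand-in for a localization lookup; not JabRef's actual implementation. */
final class LocalizationSketch {

    // Key and value are identical in the English table, including the %0 placeholder.
    // In a real .properties file, the spaces and the colon in the key would additionally need escaping.
    private static final Map<String, String> ENGLISH = Map.of(
            "LaTeX cannot be parsed: %0", "LaTeX cannot be parsed: %0");

    /** Looks up the key and substitutes %0, %1, ... with the given parameters. */
    static String lang(String key, String... params) {
        String text = ENGLISH.getOrDefault(key, key);
        for (int i = 0; i < params.length; i++) {
            text = text.replace("%" + i, params[i]); // literal replacement, no regex involved
        }
        return text;
    }

    public static void main(String[] args) {
        // The untranslated parser message is passed in as the argument for %0.
        System.out.println(lang("LaTeX cannot be parsed: %0", "Nothing following \\"));
    }
}
```

With this shape, only the wrapper sentence needs an entry in each translation file, while the detail message stays in English.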

koppor · 2023-06-14T07:51:11Z

Thank you for having worked on this. Sorry for the late reply. I hope you will continue with this!

No worries, I was very busy anyway. I would be happy to continue :^) .

Looking forward 🤩

\text appears in math mode only, not outside it. Maybe it is available inside math mode (e.g., $\text{test}$).
It really is not handled. I looked through the library with text search and tried it out. \text is an undefined command according to snuggletex...

Maybe worth a PR for SnuggleTex? 😅

Could be a fun exercise? With the possible outcome that the code is harder to read than our existing code?
Just guessing, I'd say it won't make much of a difference. Perhaps we should just be happy with what we already have. I will do a case study regardless, so we can see the possible outcome before deciding.

👍

Localization now uses a string from JabRef's properties to wrap the internal error messages. Local fields have been made static members in order to improve performance with large bib files. We no longer instantiate an Engine and Session per bib entry.

Co-authored-by: Oliver Kopp <kopp.dev@gmail.com>

Zylence · 2023-06-18T21:52:47Z

Later, we should collect (somehow) the returned errors and translate them. We have telemetry infrastructure for that, but it is currently not working and needs some updates...

Do you mean to translate them automatically, via a service?

The %0 needs to be on the left side, too. Key and value really need to be the same 😅

Slip of the pen. 😅

Yes!

Errors are now prefixed with "LaTeX Parsing Error:" I found that quite appealing, but we can change that back to "LaTeX cannot be parsed" if you like.

Zylence · 2023-06-18T21:53:19Z

Maybe worth a PR for SnuggleTex? 😅

I can try, but do you think it will be merged? I saw your PR in the repo is still dangling around unmerged. 😅

Zylence · 2023-06-18T22:00:27Z

Please add test cases - and then this should be good to go for a review at the jabref repo

Tests will be ready over the next week. :)

Code looks pretty readable.

Thanks. I did some performance-oriented refactoring regardless, but that should not have too big an effect on readability. Engine and Session are now static members; prior to that, we instantiated them per bib entry. (Keeping the references around only adds ~1 KB of memory overhead.)
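
The pattern described here can be sketched in isolation as follows. ExpensiveParser is a hypothetical stand-in with a toy parsing rule, not a snuggletex class, and the reset step is an assumption about what reusing a shared instance would require.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of a checker that reuses one static parser instead of creating one per entry. */
final class SharedParserChecker {

    /** Hypothetical stand-in for an expensive-to-create parser. */
    static final class ExpensiveParser {
        static final AtomicInteger CREATED = new AtomicInteger();
        private final StringBuilder state = new StringBuilder();

        ExpensiveParser() {
            CREATED.incrementAndGet(); // count instantiations for the demo
        }

        /** Clears leftover state so the instance can be reused for the next input. */
        void reset() {
            state.setLength(0);
        }

        boolean parse(String input) {
            state.append(input);
            return !input.endsWith("\\"); // toy rule: a trailing backslash counts as an error
        }
    }

    // Created once when the class is loaded and reused for every entry.
    private static final ExpensiveParser PARSER = new ExpensiveParser();

    // synchronized because the shared parser carries mutable state
    static synchronized boolean isValid(String fieldValue) {
        PARSER.reset(); // drop state left over from the previous entry
        return PARSER.parse(fieldValue);
    }

    public static void main(String[] args) {
        for (String value : new String[] {"$x^2$", "broken \\", "plain text"}) {
            System.out.println(value + " -> " + isValid(value));
        }
        System.out.println("parsers created: " + ExpensiveParser.CREATED.get());
    }
}
```

A shared static instance trades per-entry allocation for shared mutable state, so callers on several threads need some form of synchronization, as assumed in the sketch.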

koppor · 2023-06-19T09:23:23Z

Later, we should collect (somehow) the returned errors and translate them. We have telemetry infrastructure for that, but it is currently not working and needs some updates...
Do you mean to translate them automatically, via a service?

No ^^. Here is my line of thought:

The number of error messages is greater than ten.
We do not know if we need to translate them all.
How to find out which messages should be translated?
Idea: Use telemetry! Collect the sent error messages centrally.
After some time, we know the set of tuples (error message, number of occurrences).
These can then be translated.

The translation will be done as usual using JabRef_en.properties.
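
As a rough sketch of the aggregation step in that line of thought, the collected messages could be reduced to (error message, number of occurrences) tuples and ranked. The class name and the sample messages other than "Nothing following \" are made up for illustration; how the messages would actually be collected is left open.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical aggregation of reported parser messages; not existing JabRef code. */
final class ErrorFrequency {

    /** Groups reported messages and returns (message, count) pairs, most frequent first. */
    static List<Map.Entry<String, Long>> rank(List<String> reported) {
        return reported.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                .sorted((a, b) -> {
                    int byCount = Long.compare(b.getValue(), a.getValue()); // higher counts first
                    return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey()); // ties by message
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample data; the second message is invented for the example.
        List<String> reported = List.of(
                "Nothing following \\",
                "Sample message B",
                "Nothing following \\");
        for (Map.Entry<String, Long> entry : rank(reported)) {
            System.out.println(entry.getValue() + " x " + entry.getKey());
        }
    }
}
```

The messages at the top of such a ranking would be the first candidates for entries in JabRef_en.properties.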

Errors are now prefixed with "LaTeX Parsing Error:" I found that quite appealing,

Reads good!

koppor · 2023-06-19T09:27:01Z

Maybe worth a PR for SnuggleTex? 😅
I can try, but do you think it will be merged? I saw your PR in the repo is still dangling around unmerged. 😅

@davemckain is the original developer on snuggletex. My bet is that he is happy if several people contribute to his repository - https://github.com/davemckain/snuggletex.

Note to self: The other "maintained" fork seems to be https://github.com/rototor/snuggletex. I would, however, like to stick to the "original" one ^^.

Zylence · 2023-07-25T18:43:01Z

Code looks pretty readable.

Please add test cases - and then this should be good to go for a review at the jabref repo

Test cases have been added in this commit. Some that I expected to work did not (due to snuggletex); these are commented out for now.

Zylence · 2023-07-25T18:47:02Z

Code looks pretty readable.
Please add test cases - and then this should be good to go for a review at the jabref repo

Test cases have been added in this commit. Some that I expected to work did not (due to snuggletex); these are commented out for now.

Well that was unfortunate. I accidentally closed the PR in this comment, sorry.

Zylence · 2023-07-25T18:52:46Z

Later, we should collect (somehow) the returned errors and translate them. We have telemetry infrastructure for that, but it is currently not working and needs some updates...
Do you mean to translate them automatically, via a service?

No ^^. Here is my line of thought:

The number of error messages is greater than ten.
We do not know if we need to translate them all.
How to find out which messages should be translated?
Idea: Use telemetry! Collect the sent error messages centrally.
After some time, we know the set of tuples (error message, number of occurrences).
These can then be translated.

The translation will be done as usual using JabRef_en.properties.

Errors are now prefixed with "LaTeX Parsing Error:" I found that quite appealing,

Reads good!

Oh, okay, now I get it - I did not know you had telemetry infrastructure for that kind of thing. Thanks for letting me know. I agree, narrowing down the selection of messages that need translation in advance is the better choice.

koppor · 2023-09-07T20:52:09Z

This should now be a PR to JabRef's main repo. I resolved the merge conflicts at 4fb512e.

Should go as one commit in the upstream repo - if possible.

Zylence · 2023-09-08T21:40:50Z

This should now be a PR to JabRef's main repo. I resolved the merge conflicts at 4fb512e.

Should go as one commit in the upstream repo - if possible.

Thank you, I am happy to hear that. I am excited to see how it will do in the wild! :d

koppor · 2023-09-12T13:19:01Z

Submitted JabRef#10376 - therefore closing this.

Zylence added 7 commits May 13, 2023 18:26

Added snuggletex and dependencies

fe4e868

Added basic integrity latex integrity check using snuggletex and defa…

9ed75f1

…ult error messages

Copied english translation from snuggletex library to internal proper…

0966619

…ties file

LatexIntegrityChecker is now invoked on IntegrityCheck, it further us…

07e4fc0

…es the jabref properties for translations and excludes dom building errors

Latex parsing now fails on the first error - further we are now able …

a3b507d

…to pass the error arguments to our own localization

Adjusted former snuggletex property entry arguments to work with jabr…

8cda869

…ef localization

Added error exclusions for snuggletex and a changelog entry + style c…

bf882c0

…hanges to make the linter happy

Zylence mentioned this pull request May 17, 2023

New quality check and cleanup for & #585

Closed

2 tasks

Siedlerchr and others added 13 commits May 18, 2023 12:14

Merge branch 'main' into feature-snuggletex-integration

f4055f1

Use JabRef's JDK21 build

3cd5071

Co-authored-by: Christoph <siedlerkiller@gmail.com>

New image

c49f6c5

Fix tests

6e2ce07

Fix JDK version

d35b3b9

Try to fix usage of JDK

d9b4a5c

Try to fix setup

bf4b7fb

Add initial Dockerfile to build JDK

f0d66ca

Use sdkman for building

7c648d3

Merge remote-tracking branch 'upstream/main' into use-jdk21

b84dcf3

Use JabRef's JDK for linux and macOS

14c2fef

Fix sed command

3e17de7

Merge branch 'main' into feature-snuggletex-integration

afb3248

Merge branch 'use-jdk21' into feature-snuggletex-integration

fd1a545

koppor requested changes Jun 12, 2023

Zylence and others added 3 commits June 18, 2023 18:06

Merge branch 'JabRef:main' into feature-snuggletex-integration

8103384

Update CHANGELOG.md

cd22019

Co-authored-by: Oliver Kopp <kopp.dev@gmail.com>

Zylence and others added 4 commits June 25, 2023 01:04

Added positive and negative tests for the LatexIntegrityChecker (and …

02ebbe8

…therefore for its encapsulated SnuggleTex Parser). Further, slightly adjusted the LatexIntegrityChecker to expose a static errorMessageFormatHelper method to increase maintainability.

Updated with main branch

b5dc8e8

Added requested comment explaining why we only care for the first error

8744b86

Update journal abbreviation lists

05adf55

Zylence closed this Jul 25, 2023

Zylence reopened this Jul 25, 2023

Zylence marked this pull request as ready for review July 25, 2023 19:23

koppor mentioned this pull request Sep 12, 2023

Adds LaTeX integrity check based on SnuggleTeX JabRef/jabref#10376

Merged

6 tasks

koppor closed this Sep 12, 2023

Zylence mentioned this pull request Sep 21, 2023

Add integrity check for LaTeX special characters JabRef/jabref#8712

Open

3 tasks
