-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parse units correctly #42
base: main
Are you sure you want to change the base?
Conversation
Note you have to put "Fixes..." into the original comment of this PR for it to work properly. |
I converted this PR to a draft. Please open it once the pipeline is passing and once this PR is ready for my review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, nice job and it works like a charm. Some comments.
tests/playground.py
Outdated
@@ -34,6 +34,11 @@ | |||
|
|||
wiz.res("a911", 1.05, r"\mm\s\per\N\kg") | |||
# wiz.res("a911", "1.052", 0.25, r"\mm\s\per\N\kg") | |||
wiz.res("a911_2", 1.05, r"\mm\s\per(\N\kg)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should discuss what we want to do with parentheses added by the user. In siunitx, the following does not error and will produce a PDF:
While this does not make sense semantically-wise, I think we should leave that decision to the user, i.e. allow semantically incorrect statements. Otherwise, we'd have to do some more effort to implement a more sophisticated lexer/parser, e.g. see something along the lines of this and this.
Should you agree on this, then we need to make sure that:
\mm\s\per(\N\kg)
is parsed correctly (right now the\per(
is one token, so it's not recognized correctly)- our own adding of parentheses doesn't get in the way with user input. For example, what should we do with this unit string:
\mm\s \per\N)\kg
? siunitx just prints it like this:mms/N)kg
whereas\mm\s \per\N\kg
becomesmm s kg/N
. Note that in the latter case, you can see that siunitx parsed it correctly and also added spaces between units likemm
ands
. In the first case, it seems to just do very basic things like replacing\per
by/
and removing whitespaces. Even in the case\mm\s\per(\N\kg)
it printsmms/(Nkg)
, without any whitespaces between units. So it seems that whenever parentheses are present (even just one), it falls back to simpler logic. I think it'd be beneficial to mirror that behavior such that users aren't surprised that their results looks great in the console but weird in the LaTeX document. Note that this second bullet point here therefore defeats the first one.
Overview how siunitx does it
\mm\s \per(\N\kg)
-> `mms/(Nkg)\mm\s \per\N\kg
->mm s kg/N
\mm\s (\per\N\kg
->mms(/Nkg
\mm\s (^2\per\N\kg
->mms(^2/Nkg
(the^2
in the output is actually printed as a power)((\mm)\s))))))))))))/\N\k
->((mm)s))))))))))))/Nkg
\mm\N\per (\kg^2)
->mmN/(kg^2)
(the^2
in the output is actually printed as a power)\mm\N\per \kg^2
->mmN/kg^2
(the^2
in the output is actually printed as a power)\mm\N\per \kg\squared
->mm N/kg^2
(the^2
in the output is actually printed as a power)
Note the correct space betweenmm
andN
here, which is not present in the case where we use^2
instead of\squared
in the input!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not worry too much about this. In the end, we are not responsible for what siunitx does. While it would be nice to mirror the behavior of siunitx in our display of units in the console, I don't think it is key to do so. Let's not overcomplicate our code here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree that a proper lexer/parser is currently a bit out of scope for ResultWizard
as its main purpose is to export variables to LaTeX. Still, it might be useful for the future, so I linked this comment in a new issue #60.
Furthermore, despite that we probably don't want to implement a full grammar right now, I think we shouldn't turn a blind eye on some things that just don't work right now, e.g. (as described in my comment above), adding a (
after \per
will result in per(
being recognized as one token and therefore the user sees the string per
in the output, which is ugly. I therefore added another small pass in 9be9b27 to fix this.
In the end, we might get ourselves in the situation where we add lots of these "small passes", which gets very complicated to manage very quickly as one wants to tailor many specific use cases. If we notice this, we should refer to #60 and rewrite the logic altogether by means of a proper grammar and lexer/parser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(comment removed, see below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before merging this by means of squash and merge, please make sure you review the commits I've added recently, see the commit page.
Fun fact: this is PR "42" 😅
Fixes #32
Fixes #30
Fixes #40