Adding non-breakable spaces Lua filter #119
base: master
OK, it seems that somehow I did not upload the … I have added …
On GitHub, a pull request is a request to merge one branch into another, in this case your fork's master branch into this repository's. You can update PRs by adding, removing, or rebasing commits in a branch. Typically (and to get more GitHub tooling) you'd open a PR from a branch on your fork other than master.
Oh, I wasn't sure whether I should create a new branch in my fork or not. I'll do better next time. Do I understand you correctly that now nobody (except me) can make edits to this PR? I thought the "Allow edits by maintainers" checkbox controls this. I have it checked, which I hoped would make it possible for the project maintainers to correct any errors in this PR (well, if they prefer that over telling me what I should fix).
Really? Last I checked (less than a week ago), that checkbox was disabled for all PRs from master branches. If you see the box and have it checked, then GitHub has updated something recently, and that's a nice fix that I'm glad to hear about.
I have used this manual: … Step no. 7 is the one we are talking about, I hope. Except I did the exact opposite: I left it checked to allow maintainers to do anything they deem necessary with this PR... By the way, it seems that the bibliography tests are misbehaving; maybe because of the newest update?
Interesting. Today I was looking at some legacy production code with exactly the same purpose, thinking to myself that I had better replace it with a Lua filter. The current code is a monster of a nested … The … The list of short begin words is language-dependent. Here are the English and Dutch lists. … Finally, we do not want such substitutions to occur in …
@stroobandt Thank you for posting that! Sadly, I am not versed enough in … If you are concerned about code blocks (I was too), I have tested that (it is even in the PR's test file): code blocks and inline code are recognized correctly, and no space substitution happens there.
Thanks for the updates! I'll take a closer look soon; I just need to fix the tests first. This may take a while, as I'm going to spend less time in front of the screen for the next couple of days. The filter name "nonbreakingspace" is a bit generic and somewhat cryptic. Can we find something more specific? How about using the same name as the TeX package, "vlna"? Would that be too confusing? Maybe "west-slavic-nbsp"? I'm open to ideas.
Well, "pandoc-vlna" could be good in itself, but I think it would introduce a different kind of crypticness: LaTeX packages are known only to LaTeX users, and most probably only to the subset that doesn't already use their own homemade solutions (I would guess there are actually plenty of those). I thought that naming the filter as closely as possible after what it does would keep its name from being misleading. I very often take inspiration from the Lua filter … But in the spirit of your previous comment, "insert-nbsp" (or the unabbreviated form "insert-nonbreakable-space") could work too.

I would like to avoid tying the filter to any nationality or geographic location, because that might be off-putting to someone (or worse). The current state of the code is mostly due to my being a Lua newbie and a native Czech speaker, which led me to make the default settings suit Czech requirements. I hope the filter as it is is customizable enough for anybody to adapt it to their own language's requirements. If anybody knows how to make this filter support more than one language, I would be happy to make more improvements. My only idea in this regard is to have a separate table for each language (possibly a lot of them, depending on how many languages are covered) and then (somehow) read the language from the metadata …

@tarleb Let me know which identification is more suitable:
At the end of the day, as the famous classic says:
... as long as it's well described in README.md (hopefully it is).
Regards, Tomas
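Tomas's idea of one table per language, selected by the document's language, could look roughly like the plain-Lua sketch below. The names `prefixes_by_lang`, `DEFAULT_LANG`, and `prefixes_for` are hypothetical, the English list is purely illustrative, and reducing a tag such as `cs-CZ` to `cs` before falling back to English is an assumption, not something the filter is known to do.

```lua
-- Hypothetical word lists keyed by primary language subtag.
-- The Czech entries are one-letter prepositions and conjunctions;
-- the English entries are purely illustrative.
local prefixes_by_lang = {
  cs = { 'k', 's', 'v', 'z', 'o', 'u', 'a', 'i',
         'K', 'S', 'V', 'Z', 'O', 'U', 'A', 'I' },
  en = { 'a', 'an', 'the', 'A', 'An', 'The' },
}

local DEFAULT_LANG = 'en'

-- Return the word list for a tag such as "cs" or "cs-CZ", falling back
-- to English when the tag is missing, empty, or not in the table.
local function prefixes_for(lang)
  if type(lang) ~= 'string' or lang == '' then
    return prefixes_by_lang[DEFAULT_LANG]
  end
  -- Keep only the leading letters, i.e. the primary subtag ("cs-CZ" -> "cs").
  local primary = lang:match('^%a+')
  if primary then
    primary = primary:lower()
  end
  return prefixes_by_lang[primary] or prefixes_by_lang[DEFAULT_LANG]
end
```

Inside a pandoc filter, the tag would presumably be read from the document metadata, for instance with `pandoc.utils.stringify(meta.lang)` in a `Meta` function, before being passed to `prefixes_for`.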
I'd like to ask a favor. I compiled a new contributing guide over the last two weeks and would like to get some feedback on it. I'd appreciate it if you could take a look at the document and tell me about all passages that are unclear or that you feel should be improved for other reasons. I am not commenting on the code yet, as I'm also trying to evaluate whether the information I added to the guide is sufficient to prevent some common issues that often come up during code review. Hope that's OK with you? As for the name: each option seems fine. Since Czech is the only supported language, we could also be more explicit about the filter's purpose and include "czech" in the name. I'll leave the choice to you.
Regarding the new guide, everything mentioned seems completely clear and OK to me. I can accommodate the limit on characters per line. EditorConfig seems like a great suggestion, although I am not familiar with that utility.
Updated to have max 80 chars per line.
Renamed to `sampleCZ` to allow testing of the Czech language setting.
Another sample for English language testing.
Expected result from compilation of `sampleCZ.md`. Just renamed.
Updated to read the `lang` setting, with English as the default and as the fallback if `lang` is not specified. Also added English `prefixes`.
New file for testing after latest filter update.
Added explicit `lang` setting.
Added test result for English `lang` setting.
Added test for English `lang` setting; renamed filter file.
@tarleb I have finished the changes according to the newest contributing guide. Everything should be good to go!
Thank you for the changes, I like where this is heading! I left some inline comments which I hope are helpful. Ready to merge once these are resolved.
nonbreakablespace/pandocVlna.lua
```lua
local insert = insert_nonbreakable_space(FORMAT)

for i = 1, #inlines do
  if inlines[i].t == 'Space' then
```
How about naming the elements we are looking at? I find `prev` or `previous` much easier to read than `inlines[i - 1]`.
OK, I can accommodate that, even though in this case it was personally easier for me to see exactly what I am referring to.
I have made the requested changes, but I am having trouble assigning the replacement string through these names. If I write `currentElement = insert`, it doesn't work, but `inlines[i] = insert` does. It makes sense that these made-up variables don't write back into the original `inlines` list, but how can I handle that?
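The behaviour Tomas describes follows from how Lua assignment works: `local current = inlines[i]` copies a reference to the element into a new variable, so `current = insert` only rebinds that variable and leaves the list untouched. A common compromise is to use named locals for reading and the index for writing. The sketch below uses plain tables as stand-ins for pandoc elements; `bind_after_word` and the `{ t = ..., text = ... }` shapes are hypothetical.

```lua
-- Plain tables stand in for pandoc inline elements in this sketch.
local function bind_after_word(inlines, word)
  for i = 2, #inlines do
    local previous = inlines[i - 1] -- named locals make the test readable
    local current = inlines[i]
    if current.t == 'Space' and previous.t == 'Str'
        and previous.text == word then
      -- `current = replacement` would only rebind the local variable;
      -- assigning through the index updates the list itself.
      inlines[i] = { t = 'Str', text = '\u{00A0}' }
    end
  end
  return inlines
end
```

For plain Lua tables, mutating a field (for example `current.text = ...`) would also show up in the list, since the table is shared by reference; only rebinding the name has no effect. The `\u{...}` escape needs Lua 5.3 or later.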
I have also noticed another issue: a line break in the source (a `SoftBreak` element instead of a `Space`) at the place where the non-breakable space should go won't trigger the replacement (of course). I have remedied that: now I am testing for both `Space` and `SoftBreak` elements.
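Treating both element types as an ordinary gap can be wrapped in a small predicate so the loop condition stays readable. In this sketch the name `is_breakable_gap` is hypothetical, and plain tables with a `t` field stand in for pandoc elements, matching the `inlines[i].t == 'Space'` check used in the filter.

```lua
-- Element tags that represent an ordinary, breakable gap between words.
-- A SoftBreak is a plain newline inside a paragraph in the source text.
local BREAKABLE = { Space = true, SoftBreak = true }

local function is_breakable_gap(el)
  return el ~= nil and BREAKABLE[el.t] == true
end
```

A hard `LineBreak` is left out on the assumption that an explicit line break in the source should be kept as written.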
I have uploaded new files.
Please let me know what you think about it. In this specific case I would personally prefer writing `inlines[i]`, but modifying the filter as you suggest would let me learn more, so I am open to doing that. After resolving this issue, I will work on the next one.
Also added fixes per first review suggestion (at least partially)
Complete reupload
Fixed review suggestion no. 2
about to reupload new one
Changes per review suggestion 3 and 4.
Also updated file header according to new contribution guide.
Added another update with all improvements suggested in the review. Also updated the filter file header according to the new contribution guide.
The filter failed if the last element in a paragraph was a "Space" (produced by another filter or by the writing itself), and similarly when a "Space" was the first element of a paragraph block. This is now fixed at the beginning of the replacement loop (line 122).
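One way to avoid such boundary failures is to start the loop at the second element and stop before the last, so a gap is only examined when something sits on both sides of it. This is a sketch under assumptions, not the filter's actual code at line 122: `bind_short_words` and the `words` set are hypothetical, and plain tables stand in for pandoc elements.

```lua
-- Replace the gap after a listed short word with a no-break space.
-- `words` is a set: { v = true, k = true, ... }.
local function bind_short_words(inlines, words)
  -- Starting at 2 skips a leading gap; stopping at #inlines - 1 skips a
  -- trailing gap, which has nothing after it to bind to.
  for i = 2, #inlines - 1 do
    local previous, current = inlines[i - 1], inlines[i]
    if (current.t == 'Space' or current.t == 'SoftBreak')
        and previous.t == 'Str' and words[previous.text] then
      inlines[i] = { t = 'Str', text = '\u{00A0}' }
    end
  end
  return inlines
end
```

For an empty or one-element list, the loop range is empty, so no index is ever read out of bounds.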
Many thanks for this filter, @Delanii! I wonder if it is possible to add the % character to the list.
Just a side note: in Lua patterns, % is the escape character, so you have to type %% so that it escapes itself.
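To illustrate: in a Lua pattern `%%` matches a literal percent sign, and in a `gsub` replacement string `%%` likewise produces a single `%`. By contrast, `"\%"` is rejected by the Lua parser as an invalid escape sequence, and `"\\%"` yields a pattern ending in a bare `%`, which `gsub` reports as malformed; that may be why such attempts failed. The helper `bind_percent` is hypothetical and works on a whole string; in a pandoc document the number and the sign would usually arrive as separate `Str` elements, so this only illustrates the escaping.

```lua
local NBSP = '\u{00A0}'

-- Put a no-break space between a digit and a following percent sign.
-- Pattern: "%d" is the digit class, "%%" a literal '%'.
-- Replacement: "%1" is the captured digit, "%%" a literal '%'.
local function bind_percent(s)
  local result = s:gsub('(%d) %%', '%1' .. NBSP .. '%%')
  return result -- drop gsub's second return value (the match count)
end
```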
I wonder if this filter could be made more general by creating some metadata fields that take a list of characters to be associated with a non-breakable space, something like "non-breakable-space-before", "non-breakable-space-after", "thin-space-before", and "thin-space-after". Of course, this would complement the built-in per-language configuration files that you mentioned. What are your thoughts about that?
On Thursday 06 May 2021 at 03:31:12AM, Jan Netík wrote:
Many thanks for this filter, @Delanii! I wonder if it is possible to add the %
character to the list. In Czech, we write non-breakable space between the
number and % sign (when percentages are used as an adjective, there is no
space). I've tried to change the filter, but failed miserably. Maybe % needs to
be escaped as \%, but that fails too (as well as \\% or \\\% does). Help would
be greatly appreciated!
Hello @netique, well, this filter doesn't cover everything. I admit I settled for dealing with the easiest things: just listing a few prepositions and handling split numbers. The percent character could be thought of as a unit, which actually has the opposite requirement: it has to be coupled with the preceding text element (a number, as in "10 %"), not with the following one (a word, as in "10 apples"). As I wrote in the beginning, I was and still am a Lua newbie, so by leaving out units, chapters (e.g. "chapter 9") and the like, I made creating this filter a lot easier. I am not very sure how I would deal with units and such in this filter. In LaTeX, this is dealt with by means of the famous …
Hi @badumont! Well, that is a great idea. I was actually thinking about something like that in the beginning, but then I hoped that this filter could grow through feedback to incorporate more languages with their respective rules. I wanted the filter to be as simple to use as possible, so there is no configuration. But the code is open, so you can modify it to your needs (just by editing the table of prepositions) to tweak its behaviour. I also have no idea how to implement configuration for a Lua filter, and how would it be specified (a configuration file, a defaults file, the command line?). Do you have an idea how to do that?
I am not an expert either, but here is my guess. If you wish to support multiple languages, I think that the filter itself should …

For the configuration, there could be two non-exclusive ways: the one that I …

```lua
local nbsp_before = {
  ':',
  '»'
}
local nbsp_after = {
  '«'
}
local thin_before = {
  ';'
}
spaces_config = {
  nbsp_before = nbsp_before,
  nbsp_after = nbsp_after,
  thin_before = thin_before
}
```

Then simply add …

Another thing: why do you define the nbsp character per format? Pandoc does the …

I can help if you want. As a French user who currently selects every special space manually, I am very interested in this filter! However, in that case, it may be easier to do it in a separate repo for … I already have a fork of lua-filters with a pending pull request...
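A configuration table shaped like `spaces_config` could be applied between two neighbouring words roughly as follows. This is a sketch under assumptions: `to_set`, `compile`, and `space_between` are hypothetical helpers; entries are matched against whole `Str` texts, which suits punctuation written as a separate word such as a detached `:`; and the "thin" space is taken to be U+202F NARROW NO-BREAK SPACE, the character commonly used in French typography, rather than the breakable U+2009 THIN SPACE.

```lua
local NBSP = '\u{00A0}'        -- NO-BREAK SPACE
local NARROW_NBSP = '\u{202F}' -- NARROW NO-BREAK SPACE (assumed for "thin")

-- Turn a list such as { ':', '»' } into a set for constant-time lookup.
local function to_set(list)
  local set = {}
  for _, item in ipairs(list or {}) do
    set[item] = true
  end
  return set
end

-- Build lookup sets once from a config shaped like spaces_config.
local function compile(config)
  return {
    nbsp_before = to_set(config.nbsp_before),
    nbsp_after = to_set(config.nbsp_after),
    thin_before = to_set(config.thin_before),
  }
end

-- Decide which space goes between the words `left` and `right`;
-- nil means "keep the ordinary breakable space".
local function space_between(left, right, sets)
  if sets.thin_before[right] then
    return NARROW_NBSP
  end
  if sets.nbsp_before[right] or sets.nbsp_after[left] then
    return NBSP
  end
  return nil
end
```

Compiling the lists into sets once keeps the per-gap check to a few table lookups, and a missing list in the configuration simply yields an empty set.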
Hi both, what about specifying functions depending on the language set? For instance, if … Edit: Okay, I just saw how this is implemented.
This addresses #114.
It's the first pull request I have ever made, so hopefully I did not mess anything up. I know it is a whole bunch of files, which should usually be avoided, but I think that in this case they belong together as one package.
Per #114, it still does not implement the change to tables and the presence check, since I am having trouble understanding that syntax (idiom) and using it. Hopefully that doesn't matter too much.
Regards, Tomas
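If the "tables and presence check" change from #114 refers to the usual Lua set idiom (an assumption, since the issue text is not quoted here), it amounts to storing the words as table keys with the value `true` and testing membership with a single lookup instead of scanning a list. The names and the short word list below are illustrative.

```lua
-- List form: checking membership needs a loop over every entry.
local prepositions_list = { 'k', 's', 'v', 'z', 'o', 'u' }

local function contains(list, word)
  for _, item in ipairs(list) do
    if item == word then
      return true
    end
  end
  return false
end

-- Set form: the words are keys, so membership is one table lookup.
local prepositions = {
  k = true, s = true, v = true, z = true, o = true, u = true,
}

local function is_preposition(word)
  return prepositions[word] == true
end
```

Both forms give the same answers; the set form simply avoids the loop and reads as a plain condition, e.g. `if is_preposition(previous.text) then ... end`.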