-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Strip trailing space from lines in address blocks #371
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
I know heredocs are weird, but this is *probably* more readable. I always need to remind myself how the various flavours of heredoc actually behave. Squiggly heredocs (<<~) remove indentation from their content. It'll remove the same amount of indentation across all lines, equal to the indentation of its least-indented line. In this case, all of our lines start out indented equally, so all indentation ends up removed from the resulting string. Also worth noting: the replacement of newline chars in the old string with actual line breaks in the heredoc. The tests pass, so I hope that that means the output is identical in all cases that we care about.
BTW, to test on Govspeak Preview locally, checkout the branch and have Govspeak Preview use the local version of the gem -gem "govspeak"
+gem "govspeak", path: "../govspeak" |
yndajas
approved these changes
Dec 17, 2024
mike3985
force-pushed
the
ignore-trailing-space-in-address-blocks
branch
3 times, most recently
from
December 17, 2024 18:03
41773e4
to
49509da
Compare
yndajas
approved these changes
Dec 17, 2024
Unfortunately this commit introduces two changes because I haven't figured out how to make them independently of each other: * Strip trailing backslashes from the lines of an address * When replacing line breaks with "<br>"s in the parsing of address blocks, replace the whole "\r\n" line break and not just the "\n" character _Strip trailing backslashes from the lines of an address_ Kramdown has a feature where it will replace two trailing backslashes from the input Govspeak with a line break in the output HTML. I'm in the process of making some changes that will stop that feature from working inside address blocks. It doesn't appear to be used in the wild in any address blocks and we don't officially support the feature in address blocks, so I think that intentionally disabling it is reasonable. To be clear, while Kramdown only treats pairs of backslashes this way, we're actually removing all trailing backslashes because even an odd number of them interferes with our code that handles address blocks. _When replacing line breaks with "<br>"s in the parsing of address blocks, replace the whole "\r\n" line break and not just the "\n" character_ The line breaks in the Govspeak in Whitehall's database and in the strings that Govspeak Preview hands to the parser are composed of both a return char and a newline char. In the resulting HTML, address blocks contain the return chars still intact because the address block's search-and-replace code was only matching and removing the newline chars. The intact return chars might actually be serving the purpose of producing line breaks in our source HTML (i.e. not affecting the way the HTML is rendered in the browser, but admittedly producing more readable plain HTML). But if this is the case and if this turns out to have been desirable behaviour I'd prefer that we do that more deliberately. Since the example Govspeak in the test suite sometimes uses standalone newlines and sometimes uses combined returns-and-newlines, it seems safest to assume that both options are possible and so I've opted for optionally matching either.
This `sub` was present in the very first commit that introduced the address block extension and there's no explanation of what it's specifically intended to achieve. I've interpreted it as "if there's a leading line break, remove it", as opposed to "remove the first occurrence of a line break regardless of where it appears in the input string". I could have missed something. This change makes the intention that I've assumed was there clearer to readers and prevents any other line break being removed unintentionally if something did ever change about the expected format of the input.
There's a feature in the Kramdown parser (i.e. the library that the Govspeak parser is built on top of) where two or more trailing spaces at the end of a line of Govspeak will be converted into an additional line break in the output HTML. Until recently, the contents of address blocks weren't "reparsed" in the Govspeak parser, meaning that they weren't subject to this Kramdown feature. But the Govspeak parser code was recently updated and the contents of address blocks are now reparsed. This has resulted in unintentional additional line spacing being made visible in some addresses that have been rendered by this new version Govspeak. A typical example of the issue looks like: | Name | | Street | | Town | | Postcode | where it would've previously looked like: | Name | Street | Town | Postcode By stripping the trailing space from Govspeak lines within address blocks, we effectively disable the Kramdown feature and prevent unintentional line breaks in addresses going forward. BTW, it's still straightforward to introduce additional line breaks deliberately. _A note on the move to handling the last line of the address separately from the rest:_ In Govspeak, address blocks don't necessarily have a line break between the last line of the address and its closing tag. Until now, where the input Govspeak did have a line break, the output HTML replaced it with a <br>. When there wasn't a line break, the output had no <br>. This difference didn't seem to me to have any impact on the final rendered page in the browser, so I've gone ahead and stopped those <br>s from appearing (as can be seen in the changes we've made here to existing tests). So now, while trailing spaces and backslashes are being stripped from all address lines, in addition to that, for the very last line of the address, trailing line break characters are also being stripped.
Mostly stripping trailing whitespace from lines within an address block, to prevent Kramdown from converting them into unintended additional line breaks.
mike3985
force-pushed
the
ignore-trailing-space-in-address-blocks
branch
from
December 18, 2024 10:50
49509da
to
9c17377
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There's a feature in the Kramdown parser (i.e. the library that the
Govspeak parser is built on top of) where two or more trailing
spaces at the end of a line of Govspeak will be converted into an
additional line break in the output HTML.
Until recently, the contents of address blocks weren't "reparsed" in the
Govspeak parser, meaning that they weren't subject to this Kramdown
feature. But the Govspeak parser code was recently updated and the
contents of address blocks are now reparsed. This has resulted in
unintentional additional line spacing being made visible in some
addresses that have been rendered by this new version Govspeak.
A typical example of the issue looks like:
where it would've previously looked like:
By stripping the trailing space from Govspeak lines within address
blocks, we effectively disable the Kramdown feature and prevent
unintentional line breaks in addresses going forward.
BTW, it's still straightforward to introduce additional line breaks
deliberately.