Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strip trailing space from lines in address blocks #371

Merged
merged 5 commits into from
Dec 18, 2024

Conversation

mike3985
Copy link
Contributor

There's a feature in the Kramdown parser (i.e. the library that the
Govspeak parser is built on top of) where two or more trailing
spaces at the end of a line of Govspeak will be converted into an
additional line break in the output HTML.

Until recently, the contents of address blocks weren't "reparsed" in the
Govspeak parser, meaning that they weren't subject to this Kramdown
feature. But the Govspeak parser code was recently updated and the
contents of address blocks are now reparsed. This has resulted in
unintentional additional line spacing being made visible in some
addresses that have been rendered by this new version Govspeak.

A typical example of the issue looks like:

| Name
|
| Street
|
| Town
|
| Postcode
|

where it would've previously looked like:

| Name
| Street
| Town
| Postcode

By stripping the trailing space from Govspeak lines within address
blocks, we effectively disable the Kramdown feature and prevent
unintentional line breaks in addresses going forward.

BTW, it's still straightforward to introduce additional line breaks
deliberately.

I know heredocs are weird, but this is *probably* more readable.

I always need to remind myself how the various flavours of heredoc
actually behave. Squiggly heredocs (<<~) remove indentation from their
content. It'll remove the same amount of indentation across all lines,
equal to the indentation of its least-indented line. In this case, all
of our lines start out indented equally, so all indentation ends up
removed from the resulting string.

Also worth noting: the replacement of newline chars in the old string
with actual line breaks in the heredoc. The tests pass, so I hope that
that means the output is identical in all cases that we care about.
@mike3985
Copy link
Contributor Author

BTW, to test on Govspeak Preview locally, checkout the branch and have Govspeak Preview use the local version of the gem

-gem "govspeak"
+gem "govspeak", path: "../govspeak"

lib/govspeak.rb Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
lib/govspeak.rb Outdated Show resolved Hide resolved
lib/govspeak.rb Outdated Show resolved Hide resolved
@mike3985 mike3985 force-pushed the ignore-trailing-space-in-address-blocks branch 3 times, most recently from 41773e4 to 49509da Compare December 17, 2024 18:03
@mike3985 mike3985 requested a review from yndajas December 17, 2024 18:06
lib/govspeak.rb Outdated Show resolved Hide resolved
lib/govspeak.rb Show resolved Hide resolved
Unfortunately this commit introduces two changes because I haven't
figured out how to make them independently of each other:

* Strip trailing backslashes from the lines of an address
* When replacing line breaks with "<br>"s in the parsing of address blocks,
  replace the whole "\r\n" line break and not just the "\n" character

_Strip trailing backslashes from the lines of an address_

Kramdown has a feature where it will replace two trailing backslashes
from the input Govspeak with a line break in the output HTML. I'm in the
process of making some changes that will stop that feature from working
inside address blocks.

It doesn't appear to be used in the wild in any address blocks and we
don't officially support the feature in address blocks, so I think that
intentionally disabling it is reasonable.

To be clear, while Kramdown only treats pairs of backslashes this way,
we're actually removing all trailing backslashes because even an odd
number of them interferes with our code that handles address blocks.

_When replacing line breaks with "<br>"s in the parsing of address blocks,
replace the whole "\r\n" line break and not just the "\n" character_

The line breaks in the Govspeak in Whitehall's database and in the
strings that Govspeak Preview hands to the parser are composed of both
a return char and a newline char. In the resulting HTML, address blocks
contain the return chars still intact because the address block's
search-and-replace code was only matching and removing the newline
chars.

The intact return chars might actually be serving the purpose of
producing line breaks in our source HTML (i.e. not affecting the way the
HTML is rendered in the browser, but admittedly producing more readable
plain HTML). But if this is the case and if this turns out to have been
desirable behaviour I'd prefer that we do that more deliberately.

Since the example Govspeak in the test suite sometimes uses standalone
newlines and sometimes uses combined returns-and-newlines, it seems
safest to assume that both options are possible and so I've opted for
optionally matching either.
This `sub` was present in the very first commit that introduced the
address block extension and there's no explanation of what it's
specifically intended to achieve.

I've interpreted it as "if there's a leading line break, remove it", as
opposed to "remove the first occurrence of a line break regardless of
where it appears in the input string". I could have missed something.

This change makes the intention that I've assumed was there clearer to
readers and prevents any other line break being removed unintentionally
if something did ever change about the expected format of the input.
There's a feature in the Kramdown parser (i.e. the library that the
Govspeak parser is built on top of) where two or more trailing
spaces at the end of a line of Govspeak will be converted into an
additional line break in the output HTML.

Until recently, the contents of address blocks weren't "reparsed" in the
Govspeak parser, meaning that they weren't subject to this Kramdown
feature. But the Govspeak parser code was recently updated and the
contents of address blocks are now reparsed. This has resulted in
unintentional additional line spacing being made visible in some
addresses that have been rendered by this new version Govspeak.

A typical example of the issue looks like:

| Name
|
| Street
|
| Town
|
| Postcode
|

where it would've previously looked like:

| Name
| Street
| Town
| Postcode

By stripping the trailing space from Govspeak lines within address
blocks, we effectively disable the Kramdown feature and prevent
unintentional line breaks in addresses going forward.

BTW, it's still straightforward to introduce additional line breaks
deliberately.

_A note on the move to handling the last line of the address separately
from the rest:_

In Govspeak, address blocks don't necessarily have a line break between
the last line of the address and its closing tag. Until now, where the
input Govspeak did have a line break, the output HTML replaced it with
a <br>. When there wasn't a line break, the output had no <br>. This
difference didn't seem to me to have any impact on the final rendered
page in the browser, so I've gone ahead and stopped those <br>s from
appearing (as can be seen in the changes we've made here to existing
tests).

So now, while trailing spaces and backslashes are being stripped from
all address lines, in addition to that, for the very last line of the
address, trailing line break characters are also being stripped.
Mostly stripping trailing whitespace from lines within an address block,
to prevent Kramdown from converting them into unintended additional line
breaks.
@mike3985 mike3985 force-pushed the ignore-trailing-space-in-address-blocks branch from 49509da to 9c17377 Compare December 18, 2024 10:50
@mike3985 mike3985 merged commit a45f500 into main Dec 18, 2024
9 checks passed
@mike3985 mike3985 deleted the ignore-trailing-space-in-address-blocks branch December 18, 2024 11:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants