[BUG] Markup escaping not working as expected #878

ronf · 2021-01-05T03:23:29Z

Describe the bug
It's a bit of an edge case, but right now it's not possible to output a backslash immediately before a square bracket starting a markup token. While it does properly output a single backslash if you feed in r'\\' followed by a markup token, it still disables interpreting the token as markup.

To Reproduce
Working cases:

rich.print(r'[bold]text[/]') prints "text"
rich.print(r'\[not a tag]') prints "[not a tag]"

Not working case:
rich.print(r'\\[bold]some text[/]') gives an error about an unexpected close tag, because it doesn't treat "[bold]" as markup. The expected result here was "\text", a single backslash followed by the text in bold.

Adding more backslashes also produces a surprising result. It seems like the rightmost backslash is always consumed to disable markup on the square bracket, but all the previous backslashes are passed through as-is, rather than having every other backslash escape the following one. So, with three backslashes, I was expecting one to be output (consuming the first two of the input) and one to be left over to escape the square bracket and disable markup. WIth four backslashes, I was expecting two backslashes to be output (consuming all four of the input) and the square bracket after that to be treated as markup.

Platform
Rich 9.5.1 on macOS Mojave (10.14.6) with Apple's Terminal.app.

The text was updated successfully, but these errors were encountered:

ronf · 2021-01-05T04:06:21Z

Here's a possible fix:

--- markup.py.orig	2020-12-30 06:35:31.000000000 -0800
+++ markup.py	2021-01-04 20:00:50.000000000 -0800
@@ -9,7 +9,7 @@
 
 RE_TAGS = re.compile(
     r"""
-(\\\[)|
+(\\[[\\])|
 \[([a-z#\/].*?)\]
 """,
     re.VERBOSE,
@@ -60,12 +60,12 @@
     """
     position = 0
     for match in RE_TAGS.finditer(markup):
-        (escape_open, tag_text) = match.groups()
+        (escaped, tag_text) = match.groups()
         start, end = match.span()
         if start > position:
             yield start, markup[position:start], None
-        if escape_open:
-            yield start, "[", None
+        if escaped:
+            yield start, escaped[1], None
         else:
             text, equals, parameters = tag_text.partition("=")
             if equals:

Basically, it looks for either \\ or \[ in the first part of RE_TAGS, and then uses that to consume pairs of backslashes before looking for backslash and the bracket. Later, when outputting things, it outputs either a single backslash or the square bracket when that first part of RE_TAGS matches. Otherwise, it does what it did before.

ronf · 2021-01-05T16:23:40Z

It occurred to me this morning that the escape() function should also be updated. Here's a diff for that:

@@ -48,7 +48,7 @@
     Returns:
         str: Markup with square brackets escaped.
     """
-    return markup.replace("[", r"\[")
+    return markup.replace("\\", r"\\").replace("[", r"\[")

Note that the order of the replace calls is important here.

I also wonder whether or not backslash escaping should be allowed for ":" to prevent it from doing emoji conversion (with either additional changes to escape() here, or a separate escape function for that).

Speaking of separating markup & emoji replacement, it doesn't look like there's a way to do ONLY emoji replacement on a string right now. There are some places where I'd like to be able to do that, but the replace function is in a private module right now. The opposite is supported by calling Text.from_markup() with emoji=False, but there's no way to ask for emoji replacement to be enabled while disabling other markup.

willmcgugan · 2021-01-05T16:27:52Z

Thanks for the report and suggested fix. I'll take a look at those soon.

re Emoji you can do this:

from rich.emoji import Emoji
print(Emoji.replace(":100:"))

ronf · 2021-01-05T16:31:22Z

Ah, I missed that. I'll give it a try - thanks!

willmcgugan · 2021-01-05T21:30:37Z

Potential fix in this branch.

Decided against escaping emoji, it complicates things too much. Emoji can be disabled anyway.

ronf · 2021-01-06T02:23:05Z

Unfortunately, this fix doesn't cover all the cases. It handles an escaped bracket and a single escaped backslash, but if you put more than two backslashes, it always matches on the rightmost two of them and leaves the square bracket as always being a markup token. With three backslashes, the first two should turn into a single backslash in the output, but the third backslash should escape the bracket and make it a literal. This continues in even/odd pairs for an arbitrary number of backslashes, and my fix took that into account by letting each pair (of either \\ or \[) be matched in a separate match in the finditer.

willmcgugan · 2021-01-06T21:35:50Z

Try the branch now...

Note that the escaping is only required for backslashes directly preceding the [. People printing Windows paths with backslashes would hate me otherwise.

ronf · 2021-01-07T03:30:47Z

Looks good - thanks!

Preserving backslashes as-is elsewhere in the string makes sense to me. I do something similar in some code I wrote which allows users to input raw unicode characters by entering something like \u203c or \U0001f348 inside their input string, converting those sequences to a single Unicode character when the entered code point is a valid printable character. Entering \\u203c is turned into \u203c (removing one backslash), but entering \\ not followed by a "u" or "U" and the right number of hex digits is left alone.

willmcgugan · 2021-01-07T21:57:54Z

Fixed in v9.6.2

ronf · 2021-01-08T04:13:36Z

Upgraded here - looks great. Thanks so much!

ronf added the Needs triage label Jan 5, 2021

willmcgugan added accepted Task was accepted bug Something isn't working and removed Needs triage labels Jan 5, 2021

willmcgugan closed this as completed Jan 7, 2021

This was referenced Jun 6, 2021

Update dependency rich to v9.13.0 brainbotdev/brainbot#108

Open

chore(deps): update dependency rich to v10 - autoclosed brainbotdev/brainbot#117

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Markup escaping not working as expected #878

[BUG] Markup escaping not working as expected #878

ronf commented Jan 5, 2021

ronf commented Jan 5, 2021

ronf commented Jan 5, 2021

willmcgugan commented Jan 5, 2021

ronf commented Jan 5, 2021

willmcgugan commented Jan 5, 2021

ronf commented Jan 6, 2021

willmcgugan commented Jan 6, 2021

ronf commented Jan 7, 2021 •

edited

Loading

willmcgugan commented Jan 7, 2021

ronf commented Jan 8, 2021

[BUG] Markup escaping not working as expected #878

[BUG] Markup escaping not working as expected #878

Comments

ronf commented Jan 5, 2021

ronf commented Jan 5, 2021

ronf commented Jan 5, 2021

willmcgugan commented Jan 5, 2021

ronf commented Jan 5, 2021

willmcgugan commented Jan 5, 2021

ronf commented Jan 6, 2021

willmcgugan commented Jan 6, 2021

ronf commented Jan 7, 2021 • edited Loading

willmcgugan commented Jan 7, 2021

ronf commented Jan 8, 2021

ronf commented Jan 7, 2021 •

edited

Loading