[doc] html_unescape: Create html.unescape extension and use it for no-break space #1095

JJRcop · 2023-10-30T03:39:42Z

Fixes no-break space by adding a new html_unescape extension to the docs, which runs Python's html.unescape on source files before Sphinx renders them.

This lets us use HTML character references like &nbsp; in the docs, which are converted to the real characters as Sphinx reads the files for rendering. The source files themselves are not modified; the conversion only happens at render time.

I have also published this extension in my own name under a different license (the same one Sphinx uses) for others to use:
https://github.com/JJRcop/sphinxcontrib-html_unescape

gnif · 2023-10-30T06:00:50Z

Is there a better way to do this, such as &nbsp;?

JJRcop · 2023-10-30T18:48:20Z

Is there a better way to do this, such as &nbsp;?

The Docutils FAQ (Sphinx is built on top of Docutils) seems to recommend using the literal character rather than escaping it.

How can I represent esoteric characters (e.g. character entities) in a document?

For example, say you want an em-dash (XML character entity &mdash;, Unicode character U+2014) in your document: use a real em-dash. Insert literal characters (e.g. type a real em-dash) into your input file, using whatever encoding suits your application, and tell Docutils the input encoding. Docutils uses Unicode internally, so the em-dash character is U+2014 internally.
[…]
ReStructuredText has no character entity subsystem; it doesn't know anything about XML character entities. To Docutils, "&mdash;" in input text is 7 discrete characters; no interpretation happens. When writing HTML, the "&" is converted to "&amp;", so in the raw output you'd see "&amp;mdash;". There's no difference in interpretation for text inside or outside inline literals or literal blocks -- there's no character entity interpretation in either case.
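
The behaviour the FAQ describes can be mimicked in a few lines of Python. This is only an illustration: `html.escape` from the standard library stands in for the escaping an HTML writer performs, and it is not Docutils' own code.

```python
import html

entity = "&mdash;"

# The entity is just seven ordinary characters: & m d a s h ;
char_count = len(entity)  # 7

# Escaping for HTML output turns the ampersand into "&amp;",
# so the reader sees the entity text rather than a dash.
escaped = html.escape(entity)  # "&amp;mdash;"

# What the FAQ recommends instead: the literal character U+2014.
literal = "\u2014"
```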

It goes on to describe a workaround using |substitution|, but rST doesn't support nested inline markup, which would be needed for that to work in this case (since the text is already inside the "literal" markup of ``).

Is nested inline markup possible?

Not currently, no. It's on the to-do list (details here), and hopefully will be part of the reStructuredText parser soon.
[...]
There are workarounds, but they are either convoluted or ugly or both. They are not recommended.

I was doing further research and found we could run html.unescape() on each file contents from the python standard library, which would enable terms like &nbsp;. Probably best to do that as a sphinx extension
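
As a rough illustration of what html.unescape() does to a line of source text (the sample line and the path in it are made up for this example):

```python
import html

NBSP = "\u00a0"  # the no-break space character

# Hypothetical line of rST source containing character references.
line = "Edit ``/etc/example&nbsp;dir/file.conf`` &amp; restart."

converted = html.unescape(line)

# "&nbsp;" has become a real no-break space inside the literal,
# and "&amp;" has become a plain "&".
has_nbsp = NBSP in converted
```

Note that every character reference in the file is converted, not just &nbsp;, so any text that is meant to show a literal reference such as "&amp;" would need to be written as "&amp;amp;".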

In 3625207 I attempted to add non-breaking spaces to a filepath so it would stay on one line. Before this I had accidentally deleted my work but found it saved in my sphinx build cache, so I copied my changes from that cache. Unfortunately the cached version had replaced the non-breaking spaces with regular spaces, so the change in 3625207 was effectively reverted. This commit re-adds the non-breaking spaces.

This new sphinx extension runs html.unescape (from the Python Standard Library) on source files before they are rendered, allowing escape sequences like '&nbsp;' for the no-break space character. I have also published this extension in my own name under a different license (the same one Sphinx uses) for others to use: https://github.com/JJRcop/sphinxcontrib-html_unescape

JJRcop · 2023-10-31T00:14:02Z

I was doing further research and found we could run html.unescape() on each file contents from the python standard library, which would enable terms like &nbsp;. Probably best to do that as a sphinx extension

Done

JJRcop marked this pull request as draft October 30, 2023 19:32

JJRcop force-pushed the case-of-the-missing-nbsp branch from 7887085 to f25771a Compare October 30, 2023 23:47

JJRcop changed the title ~~[doc] usage: Actually add non-breaking spaces to config file~~ [doc] html_unescape: Create html.unescape extension and use it for no-break space Oct 30, 2023

JJRcop marked this pull request as ready for review October 30, 2023 23:52

gnif merged commit f6b2cec into gnif:master Nov 1, 2023
16 checks passed
