[doc] html_unescape: Create html.unescape extension and use it for no-break space #1095

Merged (2 commits) on Nov 1, 2023

JJRcop
Collaborator

@JJRcop JJRcop commented Oct 30, 2023

Fixes no-break space by adding a new html_unescape extension to the docs, which runs Python's html.unescape on source files before Sphinx renders them.

This lets us use HTML character references like &nbsp; in the docs, which get turned into the real characters as Sphinx reads the files to render them. The source files are not affected; this only happens when rendering.

I have also published this extension in my own name under a different license (the same one Sphinx uses) for others to use:
https://github.com/JJRcop/sphinxcontrib-html_unescape

@gnif
Owner

gnif commented Oct 30, 2023

Is there a better way to do this, such as &nbsp;?

@JJRcop
Collaborator Author

JJRcop commented Oct 30, 2023

Is there a better way to do this, such as &nbsp;?

The Docutils FAQ (Sphinx sits on top of Docutils) seems to recommend using the literal character rather than escaping it.

How can I represent esoteric characters (e.g. character entities) in a document?

For example, say you want an em-dash (XML character entity &mdash;, Unicode character U+2014) in your document: use a real em-dash. Insert literal characters (e.g. type a real em-dash) into your input file, using whatever encoding suits your application, and tell Docutils the input encoding. Docutils uses Unicode internally, so the em-dash character is U+2014 internally.
[…]
ReStructuredText has no character entity subsystem; it doesn't know anything about XML character entities. To Docutils, "&mdash;" in input text is 7 discrete characters; no interpretation happens. When writing HTML, the "&" is converted to "&amp;", so in the raw output you'd see "&amp;mdash;". There's no difference in interpretation for text inside or outside inline literals or literal blocks -- there's no character entity interpretation in either case.
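The behaviour described in the FAQ can be reproduced with Python's standard library. In this sketch, `html.escape` only stands in for the escaping done by an HTML writer (an analogy, not Docutils' actual code path), and `html.unescape` shows what decoding the reference up front would give instead.

```python
import html

entity = "&mdash;"

# Without entity support, the reference is just seven ordinary characters.
assert len(entity) == 7

# HTML escaping turns "&" into "&amp;", so the raw output carries "&amp;mdash;".
escaped = html.escape(entity)
assert escaped == "&amp;mdash;"

# A browser decodes "&amp;" back to "&", so the reader sees the text "&mdash;".
assert html.unescape(escaped) == entity

# Decoding the reference in the source first yields the real em-dash, U+2014.
assert html.unescape(entity) == "\u2014"
```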

The FAQ goes on to describe a workaround using |substitution|, but that would require nested inline markup in this case (the text is already inside an inline literal delimited by ``), and rST doesn't support that.

Is nested inline markup possible?

Not currently, no. It's on the to-do list (details here), and hopefully will be part of the reStructuredText parser soon.
[...]
There are workarounds, but they are either convoluted or ugly or both. They are not recommended.

I was doing further research and found we could run html.unescape() from the Python standard library on each file's contents, which would enable terms like &nbsp;. It's probably best to do that as a Sphinx extension.
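A quick sketch of what `html.unescape` does to a line of rST source. The sample line and path are hypothetical, made up for illustration, and are not taken from the project's docs.

```python
import html

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

# Hypothetical rST line: an inline literal holding a path that should not wrap.
line = "Edit ``/some/example&nbsp;dir/client.ini`` to change settings."
decoded = html.unescape(line)

# The named reference is replaced by the real character, even inside ``literal``.
assert NBSP in decoded
assert "&nbsp;" not in decoded

# Numeric references are decoded as well.
assert html.unescape("&#160;") == NBSP
assert html.unescape("&#xA0;") == NBSP

# Caveat: every reference in the file is decoded, not only &nbsp;.
assert html.unescape("&lt;tag&gt;") == "<tag>"
assert html.unescape("&amp;nbsp;") == "&nbsp;"
```

One consequence of decoding whole files: a document that needs to show the literal text &nbsp; would have to write it as &amp;nbsp; in the source.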

In 3625207 I attempted to add
non-breaking spaces to a filepath so it would stay on one line.
Before this I had accidentally deleted my work but found it saved
in my sphinx build cache, so I copied my changes from that cache.

Unfortunately the cached version had replaced the non-breaking spaces with
real spaces, so 3625207 did not actually add them.

This commit re-adds the non-breaking spaces.
@JJRcop JJRcop marked this pull request as draft October 30, 2023 19:32
This new Sphinx extension runs html.unescape
(from the Python Standard Library) on source files before they are
rendered, allowing escape sequences like '&nbsp;' for the no-break
space character.

I have also published this extension in my own name under a different
license (the same one Sphinx uses) for others to use:
https://github.com/JJRcop/sphinxcontrib-html_unescape
@JJRcop JJRcop changed the title from "[doc] usage: Actually add non-breaking spaces to config file" to "[doc] html_unescape: Create html.unescape extension and use it for no-break space" Oct 30, 2023
@JJRcop JJRcop marked this pull request as ready for review October 30, 2023 23:52
@JJRcop
Collaborator Author

JJRcop commented Oct 31, 2023

I was doing further research and found we could run html.unescape() from the Python standard library on each file's contents, which would enable terms like &nbsp;. It's probably best to do that as a Sphinx extension.

Done

@gnif gnif merged commit f6b2cec into gnif:master Nov 1, 2023
16 checks passed