Document how to indent nest directives #493
Comments
I guess there are two "competing" aspects here: the "Markdown way" for users (e.g. writability, readability), and the "Markdown way" for parsers (e.g. the CommonMark spec). For sure, if you start using the RST directive-like syntax with a standard Markdown parser/renderer, you are going to get a mess. I'd also note that with the original Markdown parsers myst-parser was using, it probably would have been impossible, since syntax extensions were often based on nasty regex matching etc.
This is kinda the pro/con as well: for a lot of use cases you don't need the nesting, and then IMO it becomes a pain to go from this

```python
from a import b

def func(x):
    return b(x)
```

to this

```rst
.. code-block:: python

   from a import b

   def func(x):
       return b(x)
```

and it is not known syntax to Markdown users.

As a slight aside, you can also do some interesting block syntax nesting:

````
> ```python
> print("Hello world")
> ```

- ```python
  print("Hello world")
  ```
````

I guess perhaps you might even want some kind of hybrid: use backticks for the first directive level, with no indentation, then some kind of indentation syntax for nested directives:

````
```{note}
.. admonition::
   Some text
   .. warning::
      Some other text
```
````

Don't know; I'm not promising anything 😅, but open to suggestions.
If anyone is game, they can always try writing a markdown-it-py plugin (see also https://github.com/executablebooks/mdit-py-plugins).
While I'm at it, one extra "annoyance" from a parsing perspective for RST directives is that you cannot a priori "know" the structure of the syntax without first dynamically retrieving the directive and looking at its spec; see https://github.com/live-clones/docutils/blob/cc65c243ac3d871671920d939d48f5734d964bb3/docutils/docutils/parsers/rst/__init__.py#L208 and https://github.com/live-clones/docutils/blob/cc65c243ac3d871671920d939d48f5734d964bb3/docutils/docutils/parsers/rst/states.py#L2166

For example, with:

    class MyDirective(Directive):
        has_content = True

then "some text" will be parsed as the body text. But if you were to use

    class MyDirective(Directive):
        optional_arguments = 1
        has_content = True

now "some text" is treated as the argument text.

This is "nice" from a user perspective, because you can create terser syntax, i.e. you don't have to do:

But it is horrible from a parser perspective, because you cannot parse the syntax in "isolation": you have to load every directive that you might be using up front (e.g. including all Sphinx extensions). Basically, I would want a syntax that does not have this "dynamism": an argument is an argument, and the body is the body.
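To make the contrast concrete, here is a toy model of that dispatch. It is not docutils code: `DirectiveSpec` and `split_first_line` are hypothetical names, and the simplifying assumption is that any directive accepting arguments claims the first line as an argument. The real parser also deals with options, multi-line arguments and error cases that are ignored here.

```python
from dataclasses import dataclass


@dataclass
class DirectiveSpec:
    """Hypothetical stand-in for the class attributes of a directive."""
    required_arguments: int = 0
    optional_arguments: int = 0
    has_content: bool = False


def split_first_line(spec: DirectiveSpec, first_line: str, body: list):
    """Decide whether the text after `.. name::` is an argument or body text.

    Simplified assumption: if the directive accepts any arguments, the first
    line is an argument; otherwise it is the first line of the content.
    """
    takes_arguments = spec.required_arguments + spec.optional_arguments > 0
    if takes_arguments:
        return [first_line], body
    if not spec.has_content:
        raise ValueError("directive accepts neither arguments nor content")
    return [], [first_line, *body]


# The same source text, interpreted against two different specs:
content_only = DirectiveSpec(has_content=True)
with_argument = DirectiveSpec(optional_arguments=1, has_content=True)

print(split_first_line(content_only, "some text", []))   # ([], ['some text'])
print(split_first_line(with_argument, "some text", []))  # (['some text'], [])
```

The point of the sketch is that `split_first_line` cannot be called without a spec, which is exactly the up-front loading the comment objects to.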
Thanks a lot for your comments, Chris.
So do I.
Not this syntax per se, but indented code blocks are. I, you, and most people these days, do however prefer explicit code fences. I don't want to get rid of them. Code fences are fine the way they are. The only complication they present in this context, is that their syntax may overlap with the indentation-based one. In other words, the latter may need a different syntax marker, something other than triple back-ticks, so that the parser can keep up.
I think that's a serious contender: Parse reST-like directive syntax, but only inside MyST-style Sphinx directives. It would not be the perfect solution, as it certainly comes with a documentation overhead. ("Here, the directive may have reST-like syntax, but content is parsed as Markdown, regardless. Whereas
No need. 👍 And even if you did, no rush.
I can see how that's annoying: there is no direct translation of optional arguments and body content. The parser has to figure that out on a per-directive basis. And indentation may come into play. That's true for the reST parser. What if the MyST parser just enforced the distinction? Like, if "some text" is in-line, pass it as an optional argument, no body content. If the directive does not accept that, let it fail. User has to adjust the source. MyST can be stricter than Sphinx/reST; that doesn't get in the way of feature parity.
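A minimal sketch of that stricter rule, assuming a parser that splits argument and body purely from the source layout. The function name `parse_strict` and the `accepts_argument` flag are hypothetical and not part of the MyST API.

```python
def parse_strict(first_line: str, body: list, accepts_argument: bool):
    """Strict split: in-line text is always the argument, never body content.

    The split depends only on the source layout. The directive's spec is
    consulted afterwards, purely to validate the result.
    """
    argument = first_line.strip() or None
    if argument is not None and not accepts_argument:
        raise ValueError(
            f"directive takes no argument, got {argument!r}; "
            "move the text to the body"
        )
    return argument, body


print(parse_strict("some text", [], accepts_argument=True))
# ('some text', [])
print(parse_strict("", ["some text"], accepts_argument=False))
# (None, ['some text'])
```

With this design the failure surfaces as an error the user can fix in the source, rather than as a silent reinterpretation of the same text.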
actually, I would note here that you can already use indentation in your directive cells, providing:
For example,

````
```{note}
   ~~~{note}
      ~~~{note}
         ~~~{important}
         Hallo World!
         ~~~
      ~~~
   ~~~
```

~~~{note}
   ```{warning}
   Hey again!
   ```
~~~
````

gives you [image: rendered nested admonitions]

@john-hen do you think that is sufficient? Perhaps we should document this.

If you wished to remove restriction (2), with markdown-it you would supersede the indented 'code' rule with a modified fence rule:

````python
from markdown_it import MarkdownIt
from markdown_it.rules_block import StateBlock


def _fence(state: StateBlock, startLine: int, endLine: int, silent: bool):
    haveEndMarker = False
    pos = state.bMarks[startLine] + state.tShift[startLine]
    maximum = state.eMarks[startLine]

    # COMMENTING OUT MAX 3 CHARS INDENTATION
    # if it's indented more than 3 spaces, it should be a code block
    # if state.sCount[startLine] - state.blkIndent >= 4:
    #     return False

    if pos + 3 > maximum:
        return False

    marker = state.srcCharCode[pos]

    # /* ~ */ /* ` */
    if marker != 0x7E and marker != 0x60:
        return False

    # scan marker length
    mem = pos
    pos = state.skipChars(pos, marker)

    length = pos - mem

    if length < 3:
        return False

    markup = state.src[mem:pos]
    params = state.src[pos:maximum]

    # /* ` */
    if marker == 0x60:
        if chr(marker) in params:
            return False

    # Since start is found, we can report success here in validation mode
    if silent:
        return True

    # search end of block
    nextLine = startLine

    while True:
        nextLine += 1
        if nextLine >= endLine:
            # unclosed block should be autoclosed by end of document.
            # also block seems to be autoclosed by end of parent
            break

        pos = mem = state.bMarks[nextLine] + state.tShift[nextLine]
        maximum = state.eMarks[nextLine]

        if pos < maximum and state.sCount[nextLine] < state.blkIndent:
            # non-empty line with negative indent should stop the list:
            # - ```
            #  test
            break

        if state.srcCharCode[pos] != marker:
            continue

        if state.sCount[nextLine] - state.blkIndent >= 4:
            # closing fence should be indented less than 4 spaces
            continue

        pos = state.skipChars(pos, marker)

        # closing code fence must be at least as long as the opening one
        if pos - mem < length:
            continue

        # make sure tail has spaces only
        pos = state.skipSpaces(pos)

        if pos < maximum:
            continue

        haveEndMarker = True
        # found!
        break

    # If a fence has heading spaces, they should be removed from its inner block
    length = state.sCount[startLine]

    state.line = nextLine + (1 if haveEndMarker else 0)

    token = state.push("fence", "code", 0)
    token.info = params
    token.content = state.getLines(startLine + 1, nextLine, length, True)
    token.markup = markup
    token.map = [startLine, state.line]

    return True


def new_fence(md: MarkdownIt):
    # assess before indented code
    md.block.ruler.before(
        "code", "new_fence", _fence, {"alt": ["paragraph", "reference", "blockquote", "list"]}
    )


md = MarkdownIt("commonmark").disable("code").use(new_fence)
tokens = md.parse("""
    ```{note}
    more than 3 spaces
    ```
"""
)
print(tokens)
````
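Why a standard fence rule already tolerates up to three spaces of indentation can be seen in a toy model of the closing-fence search. This is a simplified sketch of the CommonMark rule, not the markdown-it-py implementation: `fence_run` and `find_closing_fence` are hypothetical helpers, and block indentation inside lists or quotes is ignored.

```python
from typing import Optional

TICKS = "`" * 3  # built programmatically to keep literal fences out of this example


def fence_run(line: str):
    """Return (indent, marker char, run length, trailing text), or None."""
    stripped = line.lstrip(" ")
    indent = len(line) - len(stripped)
    if not stripped or stripped[0] not in "`~":
        return None
    char = stripped[0]
    run = len(stripped) - len(stripped.lstrip(char))
    if run < 3:
        return None
    return indent, char, run, stripped[run:].strip()


def find_closing_fence(lines: list, start: int) -> Optional[int]:
    """Index of the line closing the fence opened at `start`, if any."""
    _, char, length, _ = fence_run(lines[start])
    for i in range(start + 1, len(lines)):
        candidate = fence_run(lines[i])
        if candidate is None:
            continue
        indent, c, run, tail = candidate
        # A closer uses the same character, is at least as long as the
        # opener, carries no info string, and is indented less than 4 spaces.
        if c == char and run >= length and not tail and indent < 4:
            return i
    return None


source = [
    TICKS + "{note}",         # 0: outer opener (back-ticks)
    "   ~~~{note}",           # 1: first nested level (tildes)
    "      ~~~{important}",   # 2: has an info string, cannot close line 1
    "      Hallo World!",     # 3
    "      ~~~",              # 4: indented 6 spaces, skipped as a closer
    "   ~~~",                 # 5: closes line 1
    TICKS,                    # 6: closes line 0
]
print(find_closing_fence(source, 0))  # 6
print(find_closing_fence(source, 1))  # 5

# The content of the fence at line 1, with that fence's indent removed,
# is then parsed again, and the same rule applies one level down.
inner = [line[3:] for line in source[2:5]]
print(find_closing_fence(inner, 0))   # 2
```

In this model, deeper closers are skipped only because they sit four or more spaces in, and the first nested level has to use a different fence character than the unindented outer fence, otherwise its closer would end the outer block.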
Perfect! 😃 I did not know about this. Renders this entire issue moot. Tested with the example from the original post, and it works like a charm:

````
~~~{py:class} Class()
:module: module
Doc-string of the class.
   ~~~{py:method} Class.method()
   :module: module
   Doc-string of the method.
````

Yes, it should be mentioned somewhere in the documentation. Doesn't have to be the most prominent place, since the code fences are the more familiar syntax to Markdown users and do cover almost all use cases. This will only be relevant for people actually using those domain directives. I see no problem with the fact that the indentation needs to be three spaces. That's also what Autodoc uses.
yeh I don't know why I didn't think of it earlier either lol
gonna re-open, to remember to put this in the docs 👍
Description / Summary
I recently came across this (resolved) Sphinx issue: sphinx-doc/sphinx#9165. It was (no need to read or re-read that now) about the tutorial aimed at beginners. The original proposal was to use MyST as the mark-up language, given that newcomers would tend to be more familiar with Markdown than with reStructuredText. However, that didn't come to pass: the tutorial ended up using reST, namely because MyST is but a Sphinx extension and not officially supported… yet. I would love for that to change… eventually.
Arguably, for MyST Markdown to ever be on equal footing with Sphinx reST, it would need to reach feature parity. One blocker on that road is API documentation. It came up a number of times in that thread.
Now, we can already use "domain" directives with MyST. They are the tool for "manual" API documentation, where doc-strings and signatures have to be repeated in the documentation source files. For example, when this is the reST source
then with MyST we'd write
Note the four back-ticks that are needed to denote the outer directive block, and the familiar three back-ticks for the inner block. We could use indentation there, but it's not syntactically significant. The scope must be closed explicitly by repeating the opening marker. This is just one level of nesting. The more levels we have, the more back-ticks are needed. And that number decreases with increasing nesting level. Which, I would argue, is not a natural way to express the writer's intent. It is also quite ugly and, as such, not the Markdown way. In Markdown, readability counts.
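The back-tick bookkeeping described above can be made concrete with a short sketch. The `Directive` class and the `height` and `render` helpers are hypothetical, written only to illustrate the counting rule: every fence must be longer than any fence it contains, so the outermost block gets the most back-ticks.

```python
from dataclasses import dataclass, field


@dataclass
class Directive:
    header: str                  # e.g. "{py:class} Class()"
    text: str                    # body text of the directive
    children: list = field(default_factory=list)


def height(node: Directive) -> int:
    """Number of nesting levels below this directive (0 for a leaf)."""
    return 1 + max(map(height, node.children)) if node.children else 0


def render(node: Directive) -> list:
    """Render a directive tree as fenced MyST source lines."""
    fence = "`" * (3 + height(node))   # leaves get 3, each level up adds one
    lines = [fence + node.header, node.text]
    for child in node.children:
        lines.append("")
        lines.extend(render(child))
    lines.append(fence)
    return lines


api = Directive("{py:class} Outer()", "Doc-string of the outer class.", [
    Directive("{py:class} Outer.Inner()", "Doc-string of the inner class.", [
        Directive("{py:method} Outer.Inner.method()", "Doc-string of the method."),
    ]),
])
print("\n".join(render(api)))
```

With two levels of nesting the outer fence is already five back-ticks long, and adding a deeper child anywhere in the tree forces every enclosing fence to be rewritten.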
Ultimately, this is due to MyST extending a Markdown syntax that was designed as an escaping mechanism. (Naturally so, I want to add. In plain Markdown, code fences are what comes closest to block-level Sphinx directives. And they degrade gracefully, see #63.) You'd usually need the four back-ticks when you write about Markdown in Markdown, namely to explain the code fence itself. That's typically only one level of nesting. I don't think there's a defensible use case for nesting beyond that.
But there is for API documentation. Personally, I never needed more than one nested level. But Jakob Andersen, in the Sphinx issue mentioned above, gives an example for C++ that already uses two. There could obviously, and legitimately, be more, because name-spaces are a thing in modern programming languages and they matter to end-users as well.
I would like to call on a witness that needs no introduction: Python. What's the most readable way to delimit nested scopes in the source? Indentation.
To be clear, I don't actually care about these domain directives. As far as I'm concerned, they are an internal representation. I only ever use Autodoc directives to document API. That's usually the DRY way to go. But MyST does not support Markdown in doc-strings yet, and the complication highlighted above also rears its head when implementing that support for Autodoc. See #228.
I consider Autodoc support the major blocker for feature parity. Not because it's the most in-demand feature (regrettably, it's not), but because it's the hardest to get right. And I believe, based on that discussion regarding the tutorial, that the Sphinx maintainers see/fear that too. They also care about domain directives much more than I do.
Autodoc is, unfortunately, tightly coupled with not only Sphinx, but with the reST syntax as well. Furthermore, its code covers many special cases, so providing test fixtures for a second mark-up language is a challenge. (Unless there's an automated solution somehow leveraging rst-to-myst.) Autodoc also sees a constant stream of bug reports and feature requests, leading to upstream code changes. To keep the maintenance burden low on the MyST side of things, the Autodoc extension would have to be decoupled as much as possible from the parser specifics. Ideally, upstream.
That task would be considerably simplified if MyST had an indentation-based syntax for Sphinx directives. Not only because abstractions are hard, but also because, as I'm convinced, it is the right thing to do in an effort to convey the structure of an API. It's what a "structured" Markdown dialect needs to be competitive in that area. And might have benefits in other areas as well. In fact, other than possibly parser complexity, which I cannot judge at all, I can't think of any downsides.
I will refrain from proposing specific syntax markers, as it would distract from the crux of the matter: indentation. Essentially, the question is if the parser can support a syntax construct much like a code fence, but rather than expecting the block to be closed explicitly, it would demarcate the scope based on line indent/dedent. If feasible, I propose such a syntax be implemented, one way or another.
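To illustrate only the indent/dedent part of the question, here is a minimal sketch of how a block's extent could be found without a closing marker. The `@{...}` opener is a made-up placeholder and not a proposed syntax, and `block_end` is a hypothetical helper that knows nothing about markdown-it.

```python
def block_end(lines: list, start: int) -> int:
    """Return the index one past the last line of the block opened at `start`.

    The block covers all following lines indented deeper than the opening
    line. Blank lines are skipped; the first non-blank line at the same or a
    shallower indent ends the block.
    """
    def indent(line: str) -> int:
        return len(line) - len(line.lstrip(" "))

    base = indent(lines[start])
    end = start + 1
    for i in range(start + 1, len(lines)):
        if not lines[i].strip():
            continue                 # blank lines never end a block by themselves
        if indent(lines[i]) <= base:
            break                    # dedent: the scope is closed implicitly
        end = i + 1
    return end


source = [
    "@{py:class} Class()",                      # 0
    "   Doc-string of the class.",              # 1
    "",                                         # 2
    "   @{py:method} Class.method()",           # 3
    "      Doc-string of the method.",          # 4
    "",                                         # 5
    "   @{py:method} Class.other()",            # 6
    "      Doc-string of the other method.",    # 7
    "",                                         # 8
    "Back at top level.",                       # 9
]
print(block_end(source, 0))  # 8
print(block_end(source, 3))  # 5
```

Each nested block ends where its sibling or parent resumes, so no marker count has to change when another level is added.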
To quote Jakob once more, as that was the remark that got me thinking:
Is it?
Value / benefit
Reduced complexity in trying to reach feature parity with Sphinx/reST.
Implementation details
No clue, I know nothing about the parser. This could be dead in the water if Markdown-it is not on board.
Tasks to complete