Document how to indent nest directives #493

john-hen · 2022-01-07T19:49:01Z

Description / Summary

I recently came across this (resolved) Sphinx issue: sphinx-doc/sphinx#9165. It was (no need to read or re-read that now) about the tutorial aimed at beginners. The original proposal was to use MyST as the mark-up language, given that newcomers would tend to be more familiar with Markdown than with reStructuredText. However, that didn't come to pass: the tutorial ended up using reST, namely because MyST is but a Sphinx extension and not officially supported… yet. I would love for that to change… eventually.

Arguably, for MyST Markdown to ever be on equal footing with Sphinx reST, it would need to reach feature parity. One blocker on that road is API documentation. It came up a number of times in that thread.

Now, we can already use "domain" directives with MyST. They are the tool for "manual" API documentation, where doc-strings and signatures have to be repeated in the documentation source files. For example, when this is the reST source

.. py:class:: Class()
   :module: module

   Doc-string of the class.

   .. py:method:: Class.method()
      :module: module

      Doc-string of the method.

then with MyST we'd write

````{py:class} Class()

Doc-string of the class.

```{py:method} Class.method()
:module: module

Doc-string of the method.
```
````

Note the four back-ticks that are needed to denote the outer directive block, and the familiar three back-ticks for the inner block. We could use indentation there, but it's not syntactically significant. The scope must be closed explicitly by repeating the opening marker. This is just one level of nesting. The more levels we have, the more back-ticks are needed. And that number decreases with increasing nesting level. Which, I would argue, is not a natural way to express the writer's intent. It is also quite ugly and, as such, not the Markdown way. In Markdown, readability counts.
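The back-tick bookkeeping described above can be sketched in a few lines of Python. This is only an illustration: the `Directive` class and the `height` and `render` helpers are hypothetical names, not part of MyST or Sphinx. The sketch assumes each fence is one back-tick longer than the longest fence it contains, which is why the count shrinks toward the innermost block.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Directive:
    """Hypothetical model of a directive block (illustration only)."""

    name: str
    argument: str = ""
    body: str = ""
    children: list[Directive] = field(default_factory=list)


def height(node: Directive) -> int:
    """Number of directive levels nested below this node."""
    if not node.children:
        return 0
    return 1 + max(height(child) for child in node.children)


def render(node: Directive) -> str:
    """Render with fenced syntax: an outer fence must be longer than any inner one."""
    fence = "`" * (3 + height(node))
    lines = [f"{fence}{{{node.name}}} {node.argument}".rstrip(), node.body]
    for child in node.children:
        lines += ["", render(child)]
    lines.append(fence)
    return "\n".join(lines)


method = Directive("py:method", "Class.method()", "Doc-string of the method.")
cls = Directive("py:class", "Class()", "Doc-string of the class.", [method])
print(render(cls))
```

Adding a third level below the method would force both existing fences to grow by one back-tick, which is the maintenance cost the paragraph above objects to.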

Ultimately, this is due to MyST extending a Markdown syntax that was designed as an escaping mechanism. (Naturally so, I want to add. In plain Markdown, code fences are what comes closest to block-level Sphinx directives. And they degrade gracefully, see #63.) You'd usually need the four back-ticks when you write about Markdown in Markdown, namely to explain the code fence itself. That's typically only one level of nesting. I don't think there's a defendable use case for nesting beyond that.

But there is for API documentation. Personally, I never needed more than one nested level. But Jakob Andersen, in the Sphinx issue mentioned above, gives an example for C++ that already uses two. There could obviously, and legitimately, be more, because name-spaces are a thing in modern programming languages and they matter to end-users as well.

I would like to call on a witness that needs no introduction: Python. What's the most readable way to delimit nested scopes in the source? Indentation.

To be clear, I don't actually care about these domain directives. As far as I'm concerned, they are an internal representation. I only ever use Autodoc directives to document API. That's usually the DRY way to go. But MyST does not support Markdown in doc-strings yet, and the complication highlighted above also rears its head when implementing that support for Autodoc. See #228.

I consider Autodoc support the major blocker for feature parity. Not because it's the most in-demand feature (regrettably, it's not), but because it's the hardest to get right. And I believe, based on that discussion regarding the tutorial, that the Sphinx maintainers see/fear that too. They also care about domain directives much more than I do.

Autodoc is, unfortunately, tightly coupled with not only Sphinx, but with the reST syntax as well. Furthermore, its code covers many special cases, so providing test fixtures for a second mark-up language is a challenge. (Unless there's an automated solution somehow leveraging rst-to-myst.) Autodoc also sees a constant stream of bug reports and feature requests, leading to upstream code changes. To keep the maintenance burden low on the MyST side of things, the Autodoc extension would have to be decoupled as much as possible from the parser specifics. Ideally, upstream.

That task would be considerably simplified if MyST had an indentation-based syntax for Sphinx directives. Not only because abstractions are hard, but also because, as I'm convinced, it is the right thing to do in an effort to convey the structure of an API. It's what a "structured" Markdown dialect needs to be competitive in that area. And might have benefits in other areas as well. In fact, other than possibly parser complexity, which I cannot judge at all, I can't think of any downsides.

I will refrain from proposing specific syntax markers, as it would distract from the crux of the matter: indentation. Essentially, the question is if the parser can support a syntax construct much like a code fence, but rather than expecting the block to be closed explicitly, it would demarcate the scope based on line indent/dedent. If feasible, I propose such a syntax be implemented, one way or another.
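To make the question concrete, here is a minimal sketch of how a scope could be demarcated by indent/dedent alone. It is not a proposal for the actual parser: `indent_of` and `indented_block_end` are hypothetical helpers, and the reST-style opening lines in the example are used only because they appear earlier in this post. The function never looks at the marker itself.

```python
from __future__ import annotations


def indent_of(line: str) -> int:
    """Count the leading spaces of a line."""
    return len(line) - len(line.lstrip(" "))


def indented_block_end(lines: list[str], start: int) -> int:
    """Return the index one past the last line of the block opened at `start`.

    The block consists of every following line that is indented deeper than
    the opening line. Blank lines inside the block are allowed, but trailing
    blank lines are not counted as part of it.
    """
    base = indent_of(lines[start])
    end = start + 1
    for i in range(start + 1, len(lines)):
        line = lines[i]
        if not line.strip():
            continue  # a blank line never closes the scope on its own
        if indent_of(line) <= base:
            break  # dedent: the scope ends before this line
        end = i + 1
    return end


lines = [
    ".. py:class:: Class()",
    "",
    "   Doc-string of the class.",
    "",
    "   .. py:method:: Class.method()",
    "",
    "      Doc-string of the method.",
    "",
    "Paragraph after the directive.",
]
print(indented_block_end(lines, 0))  # scope of the class
print(indented_block_end(lines, 4))  # scope of the nested method
```

No closing marker is needed at any level: both scopes end where the indentation drops back, regardless of nesting depth.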

To quote Jakob once more, as that was the remark that got me thinking:

> Is it too late for MyST to learn a better syntax for directives?

Is it?

Value / benefit

Reduced complexity in trying to reach feature parity with Sphinx/reST.

Implementation details

No clue, I know nothing about the parser. This could be dead in the water if Markdown-it is not on board.

Tasks to complete

Comment on feasibility.

chrisjsewell · 2022-01-07T20:31:59Z

> not the Markdown way ... MyST extending a Markdown syntax ... they degrade gracefully

I guess there are two "competing" aspects here: the "Markdown way" for users (e.g. writability, readability), and the "Markdown way" for parsers (e.g. the CommonMark spec)

For sure, if you start using the RST directive like syntax with a standard Markdown parser/renderer, you are going to get a mess.
(In some sense, the https://spec.commonmark.org/0.30/#indented-code-blocks syntax was a bit of a misstep by Markdown, as fenced blocks are much more explicit)

I'd also note, that with the original Markdown parsers myst-parser was using, it probably would have been impossible, since syntax extensions were often based on nasty regex matching etc.
Now, with https://github.com/ExecutableBookProject/markdown-it-py, the syntax "plugin system" is much nicer, and it might be within the realm of possibility to implement such a block-level syntax plugin

> Personally, I never needed more than one nested level.

This is kinda the pro/con as well: for a lot of use cases you don't need the nesting, and then IMO it becomes a pain:
for example, I hate having to write loads of indented code directives in RST, I prefer this:

```python
from a import b
def func(x):
    return b(x)
```

to this

.. code-block:: python

    from a import b
    def func(x):
        return b(x)

and it is not known syntax to Markdown users.

As a slight aside, you can also do some interesting block syntax nesting

> ```python
>  print("Hello world")
>  ```

- ```python
  print("Hello world")
  ```

I guess perhaps you might even want some kind of hybrid : use backticks for the first directive level, with no indentation, then some kind of indentation syntax for nested directives

```{note}
.. admonition::

    Some text

    .. warning::

        Some other text
```

don't know; I'm not promising anything 😅, but open to suggestions

chrisjsewell · 2022-01-07T20:44:01Z

If anyone is game, they can always try writing a markdown-it-py plugin (see also https://github.com/executablebooks/mdit-py-plugins).
The great thing with using markdown-it-py is that, if you did create a working syntax extension, then it is very easy to port this to https://github.com/markdown-it/markdown-it and get a cousin JavaScript implementation (that could be used e.g. with https://github.com/executablebooks/myst-vs-code and the jupyterlab plugins we are working on)

chrisjsewell · 2022-01-07T21:08:51Z

While I'm at it, one extra "annoyance" from a parsing perspective for RST directives is that you cannot a priori "know" the structure of the syntax without first dynamically retrieving the directive and looking at its spec.

see https://github.com/live-clones/docutils/blob/cc65c243ac3d871671920d939d48f5734d964bb3/docutils/docutils/parsers/rst/__init__.py#L208 and https://github.com/live-clones/docutils/blob/cc65c243ac3d871671920d939d48f5734d964bb3/docutils/docutils/parsers/rst/states.py#L2166

For example, with:

class MyDirective(Directive):
    has_content = True

.. my-directive:: some text

then "some text" will be parsed as the body text

But if you were to use

class MyDirective(Directive):
    optional_arguments = 1
    has_content = True

Now "some text" is treated as the argument text.

This is "nice" from a user perspective, because you can create terser syntax, i.e. you don't have to do:

.. my-directive::

    some text

But it is horrible from a parser perspective, because you cannot parse the syntax in "isolation", you have to load every directive that you might be using up-front (e.g. including all sphinx extensions)

(basically I would want a syntax that does not have this "dynamism"; an argument is an argument, and the body is the body)
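The dependency on the directive spec can be shown with a small stand-alone sketch. It does not call docutils: `DirectiveSpec` and `interpret_inline_text` are hypothetical names that only mirror the `required_arguments`, `optional_arguments` and `has_content` attributes used in the examples above, and the real parser does more (options, splitting several arguments on whitespace).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DirectiveSpec:
    """Hypothetical stand-in for the class attributes of a docutils Directive."""

    required_arguments: int = 0
    optional_arguments: int = 0
    has_content: bool = False


def interpret_inline_text(text: str, spec: DirectiveSpec) -> tuple[str, str]:
    """Return (argument, content) for the text that follows `::` on the first line.

    Simplified on purpose: options and whitespace splitting of multiple
    arguments are ignored.
    """
    if spec.required_arguments + spec.optional_arguments > 0:
        return text, ""  # the directive takes arguments, so the text is one
    if spec.has_content:
        return "", text  # no arguments allowed, so the text starts the body
    raise ValueError("directive accepts neither arguments nor content")


content_only = DirectiveSpec(has_content=True)
with_argument = DirectiveSpec(optional_arguments=1, has_content=True)

# Same source text, different result depending on the loaded directive.
print(interpret_inline_text("some text", content_only))
print(interpret_inline_text("some text", with_argument))
```

The source line is identical in both calls; only the spec differs. That is why the syntax cannot be parsed before every directive in use has been loaded.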

john-hen · 2022-01-07T23:33:10Z

Thanks a lot for your comments, Chris.

> I hate having to write loads of indented code directives in RST, I prefer this […] to this […]

So do I.

> it is not known syntax to Markdown users

Not this syntax per se, but indented code blocks are. I, you, and most people these days, do however prefer explicit code fences. I don't want to get rid of them. Code fences are fine the way they are. The only complication they present in this context, is that their syntax may overlap with the indentation-based one. In other words, the latter may need a different syntax marker, something other than triple back-ticks, so that the parser can keep up.

> perhaps you might even want some kind of hybrid

I think that's a serious contender: Parse reST-like directive syntax, but only inside MyST-style Sphinx directives. It would not be the perfect solution, as it certainly comes with a documentation overhead. ("Here, the directive may have reST-like syntax, but content is parsed as Markdown, regardless. Whereas eval-rst requires reST-like syntax and content must be reST as well.") But it cannot be disregarded, as it would leave anything outside the first-level code fences completely unaffected.

> I'm not promising anything

No need. 👍 And even if you did, no rush.

> But it is horrible from a parser perspective

I can see how that's annoying: there is no direct translation of optional arguments and body content. The parser has to figure that out on a per-directive basis. And indentation may come into play.

That's true for the reST parser. What if the MyST parser just enforced the distinction? Like, if "some text" is in-line, pass it as an optional argument, no body content. If the directive does not accept that, let it fail. The user has to adjust the source. MyST can be stricter than Sphinx/reST; that doesn't get in the way of feature parity.
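A rough sketch of that stricter rule, split into two phases. Everything here is hypothetical (`ParsedDirective`, `parse_directive`, `check`), and the reST-like opening line is borrowed from the example above purely for illustration. The first phase is purely syntactic and needs no directive class. The second runs once the class is known and simply fails if the source does not fit.

```python
from __future__ import annotations

import textwrap
from dataclasses import dataclass


@dataclass
class ParsedDirective:
    name: str
    argument: str  # text on the opening line, always
    body: str  # indented lines below the opening line, always


def parse_directive(source: str) -> ParsedDirective:
    """Phase 1: split by position only, without loading the directive."""
    first, _, rest = source.partition("\n")
    if not first.startswith(".. "):
        raise ValueError("not a directive")
    name, _, argument = first[3:].partition("::")
    return ParsedDirective(name, argument.strip(), textwrap.dedent(rest).strip())


def check(parsed: ParsedDirective, accepts_argument: bool) -> None:
    """Phase 2: validate against the directive once it is known."""
    if parsed.argument and not accepts_argument:
        raise ValueError(
            f"'{parsed.name}' takes no argument; move the text into the body"
        )


inline = parse_directive(".. my-directive:: some text")
block = parse_directive(".. my-directive::\n\n    some text")
print(inline)
print(block)
```

With this split, the in-line form is always an argument and the indented form is always body content. A directive that only has content rejects the in-line form in `check` instead of silently reinterpreting it.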

chrisjsewell · 2022-01-10T19:37:26Z

Actually, I would note here that you can already use indentation in your directive cells, provided that:

1. You "switch" to the alternative form of fence markers (` <-> ~)
2. You don't indent by more than 3 spaces per nesting level

For example,

```{note}
   ~~~{note}
      ~~~{note}
        ~~~{important}
        Hallo World!
        ~~~
      ~~~
   ~~~
```

~~~{note}
   ```{warning}
   Hey again!
   ```
~~~

gives you

@john-hen do you think that is sufficient? Perhaps we should document this
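For documentation purposes, the two rules can be captured in a small generator. This is a sketch, not part of MyST: `nest` is a hypothetical helper that alternates the two fence markers on every level and indents each level by exactly three spaces. Its output is plain text and should still be checked against the MyST parser.

```python
from __future__ import annotations

INDENT = 3  # four or more spaces would start an indented code block instead
MARKERS = ("`" * 3, "~" * 3)  # the two fence marker forms


def nest(names: list[str], text: str) -> str:
    """Render directives nested inside each other, outermost first."""
    if not names:
        raise ValueError("need at least one directive name")
    opening, closing = [], []
    for level, name in enumerate(names):
        pad = " " * (INDENT * level)
        fence = MARKERS[level % 2]  # switch marker on every level
        opening.append(f"{pad}{fence}{{{name}}}")
        closing.append(f"{pad}{fence}")
    # The body text lines up with the innermost opening fence.
    body = " " * (INDENT * (len(names) - 1)) + text
    return "\n".join(opening + [body] + closing[::-1])


print(nest(["note", "warning"], "Hey again!"))
print()
print(nest(["note", "note", "important"], "Hallo World!"))
```

Because every level starts three spaces further in, the structure reads like indented reST while each block is still an ordinary fence with an explicit closing marker.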

If you wished to remove restriction (2), with markdown-it you would supersede the indented 'code' rule with a modified fence, that ignored the "max 3 spaces" rule:

from markdown_it import MarkdownIt
from markdown_it.rules_block import StateBlock


def _fence(state: StateBlock, startLine: int, endLine: int, silent: bool):

    haveEndMarker = False
    pos = state.bMarks[startLine] + state.tShift[startLine]
    maximum = state.eMarks[startLine]

    # COMMENTING OUT MAX 3 CHARS INDENTATION
    # if it's indented more than 3 spaces, it should be a code block
    # if state.sCount[startLine] - state.blkIndent >= 4:
    #     return False

    if pos + 3 > maximum:
        return False

    marker = state.srcCharCode[pos]

    # /* ~ */  /* ` */
    if marker != 0x7E and marker != 0x60:
        return False

    # scan marker length
    mem = pos
    pos = state.skipChars(pos, marker)

    length = pos - mem

    if length < 3:
        return False

    markup = state.src[mem:pos]
    params = state.src[pos:maximum]

    # /* ` */
    if marker == 0x60:
        if chr(marker) in params:
            return False

    # Since start is found, we can report success here in validation mode
    if silent:
        return True

    # search end of block
    nextLine = startLine

    while True:
        nextLine += 1
        if nextLine >= endLine:
            # unclosed block should be autoclosed by end of document.
            # also block seems to be autoclosed by end of parent
            break

        pos = mem = state.bMarks[nextLine] + state.tShift[nextLine]
        maximum = state.eMarks[nextLine]

        if pos < maximum and state.sCount[nextLine] < state.blkIndent:
            # non-empty line with negative indent should stop the list:
            # - ```
            #  test
            break

        if state.srcCharCode[pos] != marker:
            continue

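        if state.sCount[nextLine] - state.blkIndent >= 4:
            # closing fence should be indented less than 4 spaces
            # NOTE: this limit is still active here, so a closing fence that is
            # indented as deeply as the opening one is skipped and the block
            # runs to the end of the document (hence the trailing fence in the
            # token content printed below)
            continue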

        pos = state.skipChars(pos, marker)

        # closing code fence must be at least as long as the opening one
        if pos - mem < length:
            continue

        # make sure tail has spaces only
        pos = state.skipSpaces(pos)

        if pos < maximum:
            continue

        haveEndMarker = True
        # found!
        break

    # If a fence has heading spaces, they should be removed from its inner block
    length = state.sCount[startLine]

    state.line = nextLine + (1 if haveEndMarker else 0)

    token = state.push("fence", "code", 0)
    token.info = params
    token.content = state.getLines(startLine + 1, nextLine, length, True)
    token.markup = markup
    token.map = [startLine, state.line]

    return True


def new_fence(md: MarkdownIt):
    # assess before indented code
    md.block.ruler.before(
        "code", "new_fence", _fence, {"alt": ["paragraph", "reference", "blockquote", "list"]}
    )


md = MarkdownIt("commonmark").disable("code").use(new_fence)
tokens = md.parse("""

    ```{note}
    more than 3 spaces
    ```

"""
)
print(tokens)

[Token(type='fence', tag='code', nesting=0, attrs={}, map=[2, 6], level=0, children=None, content='more than 3 spaces\n```\n\n', markup='```', info='{note}', meta={}, block=True, hidden=False)]

john-hen · 2022-01-10T21:37:34Z

Perfect! 😃 I did not know about this. Renders this entire issue moot.

Tested with the example from the original post, and it works like a charm:

~~~{py:class} Class()
   :module: module

   Doc-string of the class.

   ~~~{py:method} Class.method()
      :module: module

      Doc-string of the method.

Yes, it should be mentioned somewhere in the documentation. Doesn't have to be the most prominent place, since the code fences are the more familiar syntax to Markdown users and do cover almost all use cases. This will only be relevant for people actually using those domain directives.

I see no problem with the fact that the indentation needs to be three spaces. That's also what Autodoc uses.

chrisjsewell · 2022-01-10T21:39:26Z

> I did not know about this

yeh I don't know why I didn't think of it earlier either lol

chrisjsewell · 2022-01-10T21:40:49Z

gonna re-open, to remember to put this in the docs 👍

john-hen added the enhancement New feature or request label Jan 7, 2022

chrisjsewell added discussion no fixed close condition syntax descisions on syntax formats and removed enhancement New feature or request labels Jan 7, 2022

chrisjsewell added documentation Improvements or additions to documentation and removed documentation Improvements or additions to documentation labels Jan 10, 2022

chrisjsewell mentioned this issue Jan 10, 2022

Indent MyST Markdown examples executablebooks/sphinx-design#52

Open

john-hen closed this as completed Jan 10, 2022

chrisjsewell changed the title ~~Add an indentation-based syntax to nest Sphinx directives~~ Document how to indent nest directives Jan 10, 2022

chrisjsewell added the documentation Improvements or additions to documentation label Jan 10, 2022

chrisjsewell reopened this Jan 10, 2022
