Refactor list parser #143

wouterj · 2021-03-26T14:27:45Z

Fixes #142, fixes https://github.com/weaverryan/docs-builder/issues/70, fixes https://github.com/weaverryan/docs-builder/issues/38

Previously, lists were parsed as flat structure and the renderer took care of finding out the nested lists. That didn't really work and didn't allow subparsing the list item (e.g. directives or code blocks nested in the list). This PR refactors it to parse lists using a subparser, similarly to definition lists and directives.

I also added a little tests/sphinx utility that allows you to easily run the functional using a locally installed Sphinx: ./tests/sphinx list will render tests/Functional/tests/list/list.rst. All tests in this PR are validated using Sphinx (there were quite some tests that didn't really follow the official specs).

Todo

I wanted to get this PR ready for review as soon as possible. A few things needs to be done:

There are BC breaks in here when using the PHP classes, probably document them in UPGRADE.md
I don't know LaTex really well, it's output currently is extremely broken (also due to Twig whitespace control)
Test this branch on weaverryan/docs-builder, to get an extra set of eyes on whether this breaks important things.
Done, no changes needed and all green

wouterj · 2021-03-26T14:30:10Z

tests/Functional/tests/enumerated-ignored/enumerated-ignored.rst

+The second line of each enumerated list item is checked for validity:
+
+1. This is not a list.
+It's a paragraph starting with a number.


Previously, there was a tests that tested this behavior as "being recognized as a list" (tests/Functional/tests/ordered2/ordered2.rst). However, the markup specs are very explicit about this:

The second line of each enumerated list item is checked for validity. This is to prevent ordinary paragraphs from being mistakenly interpreted as list items, when they happen to begin with text identical to enumerators. For example, this text is parsed as an ordinary paragraph:

A. Einstein was a really smart dude.

However, ambiguity cannot be avoided if the paragraph consists of only one line. This text is parsed as an enumerated list item:

A. Einstein was a really smart dude.

If a single-line paragraph begins with text identical to an enumerator ("A.", "1.", "(b)", "I)", etc.), the first character will have to be escaped in order to have the line parsed as an ordinary paragraph:

\A. Einstein was a really smart dude.

wouterj · 2021-03-26T14:31:11Z

tests/Functional/tests/escape/escape.rst

@@ -1 +1,3 @@
 Testing that this is escaped: <script>
+
+\And escape \characters are ig\nored, but this one isn't: \\stdClass


Escape characters are not rendered according to the Sphinx parser. This was also required to make \1. Single line paragraphs starting with a number may use an escape. pass.

wouterj · 2021-03-26T14:32:09Z

tests/Functional/tests/indented-list/indented-list.html

@@ -1,7 +0,0 @@
-<p>Testing an indented list:</p>


This tested indented lists to be parsed as lists. But in fact, reSt doesn't allow any whitespace before the list marker. Such whitespace (indentation) is recognized as a blockquote.

This is now tested by tests/Functional/tests/list-indented/list-indented.rst

wouterj · 2021-03-26T14:33:58Z

tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.html

@@ -79,8 +79,15 @@ <h1>
 <p>Appendix A.4. Messages with Trace Fields</p>
 <blockquote>
    <hr />
-    <p>Received: from x.y.test by example.net via TCP with ESMTP id ABC12345 for &lt;<a href="mailto:mary@example4.net">mary@example4.net</a>&gt;; 21 Nov 1997 10:05:43 -0600
-Received: from node.example by x.y.test; 21 Nov 1997 10:01:22 -0600
+    <dl>


Honestly, I'm blown away by this test file. It was added to test the automatic recognition of e-mailaddresses. However, testing raw emails headers as reStructuredText seems very weird (resulting in strange things like this definition list).

wouterj · 2021-03-26T14:35:03Z

lib/Parser/LineChecker.php

@@ -43,15 +37,18 @@ public function isSpecialLine(string $line): ?string
        return $letter;
    }

-    public function isListLine(string $line, bool $isCode): bool


I'm not sure what the $isCode's function was here. Code blocks shouldn't be subparsed, so this should never be true. So I simply removed it.

The only place it was used is in DocumentParser... related to lists... and you have already refactored that part. So this is fine with me - probably some old, unwanted piece related to the odd "is code" handling in DocumentParser.

tests/Functional/tests/list-alternate-syntax/list-alternate-syntax.rst

lib/Parser/LineChecker.php

lib/Span/SpanProcessor.php

lib/HTML/Renderers/ListNodeRenderer.php

lib/Nodes/ListNode.php

lib/Parser/LineChecker.php

tests/Functional/tests/list-alternate-syntax/list-alternate-syntax.rst

wouterj · 2021-03-27T11:48:52Z

Updated this PR. Fixed behavior when mixing different list markers and fixed the LaTex out (and checked its validness with a LaTex parser).

Later today, I'm going to reintroduce list-item.html.twig (and it's usage in both list templates). That would significantly reduce the impact of this PR on its users (this change currently breaks weaverryan/docs-builder). done

weaverryan

This PR is terrifying... in a good way!

This does break some BC... but all in ways to, clearly, drastically fix the list rendering. A bug 👍 from me!

lib/Nodes/ListNode.php

weaverryan · 2021-03-29T18:31:20Z

lib/Parser/LineChecker.php

@@ -43,15 +37,18 @@ public function isSpecialLine(string $line): ?string
        return $letter;
    }

-    public function isListLine(string $line, bool $isCode): bool


The only place it was used is in DocumentParser... related to lists... and you have already refactored that part. So this is fine with me - probably some old, unwanted piece related to the odd "is code" handling in DocumentParser.

This refactors parsing lists using subparsers, in order to parse list contents (like code blocks or other body elements). It also fixed some inconsistencies between the reStructuredText Markup specification and the behavior of this parser.

wouterj · 2021-04-02T21:06:54Z

This PR should be ready. I've documented the BC breaks very minimalistic, I don't really expect anyone to be hit by these.

greg0ire · 2021-04-03T13:39:40Z

I tried the new utility locally, it errors out like this:


Running Sphinx v3.2.1
making output directory... done
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 75 source files that are out of date
updating environment: [new config] 75 added, 0 changed, 0 removed
reading sources... [100%] wrap/wrap
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/anchor-failure/anchor-failure.rst:1: WARNING: Unknown target name: "@anchor section".
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/comment-3/comment-3.rst:2: WARNING: Explicit markup ends without a blank line; unexpected unindent.
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/css/css.rst:1: WARNING: Unknown directive type "stylesheet".

.. stylesheet:: style.css
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/div/div.rst:1: WARNING: Unknown directive type "div".

.. div:: testing
    Hello!
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/figure/figure.rst:4: WARNING: image file not readable: figure/foo.jpg
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/image-inline/image-inline.rst:3: WARNING: image file not readable: image-inline/test.jpg
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/image/image.rst:5: WARNING: Error in "image" directive:
unknown option: "title".

.. image:: other.jpg
    :title: Other
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/image/image.rst:1: WARNING: image file not readable: image/test.jpg
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/image/image.rst:: WARNING: image file not readable: image/try.jpg
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/list-alternate-syntax/list-alternate-syntax.rst:18: WARNING: Block quote ends without a blank line; unexpected unindent.
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/list-mix/list-mix.rst:2: WARNING: Bullet list ends without a blank line; unexpected unindent.
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/list-mix/list-mix.rst:3: WARNING: Bullet list ends without a blank line; unexpected unindent.
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/main-directive/main-directive.rst:3: WARNING: Unknown directive type "latex-main".

.. latex-main::
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/pretty-table-error1/pretty-table-error1.rst:8: WARNING: Blank line required after table.
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/quote3/quote3.rst:2: WARNING: image file not readable: test.jpg
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/raw/raw.rst:3: WARNING: Error in "raw" directive:
1 argument(s) required, 0 supplied.

.. raw::

    <u>Underlined!</u>
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/simple-table-error1/simple-table-error1.rst:3: WARNING: Malformed table.
Text in column margin in table line 3.

=========== ========== =========
First col   Second col Third col
Second row  Other colllOther col
Third  row  Other col  Last col
=========== ========== =========
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:5: WARNING: Unexpected section title or transition.

----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:19: WARNING: Unexpected section title or transition.

----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:27: WARNING: Title underline too short.

Hi everyone.
----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:27: WARNING: Unexpected section title.

Hi everyone.
----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:31: WARNING: Unexpected section title or transition.

----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:39: WARNING: Title underline too short.

Testing.
----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:39: WARNING: Unexpected section title.

Testing.
----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:43: WARNING: Unexpected section title or transition.

----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:53: WARNING: Title underline too short.

This is a reply to your reply.
----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:53: WARNING: Unexpected section title.

This is a reply to your reply.
----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:57: WARNING: Unexpected section title or transition.

----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:74: WARNING: Unexpected section title or transition.

----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:81: WARNING: Definition list ends without a blank line; unexpected unindent.
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:97: WARNING: Unexpected section title or transition.

----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:100: WARNING: Unexpected indentation.
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:102: WARNING: Block quote ends without a blank line; unexpected unindent.
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:103: WARNING: Block quote ends without a blank line; unexpected unindent.
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:105: WARNING: Unexpected indentation.
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:110: WARNING: Block quote ends without a blank line; unexpected unindent.
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:113: WARNING: Title underline too short.

Testing.
----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:113: WARNING: Unexpected section title.

Testing.
----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-hyperlinks/standalone-hyperlinks.rst:91: WARNING: Unknown target name: "blah".
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/unknown-directive/unknown-directive.rst:1: WARNING: Unknown directive type "unknown-directive".

.. unknown-directive::
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/variable-wrap/variable-wrap.rst:1: WARNING: Substitution definition "test" empty or invalid.

.. |test| note::
    This is an important thing
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/variable-wrap/variable-wrap.rst:4: WARNING: Undefined substitution referenced: "test".
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/variable-wrap/variable-wrap.rst:6: WARNING: Undefined substitution referenced: "test".

Sphinx error:
master file /home/greg/dev/doctrine-rst-parser/tests/Functional/tests/.rst not found

I was looking into it because I wanted to understand if the new gitignored directory was going to be useful or if we should maybe use /tmp instead, thus avoiding rm -rf and a new gitignore entry

wouterj · 2021-04-05T20:04:23Z

@greg0ire I intended the new utility to be used like ./tests/sphinx list-alternate-syntax (so for the specific functional test you want).

I was looking into it because I wanted to understand if the new gitignored directory was going to be useful or if we should maybe use /tmp instead, thus avoiding rm -rf and a new gitignore entry

I put it in the project dir because I want to look at the generated HTML (that's the whole purpose of running Sphinx). Opening a file from the project dir is much easier (at least for me) than opening a tmp directory.

greg0ire · 2021-04-05T20:41:16Z

Thanks @wouterj !

wouterj · 2021-04-05T21:24:14Z

@greg0ire thanks for merging :)

For your information, I've just opened #145 - which is the open comment from Ryan's PR. I don't think it's feasible to fix that in any very short future, so if you would like to draft a new release with these 2 big PRs - feel free to do so :)

wouterj added 2 commits March 26, 2021 15:21

Add utility to easily run Sphinx on fucntional tests

f79767d

Generalize some LineChecker methods

db985b7

wouterj commented Mar 26, 2021

View reviewed changes

tests/Functional/tests/list-alternate-syntax/list-alternate-syntax.rst Show resolved Hide resolved

wouterj commented Mar 26, 2021

View reviewed changes

lib/Parser/LineChecker.php Outdated Show resolved Hide resolved

greg0ire reviewed Mar 26, 2021

View reviewed changes

Replace escape characters in generated output

722082d

wouterj force-pushed the list-body-elements branch 4 times, most recently from 175c2d6 to 918b203 Compare March 27, 2021 11:33

wouterj force-pushed the list-body-elements branch 5 times, most recently from a319cd8 to ddae0e2 Compare March 27, 2021 23:38

weaverryan approved these changes Mar 29, 2021

View reviewed changes

Completely rewrite list parsing

5dc37a0

This refactors parsing lists using subparsers, in order to parse list contents (like code blocks or other body elements). It also fixed some inconsistencies between the reStructuredText Markup specification and the behavior of this parser.

wouterj force-pushed the list-body-elements branch from ddae0e2 to 5dc37a0 Compare April 2, 2021 21:05

greg0ire added the Improvement label Apr 5, 2021

greg0ire merged commit dcaee43 into doctrine:0.4.x Apr 5, 2021

wouterj deleted the list-body-elements branch April 5, 2021 20:47

This was referenced Apr 5, 2021

Code blocks part of a list item are not parsed #142

Closed

Minor issue with nested lists symfony-tools/docs-builder#70

Closed

greg0ire added this to the 0.4.0 milestone Apr 6, 2021

github-actions bot mentioned this pull request Apr 6, 2021

Merge release 0.4.0 into 1.0.x #146

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor list parser #143

Refactor list parser #143

wouterj commented Mar 26, 2021 •

edited

Loading

wouterj Mar 26, 2021

wouterj Mar 26, 2021

wouterj Mar 26, 2021 •

edited

Loading

wouterj Mar 26, 2021

wouterj Mar 26, 2021

weaverryan Mar 29, 2021

wouterj commented Mar 27, 2021 •

edited

Loading

weaverryan left a comment

weaverryan Mar 29, 2021

wouterj commented Apr 2, 2021

greg0ire commented Apr 3, 2021 •

edited

Loading

wouterj commented Apr 5, 2021

greg0ire commented Apr 5, 2021

wouterj commented Apr 5, 2021

		@@ -1 +1,3 @@
		Testing that this is escaped: <script>

		\And escape \characters are ig\nored, but this one isn't: \\stdClass

Refactor list parser #143

Refactor list parser #143

Conversation

wouterj commented Mar 26, 2021 • edited Loading

Todo

wouterj Mar 26, 2021

Choose a reason for hiding this comment

wouterj Mar 26, 2021

Choose a reason for hiding this comment

wouterj Mar 26, 2021 • edited Loading

Choose a reason for hiding this comment

wouterj Mar 26, 2021

Choose a reason for hiding this comment

wouterj Mar 26, 2021

Choose a reason for hiding this comment

weaverryan Mar 29, 2021

Choose a reason for hiding this comment

wouterj commented Mar 27, 2021 • edited Loading

weaverryan left a comment

Choose a reason for hiding this comment

weaverryan Mar 29, 2021

Choose a reason for hiding this comment

wouterj commented Apr 2, 2021

greg0ire commented Apr 3, 2021 • edited Loading

wouterj commented Apr 5, 2021

greg0ire commented Apr 5, 2021

wouterj commented Apr 5, 2021

wouterj commented Mar 26, 2021 •

edited

Loading

wouterj Mar 26, 2021 •

edited

Loading

wouterj commented Mar 27, 2021 •

edited

Loading

greg0ire commented Apr 3, 2021 •

edited

Loading