Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor list parser #143

Merged
merged 4 commits into from
Apr 5, 2021
Merged

Refactor list parser #143

merged 4 commits into from
Apr 5, 2021

Conversation

wouterj
Copy link
Collaborator

@wouterj wouterj commented Mar 26, 2021

Fixes #142, fixes https://github.com/weaverryan/docs-builder/issues/70, fixes https://github.com/weaverryan/docs-builder/issues/38

Previously, lists were parsed as flat structure and the renderer took care of finding out the nested lists. That didn't really work and didn't allow subparsing the list item (e.g. directives or code blocks nested in the list). This PR refactors it to parse lists using a subparser, similarly to definition lists and directives.

I also added a little tests/sphinx utility that allows you to easily run the functional using a locally installed Sphinx: ./tests/sphinx list will render tests/Functional/tests/list/list.rst. All tests in this PR are validated using Sphinx (there were quite some tests that didn't really follow the official specs).

Todo

I wanted to get this PR ready for review as soon as possible. A few things needs to be done:

  • There are BC breaks in here when using the PHP classes, probably document them in UPGRADE.md
  • I don't know LaTex really well, it's output currently is extremely broken (also due to Twig whitespace control)
  • Test this branch on weaverryan/docs-builder, to get an extra set of eyes on whether this breaks important things.
    Done, no changes needed and all green

The second line of each enumerated list item is checked for validity:

1. This is not a list.
It's a paragraph starting with a number.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, there was a tests that tested this behavior as "being recognized as a list" (tests/Functional/tests/ordered2/ordered2.rst). However, the markup specs are very explicit about this:

The second line of each enumerated list item is checked for validity. This is to prevent ordinary paragraphs from being mistakenly interpreted as list items, when they happen to begin with text identical to enumerators. For example, this text is parsed as an ordinary paragraph:

A. Einstein was a really
smart dude.

However, ambiguity cannot be avoided if the paragraph consists of only one line. This text is parsed as an enumerated list item:

A. Einstein was a really smart dude.

If a single-line paragraph begins with text identical to an enumerator ("A.", "1.", "(b)", "I)", etc.), the first character will have to be escaped in order to have the line parsed as an ordinary paragraph:

\A. Einstein was a really smart dude.

@@ -1 +1,3 @@
Testing that this is escaped: <script>

\And escape \characters are ig\nored, but this one isn't: \\stdClass
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Escape characters are not rendered according to the Sphinx parser. This was also required to make \1. Single line paragraphs starting with a number may use an escape. pass.

@@ -1,7 +0,0 @@
<p>Testing an indented list:</p>
Copy link
Collaborator Author

@wouterj wouterj Mar 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tested indented lists to be parsed as lists. But in fact, reSt doesn't allow any whitespace before the list marker. Such whitespace (indentation) is recognized as a blockquote.

This is now tested by tests/Functional/tests/list-indented/list-indented.rst

@@ -79,8 +79,15 @@ <h1>
<p>Appendix A.4. Messages with Trace Fields</p>
<blockquote>
<hr />
<p>Received: from x.y.test by example.net via TCP with ESMTP id ABC12345 for &lt;<a href="mailto:mary@example4.net">mary@example4.net</a>&gt;; 21 Nov 1997 10:05:43 -0600
Received: from node.example by x.y.test; 21 Nov 1997 10:01:22 -0600
<dl>
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I'm blown away by this test file. It was added to test the automatic recognition of e-mailaddresses. However, testing raw emails headers as reStructuredText seems very weird (resulting in strange things like this definition list).

@@ -43,15 +37,18 @@ public function isSpecialLine(string $line): ?string
return $letter;
}

public function isListLine(string $line, bool $isCode): bool
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the $isCode's function was here. Code blocks shouldn't be subparsed, so this should never be true. So I simply removed it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only place it was used is in DocumentParser... related to lists... and you have already refactored that part. So this is fine with me - probably some old, unwanted piece related to the odd "is code" handling in DocumentParser.

lib/Parser/LineChecker.php Outdated Show resolved Hide resolved
lib/Span/SpanProcessor.php Outdated Show resolved Hide resolved
lib/Span/SpanProcessor.php Show resolved Hide resolved
lib/HTML/Renderers/ListNodeRenderer.php Outdated Show resolved Hide resolved
lib/Nodes/ListNode.php Outdated Show resolved Hide resolved
lib/Parser/LineChecker.php Outdated Show resolved Hide resolved
@wouterj wouterj force-pushed the list-body-elements branch 4 times, most recently from 175c2d6 to 918b203 Compare March 27, 2021 11:33
@wouterj
Copy link
Collaborator Author

wouterj commented Mar 27, 2021

Updated this PR. Fixed behavior when mixing different list markers and fixed the LaTex out (and checked its validness with a LaTex parser).

Later today, I'm going to reintroduce list-item.html.twig (and it's usage in both list templates). That would significantly reduce the impact of this PR on its users (this change currently breaks weaverryan/docs-builder). done

@wouterj wouterj force-pushed the list-body-elements branch 5 times, most recently from a319cd8 to ddae0e2 Compare March 27, 2021 23:38
Copy link
Collaborator

@weaverryan weaverryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is terrifying... in a good way!

This does break some BC... but all in ways to, clearly, drastically fix the list rendering. A bug 👍 from me!

lib/Nodes/ListNode.php Outdated Show resolved Hide resolved
@@ -43,15 +37,18 @@ public function isSpecialLine(string $line): ?string
return $letter;
}

public function isListLine(string $line, bool $isCode): bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only place it was used is in DocumentParser... related to lists... and you have already refactored that part. So this is fine with me - probably some old, unwanted piece related to the odd "is code" handling in DocumentParser.

This refactors parsing lists using subparsers, in order to parse list contents
(like code blocks or other body elements).

It also fixed some inconsistencies between the reStructuredText Markup
specification and the behavior of this parser.
@wouterj
Copy link
Collaborator Author

wouterj commented Apr 2, 2021

This PR should be ready. I've documented the BC breaks very minimalistic, I don't really expect anyone to be hit by these.

@greg0ire
Copy link
Member

greg0ire commented Apr 3, 2021

I tried the new utility locally, it errors out like this:


Running Sphinx v3.2.1
making output directory... done
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 75 source files that are out of date
updating environment: [new config] 75 added, 0 changed, 0 removed
reading sources... [100%] wrap/wrap
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/anchor-failure/anchor-failure.rst:1: WARNING: Unknown target name: "@anchor section".
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/comment-3/comment-3.rst:2: WARNING: Explicit markup ends without a blank line; unexpected unindent.
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/css/css.rst:1: WARNING: Unknown directive type "stylesheet".

.. stylesheet:: style.css
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/div/div.rst:1: WARNING: Unknown directive type "div".

.. div:: testing
    Hello!
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/figure/figure.rst:4: WARNING: image file not readable: figure/foo.jpg
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/image-inline/image-inline.rst:3: WARNING: image file not readable: image-inline/test.jpg
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/image/image.rst:5: WARNING: Error in "image" directive:
unknown option: "title".

.. image:: other.jpg
    :title: Other
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/image/image.rst:1: WARNING: image file not readable: image/test.jpg
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/image/image.rst:: WARNING: image file not readable: image/try.jpg
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/list-alternate-syntax/list-alternate-syntax.rst:18: WARNING: Block quote ends without a blank line; unexpected unindent.
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/list-mix/list-mix.rst:2: WARNING: Bullet list ends without a blank line; unexpected unindent.
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/list-mix/list-mix.rst:3: WARNING: Bullet list ends without a blank line; unexpected unindent.
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/main-directive/main-directive.rst:3: WARNING: Unknown directive type "latex-main".

.. latex-main::
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/pretty-table-error1/pretty-table-error1.rst:8: WARNING: Blank line required after table.
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/quote3/quote3.rst:2: WARNING: image file not readable: test.jpg
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/raw/raw.rst:3: WARNING: Error in "raw" directive:
1 argument(s) required, 0 supplied.

.. raw::

    <u>Underlined!</u>
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/simple-table-error1/simple-table-error1.rst:3: WARNING: Malformed table.
Text in column margin in table line 3.

=========== ========== =========
First col   Second col Third col
Second row  Other colllOther col
Third  row  Other col  Last col
=========== ========== =========
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:5: WARNING: Unexpected section title or transition.

----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:19: WARNING: Unexpected section title or transition.

----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:27: WARNING: Title underline too short.

Hi everyone.
----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:27: WARNING: Unexpected section title.

Hi everyone.
----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:31: WARNING: Unexpected section title or transition.

----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:39: WARNING: Title underline too short.

Testing.
----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:39: WARNING: Unexpected section title.

Testing.
----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:43: WARNING: Unexpected section title or transition.

----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:53: WARNING: Title underline too short.

This is a reply to your reply.
----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:53: WARNING: Unexpected section title.

This is a reply to your reply.
----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:57: WARNING: Unexpected section title or transition.

----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:74: WARNING: Unexpected section title or transition.

----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:81: WARNING: Definition list ends without a blank line; unexpected unindent.
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:97: WARNING: Unexpected section title or transition.

----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:100: WARNING: Unexpected indentation.
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:102: WARNING: Block quote ends without a blank line; unexpected unindent.
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:103: WARNING: Block quote ends without a blank line; unexpected unindent.
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:105: WARNING: Unexpected indentation.
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:110: WARNING: Block quote ends without a blank line; unexpected unindent.
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:113: WARNING: Title underline too short.

Testing.
----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-email-addresses/standalone-email-addresses.rst:113: WARNING: Unexpected section title.

Testing.
----
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/standalone-hyperlinks/standalone-hyperlinks.rst:91: WARNING: Unknown target name: "blah".
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/unknown-directive/unknown-directive.rst:1: WARNING: Unknown directive type "unknown-directive".

.. unknown-directive::
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/variable-wrap/variable-wrap.rst:1: WARNING: Substitution definition "test" empty or invalid.

.. |test| note::
    This is an important thing
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/variable-wrap/variable-wrap.rst:4: WARNING: Undefined substitution referenced: "test".
/home/greg/dev/doctrine-rst-parser/tests/Functional/tests/variable-wrap/variable-wrap.rst:6: WARNING: Undefined substitution referenced: "test".

Sphinx error:
master file /home/greg/dev/doctrine-rst-parser/tests/Functional/tests/.rst not found

I was looking into it because I wanted to understand if the new gitignored directory was going to be useful or if we should maybe use /tmp instead, thus avoiding rm -rf and a new gitignore entry

@wouterj
Copy link
Collaborator Author

wouterj commented Apr 5, 2021

@greg0ire I intended the new utility to be used like ./tests/sphinx list-alternate-syntax (so for the specific functional test you want).

I was looking into it because I wanted to understand if the new gitignored directory was going to be useful or if we should maybe use /tmp instead, thus avoiding rm -rf and a new gitignore entry

I put it in the project dir because I want to look at the generated HTML (that's the whole purpose of running Sphinx). Opening a file from the project dir is much easier (at least for me) than opening a tmp directory.

@greg0ire greg0ire merged commit dcaee43 into doctrine:0.4.x Apr 5, 2021
@greg0ire
Copy link
Member

greg0ire commented Apr 5, 2021

Thanks @wouterj !

@wouterj
Copy link
Collaborator Author

wouterj commented Apr 5, 2021

@greg0ire thanks for merging :)

For your information, I've just opened #145 - which is the open comment from Ryan's PR. I don't think it's feasible to fix that in any very short future, so if you would like to draft a new release with these 2 big PRs - feel free to do so :)

@greg0ire greg0ire added this to the 0.4.0 milestone Apr 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants