Use the list marker width to determine whether a list item is continued #2

PhrozenByte · 2016-10-13T19:08:57Z

This PR actually bases on 8965c78, see PhrozenByte/parsedown@bdf537e...aidantwoods/patch-4 for the "real" diff.

…k tests" This reverts commit 6973302.

This reverts commit 0a43799.

This reverts commit 2db3199.

See CommonMark spec examples [erusev#226](http://spec.commonmark.org/0.26/#example-226) to erusev#229

According to the CommonMark specs ([list items](http://spec.commonmark.org/0.26/#list-items), rule 3), list items starting with a blank line basically behave like as if the \n doesn't exist. Also see example [erusev#241](http://spec.commonmark.org/0.26/#example-241).

This basically represents [list item parsing](http://spec.commonmark.org/0.26/#list-items), rule 1 of the CommonMark specs.

Subsequent list items which aren't indented sufficiently are treated as part of the original list, see CommonMark spec example [erusev#256](http://spec.commonmark.org/0.26/#example-256).

aidantwoods · 2016-10-13T22:18:05Z

These changes definitely look much better! Anything you can do about data set #244? Looks like that would be a regression from the current master.

aidantwoods · 2016-10-13T22:20:28Z

Parsedown.php


-        if (preg_match('/^('.$pattern.'[ ]+)(.*)/', $Line['text'], $matches))
+        if (preg_match('/^('.$pattern.'([ ]+|$))(.*)/', $Line['text'], $matches))


I'd guess that allowing the string to end, instead of a space being present in the regex here would be the cause of the failure mentioned above? (in #2 (comment))

PhrozenByte · 2016-10-13T22:50:06Z

Yeah, this one is a bit tricky... I've absolutely no idea how to fix it. We really must match lines with a list marker and neither a following whitespace nor actual content - this is the essence of rule 3 (you'll have to scroll down to rule 3), see e.g. example #239 (and many more).

It's important to say that Parsedown never intentionally passed example #246, it passed it just because Parsedown didn't respect said rule 3 at all. Passing example #246 was just an side-effect of this. It is strongly related to example #265 and #266. See the paragraphs above example #265 for an explanation. It is the result from the following statement of rule 1:

Exceptions: When the list item interrupts a paragraph—that is, when it starts on a line that would otherwise count as paragraph continuation text—then (a) the lines Ls must not begin with a blank line, and (b) if the list item is ordered, the start number must be 1.

However, I'm not so sure about the importance of this issue... Parsedown violates CommonMark in so many very obvious and important ways, that we should take care of those issues first. This is one of the rather "obscure" spec details which are IMHO not so important. We should take care of this after we've made Parsedown practically (what doesn't necessarily mean "completely") CommonMark compliant.

So, sure, you're completely right, this is a regression, but I don't think that this impacts even a single user.

aidantwoods · 2016-10-14T07:28:31Z

Okay – I figured it might have been one of those obscure ones 🙃 just wanted to make sure it wasn't an unintentional breakage (breaking something Parsedown didn't really support, to fix way more tests is of course fine!)

I'd struggle to think of a way of fixing that test without introducing the concept of a line lookahead. (Since we need to decide whether to interupt based on the content of the next line and the following line – both have to exist, and only the next may be blank).
I don't think a line lookahead should be too difficult (since Parsedown doesn't recursively parse the main lines object), but that is probably one for another PR.

aidantwoods · 2016-10-14T12:48:51Z

@PhrozenByte

This fixes it

PhrozenByte · 2016-10-14T15:42:46Z

Yeah, looks very promising, great idea! 👍 However, I don't think that the check

! isset($this->lines[$this->pointer+1]) or chop($this->lines[$this->pointer+1]) === ''

is necessary. I think you've added this because of the following statement in the CommonMark specs, did you?

(a) the lines Ls must not begin with a blank line

This relates to rule 3, i.e. a line with just the list marker and no content, not a empty line after the list marker. Otherwise the following would still be parsed as list:

foo
1.
bar

You furthermore didn't consider the following exception:

(b) if the list item is ordered, the start number must be 1

It should look like the following in the end:

diff --git a/Parsedown.php b/Parsedown.php
index de80f3f..b9c05ac 100644
--- a/Parsedown.php
+++ b/Parsedown.php
@@ -500,7 +500,7 @@ class Parsedown
     #
     # List

-    protected function blockList($Line)
+    protected function blockList($Line, $CurrentBlock = null)
     {
         list($name, $pattern) = $Line['text'][0] <= '-' ? array('ul', '[*+-]') : array('ol', '[0-9]{1,9}[.\)]');

@@ -533,20 +533,40 @@ class Parsedown
                 ),
             );

+            $listStart = 1;
+            $isBlankLine = (trim($matches[3]) === '');
+
             if($name === 'ol') 
             {
-                $listStart = ltrim(strstr($matches[1], $Block['data']['markerType'], true), '0') ?: '0';
-                
-                if($listStart !== '1')
+                $listStart = intval(ltrim(strstr($matches[1], $Block['data']['markerType'], true), '0'));
+
+                if($listStart !== 1)
                 {
                     $Block['element']['attributes'] = array('start' => $listStart);
                 }
             }

+            if (
+                (
+                    isset($CurrentBlock) and ! isset($CurrentBlock['interrupted'])
+                )
+                and
+                (
+                    ! isset($CurrentBlock['type']) or $CurrentBlock['type'] === 'p'
+                )
+                and
+                (
+                    $isBlankLine or ($name === 'ol' and $listStart !== 1)
+                )
+            )
+            {
+                return null;
+            }
+
             $Block['li'] = array(
                 'name' => 'li',
                 'handler' => 'li',
-                'text' => !empty($matches[3]) ? array($matches[3]) : array(),
+                'text' => ! $isBlankLine ? array($matches[3]) : array(),
             );

             $Block['element']['text'] []= & $Block['li'];

(Note the additional changes on line 541 and 569)
(Am I the only one who thinks that this code styling makes a actually pretty simple if unnecessary huge?)

This way Parsedown even passes example #265 and #266. 🎉

aidantwoods · 2016-10-14T17:09:54Z

! isset($this->lines[$this->pointer+1]) or chop($this->lines[$this->pointer+1]) === ''

is because of http://spec.commonmark.org/0.26/#example-246

However, an empty list item cannot interrupt a paragraph

Hence the lookahead.
One lookahead should be sufficient because http://spec.commonmark.org/0.26/#example-241

A list item can begin with at most one blank line.

I'm happy to combine this with your changes though, so we can solve all three!

aidantwoods · 2016-10-14T17:16:47Z

(Am I the only one who thinks that this code styling makes a actually pretty simple if unnecessary huge?)

I'm happy to trim lines like

(
     ! isset($CurrentBlock['type']) or $CurrentBlock['type'] === 'p'
)

down to something like

(! isset($CurrentBlock['type']) or $CurrentBlock['type'] === 'p')

Though, having 6 different if conditions on one line isn't all that readable ;p

Logic behind the existing formatting is that you can see the nesting depth clearly, and the bracket styling matches the rest of the code (there were no examples of if statements with lots of conditions that I could base the styling on)

PhrozenByte · 2016-10-14T17:52:38Z

! isset($this->lines[$this->pointer+1]) or chop($this->lines[$this->pointer+1]) === ''

is because of http://spec.commonmark.org/0.26/#example-246

However, an empty list item cannot interrupt a paragraph

Hence the lookahead.
One lookahead should be sufficient because http://spec.commonmark.org/0.26/#example-241

A list item can begin with at most one blank line.

I'm happy to combine this with your changes though, so we can solve all three!

No, that's not correct. At first I understood it the same way as you do, but after looking at the examples and testing it with the reference parser (click me), it became clear that they actually mean

with "blank line". The wording is very misleading (:confused:), but this is the only interpretation that matches the results of example #241: If they hadn't count the lone - as "blank line", the actually empty line 2 would have been the "at most one blank line", so that foo in line 3 would have become the first and only item of the list. However, that's not the case, the lone - is the first blank line and the actually empty line 2 is the second blank line, so the result is a empty list and a paragraph with foo.

You can furthermore use CommonMark's reference parser to be 100% sure:

foo
*
bar

becomes

<p>foo
*
bar</p>

and not

<p>foo</p>
<ul>
<li></li>
</ul>
<p>bar</p>

That wouldn't be the case with the lookahead. See http://spec.commonmark.org/dingus/?text=foo%0A*%0Abar

With

(Am I the only one who thinks that this code styling makes a actually pretty simple if unnecessary huge?)

I actually didn't want to say that we should change this. Parsedown is @erusev's project, he decides what coding standards we should follow with our PRs and if he wants it this way, we should do it this way. Personally I prefer http://www.php-fig.org/, however, a discussion about personal preferences probably won't take us much further 😆

aidantwoods · 2016-10-14T19:14:05Z

No, that's not correct. At first I understood it the same way as you do, but after looking at the examples and testing it with the reference parser (click me), it became clear that they actually mean
-
with "blank line".

Ah I see, I was of the impression (from the wording) that because the following is a valid list (non-empty) item

then this would count as a non-empty list (provided there is no more than one blank line)

Looks like the wording is confusing :/

I actually didn't want to say that we should change this. Parsedown is @erusev's project, he decides what coding standards we should follow with our PRs and if he wants it this way, we should do it this way.

Oh, I thought I was the one who started formatting ifs like that (spreading out the conditions over multiple lines and nesting them with indents). Couldn't see any other examples in Parsedown where if had more than a few conditions (so they're kept to one line elsewhere).
The huge statements are my effort to match the styling of the surrounding code, I'm not sure as to what @erusev's preferences are for ifs with lots of conditions though.

aidantwoods · 2016-10-14T19:15:26Z

I guess the lookahead isn't needed in this case, if CommonMark only cares that the existing line is an empty item

Test against PHP 7.2

PhrozenByte added 10 commits October 13, 2016 19:25

Revert "Prevent breaking remaining previously compliant CommonMarkWea…

eb853da

…k tests" This reverts commit 6973302.

Revert "Prevent failure with data set 77 in CommonMarkWeak"

e691034

This reverts commit 0a43799.

Revert "Break less previously passed CommonMarkWeak tests"

81025cd

This reverts commit 2db3199.

Fix ordered list start argument

bdf537e

See CommonMark spec examples [erusev#226](http://spec.commonmark.org/0.26/#example-226) to erusev#229

Remove unused $placeholder variable

30ff5c6

Support list items starting with indented code

1d61f90

Use the list marker width to determine whether a list item is continued

7b1529f

This basically represents [list item parsing](http://spec.commonmark.org/0.26/#list-items), rule 1 of the CommonMark specs.

Fix code formatting

a9e1163

Handle subsequent list items which aren't indented sufficiently

a3836b1

Subsequent list items which aren't indented sufficiently are treated as part of the original list, see CommonMark spec example [erusev#256](http://spec.commonmark.org/0.26/#example-256).

PhrozenByte mentioned this pull request Oct 13, 2016

Improve CommonMark mixed-marker list compliance erusev/parsedown#439

Merged

aidantwoods reviewed Oct 13, 2016

View reviewed changes

aidantwoods merged commit 67e454e into aidantwoods:patch-4 Oct 14, 2016

aidantwoods added a commit that referenced this pull request Feb 12, 2018

Merge pull request #2 from carusogabriel/patch-1

12cdae2

Test against PHP 7.2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use the list marker width to determine whether a list item is continued #2

Use the list marker width to determine whether a list item is continued #2

PhrozenByte commented Oct 13, 2016 •

edited

Loading

aidantwoods commented Oct 13, 2016

aidantwoods Oct 13, 2016

PhrozenByte commented Oct 13, 2016 •

edited

Loading

aidantwoods commented Oct 14, 2016

aidantwoods commented Oct 14, 2016 •

edited

Loading

PhrozenByte commented Oct 14, 2016

aidantwoods commented Oct 14, 2016

aidantwoods commented Oct 14, 2016

PhrozenByte commented Oct 14, 2016

aidantwoods commented Oct 14, 2016

aidantwoods commented Oct 14, 2016


		if (preg_match('/^('.$pattern.'[ ]+)(.*)/', $Line['text'], $matches))
		if (preg_match('/^('.$pattern.'([ ]+\|$))(.*)/', $Line['text'], $matches))

Use the list marker width to determine whether a list item is continued #2

Use the list marker width to determine whether a list item is continued #2

Conversation

PhrozenByte commented Oct 13, 2016 • edited Loading

aidantwoods commented Oct 13, 2016

aidantwoods Oct 13, 2016

Choose a reason for hiding this comment

PhrozenByte commented Oct 13, 2016 • edited Loading

aidantwoods commented Oct 14, 2016

aidantwoods commented Oct 14, 2016 • edited Loading

PhrozenByte commented Oct 14, 2016

aidantwoods commented Oct 14, 2016

aidantwoods commented Oct 14, 2016

PhrozenByte commented Oct 14, 2016

aidantwoods commented Oct 14, 2016

aidantwoods commented Oct 14, 2016

PhrozenByte commented Oct 13, 2016 •

edited

Loading

PhrozenByte commented Oct 13, 2016 •

edited

Loading

aidantwoods commented Oct 14, 2016 •

edited

Loading