Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the list marker width to determine whether a list item is continued #2

Merged
merged 10 commits into from
Oct 14, 2016

Conversation

PhrozenByte
Copy link

@PhrozenByte PhrozenByte commented Oct 13, 2016

See erusev#439 (comment)

This PR actually bases on 8965c78, see PhrozenByte/parsedown@bdf537e...aidantwoods/patch-4 for the "real" diff.

@aidantwoods
Copy link
Owner

These changes definitely look much better! Anything you can do about data set #244? Looks like that would be a regression from the current master.

screen shot 2016-10-13 at 23 11 40

screen shot 2016-10-13 at 23 10 22

screen shot 2016-10-13 at 23 10 36


if (preg_match('/^('.$pattern.'[ ]+)(.*)/', $Line['text'], $matches))
if (preg_match('/^('.$pattern.'([ ]+|$))(.*)/', $Line['text'], $matches))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd guess that allowing the string to end, instead of a space being present in the regex here would be the cause of the failure mentioned above? (in #2 (comment))

@PhrozenByte
Copy link
Author

PhrozenByte commented Oct 13, 2016

Yeah, this one is a bit tricky... I've absolutely no idea how to fix it. We really must match lines with a list marker and neither a following whitespace nor actual content - this is the essence of rule 3 (you'll have to scroll down to rule 3), see e.g. example #239 (and many more).

It's important to say that Parsedown never intentionally passed example #246, it passed it just because Parsedown didn't respect said rule 3 at all. Passing example #​246 was just an side-effect of this. It is strongly related to example #265 and #​266. See the paragraphs above example #​265 for an explanation. It is the result from the following statement of rule 1:

Exceptions: When the list item interrupts a paragraph—that is, when it starts on a line that would otherwise count as paragraph continuation text—then (a) the lines Ls must not begin with a blank line, and (b) if the list item is ordered, the start number must be 1.

However, I'm not so sure about the importance of this issue... Parsedown violates CommonMark in so many very obvious and important ways, that we should take care of those issues first. This is one of the rather "obscure" spec details which are IMHO not so important. We should take care of this after we've made Parsedown practically (what doesn't necessarily mean "completely") CommonMark compliant.

So, sure, you're completely right, this is a regression, but I don't think that this impacts even a single user.

@aidantwoods
Copy link
Owner

Okay – I figured it might have been one of those obscure ones 🙃 just wanted to make sure it wasn't an unintentional breakage (breaking something Parsedown didn't really support, to fix way more tests is of course fine!)

I'd struggle to think of a way of fixing that test without introducing the concept of a line lookahead. (Since we need to decide whether to interupt based on the content of the next line and the following line – both have to exist, and only the next may be blank).
I don't think a line lookahead should be too difficult (since Parsedown doesn't recursively parse the main lines object), but that is probably one for another PR.

@aidantwoods aidantwoods merged commit 67e454e into aidantwoods:patch-4 Oct 14, 2016
@aidantwoods
Copy link
Owner

aidantwoods commented Oct 14, 2016

@PhrozenByte

This fixes it
screen shot 2016-10-14 at 18 20 32
screen shot 2016-10-14 at 13 46 07
screen shot 2016-10-14 at 13 35 56
screen shot 2016-10-14 at 13 36 05

@PhrozenByte
Copy link
Author

Yeah, looks very promising, great idea! 👍 However, I don't think that the check

! isset($this->lines[$this->pointer+1]) or chop($this->lines[$this->pointer+1]) === ''

is necessary. I think you've added this because of the following statement in the CommonMark specs, did you?

(a) the lines Ls must not begin with a blank line

This relates to rule 3, i.e. a line with just the list marker and no content, not a empty line after the list marker. Otherwise the following would still be parsed as list:

foo
1.
bar

You furthermore didn't consider the following exception:

(b) if the list item is ordered, the start number must be 1

It should look like the following in the end:

diff --git a/Parsedown.php b/Parsedown.php
index de80f3f..b9c05ac 100644
--- a/Parsedown.php
+++ b/Parsedown.php
@@ -500,7 +500,7 @@ class Parsedown
     #
     # List

-    protected function blockList($Line)
+    protected function blockList($Line, $CurrentBlock = null)
     {
         list($name, $pattern) = $Line['text'][0] <= '-' ? array('ul', '[*+-]') : array('ol', '[0-9]{1,9}[.\)]');

@@ -533,20 +533,40 @@ class Parsedown
                 ),
             );

+            $listStart = 1;
+            $isBlankLine = (trim($matches[3]) === '');
+
             if($name === 'ol') 
             {
-                $listStart = ltrim(strstr($matches[1], $Block['data']['markerType'], true), '0') ?: '0';
-                
-                if($listStart !== '1')
+                $listStart = intval(ltrim(strstr($matches[1], $Block['data']['markerType'], true), '0'));
+
+                if($listStart !== 1)
                 {
                     $Block['element']['attributes'] = array('start' => $listStart);
                 }
             }

+            if (
+                (
+                    isset($CurrentBlock) and ! isset($CurrentBlock['interrupted'])
+                )
+                and
+                (
+                    ! isset($CurrentBlock['type']) or $CurrentBlock['type'] === 'p'
+                )
+                and
+                (
+                    $isBlankLine or ($name === 'ol' and $listStart !== 1)
+                )
+            )
+            {
+                return null;
+            }
+
             $Block['li'] = array(
                 'name' => 'li',
                 'handler' => 'li',
-                'text' => !empty($matches[3]) ? array($matches[3]) : array(),
+                'text' => ! $isBlankLine ? array($matches[3]) : array(),
             );

             $Block['element']['text'] []= & $Block['li'];

(Note the additional changes on line 541 and 569)
(Am I the only one who thinks that this code styling makes a actually pretty simple if unnecessary huge?)

This way Parsedown even passes example #265 and #​266. 🎉

@aidantwoods
Copy link
Owner

! isset($this->lines[$this->pointer+1]) or chop($this->lines[$this->pointer+1]) === ''

is because of http://spec.commonmark.org/0.26/#example-246

However, an empty list item cannot interrupt a paragraph

Hence the lookahead.
One lookahead should be sufficient because http://spec.commonmark.org/0.26/#example-241

A list item can begin with at most one blank line.

I'm happy to combine this with your changes though, so we can solve all three!

@aidantwoods
Copy link
Owner

(Am I the only one who thinks that this code styling makes a actually pretty simple if unnecessary huge?)

I'm happy to trim lines like

(
     ! isset($CurrentBlock['type']) or $CurrentBlock['type'] === 'p'
)

down to something like

(! isset($CurrentBlock['type']) or $CurrentBlock['type'] === 'p')

Though, having 6 different if conditions on one line isn't all that readable ;p

Logic behind the existing formatting is that you can see the nesting depth clearly, and the bracket styling matches the rest of the code (there were no examples of if statements with lots of conditions that I could base the styling on)

@PhrozenByte
Copy link
Author

! isset($this->lines[$this->pointer+1]) or chop($this->lines[$this->pointer+1]) === ''

is because of http://spec.commonmark.org/0.26/#example-246

However, an empty list item cannot interrupt a paragraph

Hence the lookahead.
One lookahead should be sufficient because http://spec.commonmark.org/0.26/#example-241

A list item can begin with at most one blank line.

I'm happy to combine this with your changes though, so we can solve all three!

No, that's not correct. At first I understood it the same way as you do, but after looking at the examples and testing it with the reference parser (click me), it became clear that they actually mean

-

with "blank line". The wording is very misleading (:confused:), but this is the only interpretation that matches the results of example #241: If they hadn't count the lone - as "blank line", the actually empty line 2 would have been the "at most one blank line", so that foo in line 3 would have become the first and only item of the list. However, that's not the case, the lone - is the first blank line and the actually empty line 2 is the second blank line, so the result is a empty list and a paragraph with foo.

You can furthermore use CommonMark's reference parser to be 100% sure:

foo
*
bar

becomes

<p>foo
*
bar</p>

and not

<p>foo</p>
<ul>
<li></li>
</ul>
<p>bar</p>

That wouldn't be the case with the lookahead. See http://spec.commonmark.org/dingus/?text=foo%0A*%0Abar


With

(Am I the only one who thinks that this code styling makes a actually pretty simple if unnecessary huge?)

I actually didn't want to say that we should change this. Parsedown is @erusev's project, he decides what coding standards we should follow with our PRs and if he wants it this way, we should do it this way. Personally I prefer http://www.php-fig.org/, however, a discussion about personal preferences probably won't take us much further 😆

@aidantwoods
Copy link
Owner

No, that's not correct. At first I understood it the same way as you do, but after looking at the examples and testing it with the reference parser (click me), it became clear that they actually mean

-

with "blank line".

Ah I see, I was of the impression (from the wording) that because the following is a valid list (non-empty) item
screen shot 2016-10-14 at 20 04 17
then this would count as a non-empty list (provided there is no more than one blank line)
screen shot 2016-10-14 at 20 04 03

Looks like the wording is confusing :/

I actually didn't want to say that we should change this. Parsedown is @erusev's project, he decides what coding standards we should follow with our PRs and if he wants it this way, we should do it this way.

Oh, I thought I was the one who started formatting ifs like that (spreading out the conditions over multiple lines and nesting them with indents). Couldn't see any other examples in Parsedown where if had more than a few conditions (so they're kept to one line elsewhere).
The huge statements are my effort to match the styling of the surrounding code, I'm not sure as to what @erusev's preferences are for ifs with lots of conditions though.

@aidantwoods
Copy link
Owner

I guess the lookahead isn't needed in this case, if CommonMark only cares that the existing line is an empty item

aidantwoods added a commit that referenced this pull request Feb 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants