-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use the list marker width to determine whether a list item is continued #2
Use the list marker width to determine whether a list item is continued #2
Conversation
…k tests" This reverts commit 6973302.
This reverts commit 0a43799.
This reverts commit 2db3199.
See CommonMark spec examples [erusev#226](http://spec.commonmark.org/0.26/#example-226) to erusev#229
According to the CommonMark specs ([list items](http://spec.commonmark.org/0.26/#list-items), rule 3), list items starting with a blank line basically behave like as if the \n doesn't exist. Also see example [erusev#241](http://spec.commonmark.org/0.26/#example-241).
This basically represents [list item parsing](http://spec.commonmark.org/0.26/#list-items), rule 1 of the CommonMark specs.
Subsequent list items which aren't indented sufficiently are treated as part of the original list, see CommonMark spec example [erusev#256](http://spec.commonmark.org/0.26/#example-256).
|
||
if (preg_match('/^('.$pattern.'[ ]+)(.*)/', $Line['text'], $matches)) | ||
if (preg_match('/^('.$pattern.'([ ]+|$))(.*)/', $Line['text'], $matches)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd guess that allowing the string to end, instead of a space being present in the regex here would be the cause of the failure mentioned above? (in #2 (comment))
Yeah, this one is a bit tricky... I've absolutely no idea how to fix it. We really must match lines with a list marker and neither a following whitespace nor actual content - this is the essence of rule 3 (you'll have to scroll down to rule 3), see e.g. example #239 (and many more). It's important to say that Parsedown never intentionally passed example #246, it passed it just because Parsedown didn't respect said rule 3 at all. Passing example #246 was just an side-effect of this. It is strongly related to example #265 and #266. See the paragraphs above example #265 for an explanation. It is the result from the following statement of rule 1:
However, I'm not so sure about the importance of this issue... Parsedown violates CommonMark in so many very obvious and important ways, that we should take care of those issues first. This is one of the rather "obscure" spec details which are IMHO not so important. We should take care of this after we've made Parsedown practically (what doesn't necessarily mean "completely") CommonMark compliant. So, sure, you're completely right, this is a regression, but I don't think that this impacts even a single user. |
Okay – I figured it might have been one of those obscure ones 🙃 just wanted to make sure it wasn't an unintentional breakage (breaking something Parsedown didn't really support, to fix way more tests is of course fine!) I'd struggle to think of a way of fixing that test without introducing the concept of a line lookahead. (Since we need to decide whether to interupt based on the content of the next line and the following line – both have to exist, and only the next may be blank). |
Yeah, looks very promising, great idea! 👍 However, I don't think that the check ! isset($this->lines[$this->pointer+1]) or chop($this->lines[$this->pointer+1]) === '' is necessary. I think you've added this because of the following statement in the CommonMark specs, did you?
This relates to rule 3, i.e. a line with just the list marker and no content, not a empty line after the list marker. Otherwise the following would still be parsed as list: foo
1.
bar You furthermore didn't consider the following exception:
It should look like the following in the end: diff --git a/Parsedown.php b/Parsedown.php
index de80f3f..b9c05ac 100644
--- a/Parsedown.php
+++ b/Parsedown.php
@@ -500,7 +500,7 @@ class Parsedown
#
# List
- protected function blockList($Line)
+ protected function blockList($Line, $CurrentBlock = null)
{
list($name, $pattern) = $Line['text'][0] <= '-' ? array('ul', '[*+-]') : array('ol', '[0-9]{1,9}[.\)]');
@@ -533,20 +533,40 @@ class Parsedown
),
);
+ $listStart = 1;
+ $isBlankLine = (trim($matches[3]) === '');
+
if($name === 'ol')
{
- $listStart = ltrim(strstr($matches[1], $Block['data']['markerType'], true), '0') ?: '0';
-
- if($listStart !== '1')
+ $listStart = intval(ltrim(strstr($matches[1], $Block['data']['markerType'], true), '0'));
+
+ if($listStart !== 1)
{
$Block['element']['attributes'] = array('start' => $listStart);
}
}
+ if (
+ (
+ isset($CurrentBlock) and ! isset($CurrentBlock['interrupted'])
+ )
+ and
+ (
+ ! isset($CurrentBlock['type']) or $CurrentBlock['type'] === 'p'
+ )
+ and
+ (
+ $isBlankLine or ($name === 'ol' and $listStart !== 1)
+ )
+ )
+ {
+ return null;
+ }
+
$Block['li'] = array(
'name' => 'li',
'handler' => 'li',
- 'text' => !empty($matches[3]) ? array($matches[3]) : array(),
+ 'text' => ! $isBlankLine ? array($matches[3]) : array(),
);
$Block['element']['text'] []= & $Block['li']; (Note the additional changes on line 541 and 569) This way Parsedown even passes example #265 and #266. 🎉 |
is because of http://spec.commonmark.org/0.26/#example-246
Hence the lookahead.
I'm happy to combine this with your changes though, so we can solve all three! |
I'm happy to trim lines like
down to something like
Though, having 6 different Logic behind the existing formatting is that you can see the nesting depth clearly, and the bracket styling matches the rest of the code (there were no examples of |
No, that's not correct. At first I understood it the same way as you do, but after looking at the examples and testing it with the reference parser (click me), it became clear that they actually mean
with "blank line". The wording is very misleading (:confused:), but this is the only interpretation that matches the results of example #241: If they hadn't count the lone You can furthermore use CommonMark's reference parser to be 100% sure:
becomes
and not
That wouldn't be the case with the lookahead. See http://spec.commonmark.org/dingus/?text=foo%0A*%0Abar With
I actually didn't want to say that we should change this. Parsedown is @erusev's project, he decides what coding standards we should follow with our PRs and if he wants it this way, we should do it this way. Personally I prefer http://www.php-fig.org/, however, a discussion about personal preferences probably won't take us much further 😆 |
Ah I see, I was of the impression (from the wording) that because the following is a valid list (non-empty) item Looks like the wording is confusing :/
Oh, I thought I was the one who started formatting |
I guess the lookahead isn't needed in this case, if CommonMark only cares that the existing line is an empty item |
See erusev#439 (comment)
This PR actually bases on 8965c78, see PhrozenByte/parsedown@bdf537e...aidantwoods/patch-4 for the "real" diff.