Improve CommonMark mixed-marker list compliance #439
These changes aren't fit for merge, nor do they work correctly (yet)
Parsedown.php
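```php
// Excerpt from a multi-line condition in Parsedown.php (the start of the condition is not shown)
    $Block['indent'] === $Line['indent']
    and $Block['data']['type'] === 'ul'
    // Pass '/' as the delimiter so preg_quote() also escapes the regex delimiter
    and preg_match('/^'.preg_quote($Block['data']['matchText'], '/').'(?:[ ]+(.*)|$)/', $Line['text'], $matches)
)
```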
This can be summarised as follows.

If either of the following is true, add the item as normal:
- the list has the same indent as the current block, the type is ordered, and the line matches the expected pattern (Parsedown doesn't yet support multiple ordered list markers, so comparing the pattern is the only thing we can do here); or
- the list has the same indent as the current block, the type is unordered, and the line uses the same list marker (in accordance with the CommonMark spec).

Otherwise, return null if the line has the same indent as the current block.
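The decision above can be sketched as a standalone function. This is a minimal sketch under stated assumptions, not Parsedown's code: the name `continuesCurrentList`, its flat parameters (standing in for the `$Block`/`$Line` arrays) and the three string results are hypothetical, and the ordered pattern `[0-9]+[.]` is assumed, so a `1)` marker is not recognised.

```php
<?php

/**
 * Hypothetical sketch of the rule summarised above (not Parsedown's code).
 * Returns 'continue' (add the line as an item of the current list),
 * 'new-list' (same indent, different marker: close this list so a new one can start),
 * or 'other' (different indent: left to the surrounding logic).
 */
function continuesCurrentList(
    int $blockIndent,
    string $blockType,   // 'ul' or 'ol'
    string $blockMarker, // bullet that opened a 'ul', e.g. '*'; ignored for 'ol'
    int $lineIndent,
    string $lineText     // the line with its leading indentation removed
): string {
    if ($lineIndent !== $blockIndent) {
        return 'other';
    }

    if ($blockType === 'ol') {
        // Assumed ordered pattern: digits, '.', then spaces or end of line.
        $pattern = '/^[0-9]+[.](?:[ ]+|$)/';
    } else {
        // CommonMark: only the same bullet character continues an unordered list.
        $pattern = '/^' . preg_quote($blockMarker, '/') . '(?:[ ]+|$)/';
    }

    return preg_match($pattern, $lineText) === 1 ? 'continue' : 'new-list';
}
```

For example, `continuesCurrentList(0, 'ul', '*', 0, '+ item')` yields `'new-list'`, which corresponds to the `return null` case described above.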
You can find the issue on line 588: this line is responsible for trimming the list item (i.e. removing the marker or matching indentation from the beginning of the line) and re-issuing it to …

```markdown
* unordered, item 1 (new list 1)
    * sub-unordered item 1 (new list 1.1)
    * sub-unordered item 2
    + sub-unordered item 1 (new list 1.2)
    - sub-unordered item 1 (new list 1.3)
    1. sub-ordered item 1 (new list 1.4)
    2. sub-ordered item 2
    * sub-unordered item 4 (new list 1.5)
        1. sub-sub ordered item 1 (new list 1.5.1)
* unordered, item 2
```

Have a look at the CommonMark specs on how to parse list items: http://spec.commonmark.org/0.26/#list-items
I see. I wonder how appropriate just swapping out 4 for the current indent (plus one) is in this scenario.
@PhrozenByte okay cool, so I'd still like to add support for the … As I said, I'll take a better look through the code to try to assess that indentation-stripping change. Anything you can spot immediately that might cause trouble? (I've yet to clean up the code to remove the data I've passed along for debugging.)
At the moment, no. As I said, you can use …
@PhrozenByte yup, I'll run it through those tests once I've added support for the alternate ordered list marker :)
Also added marker check to ordered list case when deciding to continue the current list
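Per the CommonMark spec, two list markers are of the same type if they are the same bullet character (`-`, `+`, `*`) or ordered numbers with the same delimiter (`.` or `)`); a change of type starts a new list. A minimal sketch of that comparison, assuming hypothetical helper names `markerTypeKey` and `sameListType` that are not part of Parsedown:

```php
<?php

/**
 * Hypothetical helper: returns a key identifying the list type of a line's
 * marker (e.g. 'ul*', 'ol.', 'ol)'), or null if the line has no list marker.
 */
function markerTypeKey(string $line): ?string
{
    // Bullet marker: one of - + * followed by a space/tab or end of line.
    if (preg_match('/^([*+-])(?:[ \t]|$)/', $line, $m)) {
        return 'ul' . $m[1];
    }
    // Ordered marker: 1-9 digits followed by '.' or ')'.
    if (preg_match('/^[0-9]{1,9}([.)])(?:[ \t]|$)/', $line, $m)) {
        return 'ol' . $m[1];
    }
    return null;
}

/** True if both lines start with list markers of the same CommonMark type. */
function sameListType(string $a, string $b): bool
{
    $keyA = markerTypeKey($a);
    return $keyA !== null && $keyA === markerTypeKey($b);
}
```

This sketch ignores thematic breaks such as `* * *`, which CommonMark gives precedence over list items, so a real implementation would have to check for those first.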
@PhrozenByte I've run it through the CommonMark test, and we get 207 failures on both the current master and this commit.
I thought that was a bit strange, so I tried running an old master through the test (prior to my adding the list start attribute in #431). Looking at the test log:
Apparently Parsedown is able to produce the start attribute before it got the functionality to support it. It looks like the CommonMark test is broken and is in fact giving Parsedown some of the expected HTML (which would be the only way Parsedown could echo back that start attribute in an old version), causing quite a few failures where perhaps there shouldn't be any?
Ah I see, apologies, I completely read over that!
@PhrozenByte Yeah, these are looking a little more sane now ;p Baseline (master) and this commit are identical in the tests, at 328 failures.
I'm just skimming over the list tests, and I can't actually see a test for mixed markers. So I guess no improvement should be expected in that case?
I've added some items to the test that do a little more to verify that mixed markers do indeed start a new list.
Yeah, as I said in #437 (comment), AFAIK there's no example for that at the moment (one of the reasons why passing all CommonMark tests doesn't necessarily mean that the parser is fully CommonMark compliant). However, you can (and should) add a new (!) test to Parsedown's own test suite (8965c78). My intention by referencing … Btw, I recommend using …
@PhrozenByte I've got this to a state where (except data set …
I'll try to get … Here's the git diff of master vs. this commit in CommonMarkWeak (take care reading the double git diff notation …). (Also note I've only included the additional tests that my commit caused to fail here; it did actually improve results in some places too! ;-) )
Just to talk a little about the reasoning behind the change I made in the latest commit. The root issue that this PR solves is the ability to create a new type of list within a list, without nesting it within an `li`.

This test enforces that a list item takes precedence over code when there is ambiguity, which is fine, and a case I'd already solved for list items at the same depth. The code requires a little more editing for nested lists though: we want to prefer starting a list over code, but only if the line is a list item. If it isn't, we want Code's priority to sit above everything else, to avoid breaking cases like:

In the above, a Rule must supersede a List, and Code must supersede both a List and a Rule. However, not if we're already in a list: in that case List must supersede Code. Okay, but again, we're fine and can keep all the priorities in place, except when the current line is a list item and the current block is a list.

However, passing the tests gets a bit more complicated when encountering tests like this one, while simultaneously having to prioritise a nested list that is within a list:

This behaviour is the complete opposite of the previous one, with the exception that it can only occur at the "root" block. Parsedown didn't previously store information about the 'parent' block when invoking the … Parsedown can now differentiate between the two cases and assign priorities accordingly. (Let me know if you can see a nicer way of doing it!)
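The precedence rules stated above (Code over Rule and List, Rule over List, but List over Code when the current block is a list and the line is a list item) can be sketched as a small resolver. This is a hypothetical illustration: the function name and its boolean inputs are assumptions, it does not model the "root block" exception, and Parsedown actually decides this by trying block handlers in order rather than through a function like this.

```php
<?php

/**
 * Hypothetical sketch of the precedence rules described above; the function
 * name and boolean parameters are illustrative, not part of Parsedown.
 *
 * @param string|null $currentBlockType Type of the block being continued (e.g. 'List'), or null at the root
 */
function resolveBlockType(
    ?string $currentBlockType,
    bool $isIndentedCode,
    bool $isRule,
    bool $isListItem
): ?string {
    // Exception: inside a list, a line that is itself a list item beats Code.
    $listBeatsCode = $currentBlockType === 'List' && $isListItem;

    if ($isIndentedCode && !$listBeatsCode) {
        return 'Code'; // by default Code supersedes both Rule and List
    }
    if ($isRule) {
        return 'Rule'; // Rule supersedes List
    }
    if ($isListItem) {
        return 'List';
    }
    return null; // none of the three block types apply
}
```

Note that the three pairwise rules cannot form a single fixed ordering once the in-list exception applies, which is why the sketch checks the exception first instead of sorting block types.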
Okay, so this should conclude the fixes for any tests that were previously broken by the added functionality. Here's the git diff of master vs. this commit for the CommonMarkWeak test, showing the failures my commit removes (i.e. improvements). This commit no longer breaks any previously passing CommonMarkWeak tests 😃 (cc: @PhrozenByte @erusev)
This might be a good point to note that the code I've added to prevent breakages is far from ideal. There's nothing wrong with the actual code per se; rather, the way it has been added could be better (but that would require changing the input structure of all the handler functions). Simply put: because the CommonMark spec requires awareness of parent blocks and their properties (or of the lack of any parent block) in some places, as discussed above, Parsedown should also retain this information so that the handler functions can use information about previous blocks to decide what to do with the current one. In these changes I've only added awareness of the parent block type in a few cases, but there are cases in which knowing the parent's indent would be useful too. And there may well be other properties of the parent that a handler needs in order to be CommonMark compliant that I haven't run into here.
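One way to retain parent information, sketched under assumptions: a stack of open blocks that records each block's type and indent, so a handler could ask for its parent or learn that it sits at the document root. The `BlockContext` class is hypothetical; Parsedown's handlers receive nothing like it.

```php
<?php

/**
 * Hypothetical sketch: a stack of open blocks so that a handler can ask
 * about its parent (type, indent), or learn that there is no parent.
 */
final class BlockContext
{
    /** @var array<int, array{type: string, indent: int}> */
    private array $stack = [];

    /** Record a newly opened block. */
    public function push(string $type, int $indent): void
    {
        $this->stack[] = ['type' => $type, 'indent' => $indent];
    }

    /** Close the innermost open block. */
    public function pop(): void
    {
        array_pop($this->stack);
    }

    /** The innermost open block, or null at the document root. */
    public function parent(): ?array
    {
        return $this->stack === [] ? null : $this->stack[count($this->stack) - 1];
    }

    public function isRoot(): bool
    {
        return $this->stack === [];
    }
}
```

Passing such an object into every handler is the kind of input-structure change mentioned above; keeping both type and indent covers the two properties discussed.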
…k tests" This reverts commit 6973302.
This reverts commit 0a43799.
This reverts commit 2db3199.
See CommonMark spec examples [#226](http://spec.commonmark.org/0.26/#example-226) to #229
According to the CommonMark specs ([list items](http://spec.commonmark.org/0.26/#list-items), rule 3), list items starting with a blank line basically behave as if the `\n` doesn't exist. Also see example [#241](http://spec.commonmark.org/0.26/#example-241).
This basically represents [list item parsing](http://spec.commonmark.org/0.26/#list-items), rule 1 of the CommonMark specs.
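The list-item rules referenced in these commits fix the column where an item's content starts: rule 1 gives marker width W plus N spaces (1 ≤ N ≤ 4), rule 2 treats five or more spaces as one space followed by indented code, and rule 3 gives W + 1 for an item starting with a blank line. A minimal sketch, assuming a hypothetical function name, no tabs, and no thematic-break check:

```php
<?php

/**
 * Hypothetical sketch: the column at which a list item's content starts,
 * following CommonMark 0.26 list-item rules 1-3 (no tabs, no rule-4 edge cases).
 * Returns null if the line does not start with a list marker.
 */
function listItemContentIndent(string $line): ?int
{
    // Up to 3 spaces of indentation, then a bullet or ordered marker,
    // then the spaces after it (possibly none, at end of line).
    if (!preg_match('/^( {0,3})([*+-]|[0-9]{1,9}[.)])( *)(.*)$/', $line, $m)) {
        return null;
    }

    $lead   = strlen($m[1]);
    $width  = strlen($m[2]); // W: width of the marker
    $spaces = strlen($m[3]); // N: spaces after the marker
    $rest   = $m[4];

    if ($rest === '') {
        // Rule 3: the item starts with a blank line, so content is at W + 1.
        return $lead + $width + 1;
    }
    if ($spaces === 0) {
        return null; // e.g. "-foo" is not a list item
    }
    if ($spaces >= 5) {
        // Rule 2: content starts with indented code; only one space counts.
        return $lead + $width + 1;
    }
    // Rule 1: W + N, with 1 <= N <= 4.
    return $lead + $width + $spaces;
}
```

For example, `- foo` gives 2 and `1.  bar` gives 4, while a line indented by four spaces before its marker is rejected, since that would be an indented code block.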
Subsequent list items which aren't indented sufficiently are treated as part of the original list; see CommonMark spec example [#256](http://spec.commonmark.org/0.26/#example-256).
To be honest, I had a bit of trouble tracking this down... Code comments would have been helpful 😆 The parsing strategy of Parsedown is to let the list handler take responsibility for all lines that are supposed to be part of a list item, collecting them as …
However, you're still making (wrong) assumptions about the indent with `preg_replace('/^[ ]{0,'.min(4, $Block['indent'] + 1).'}/', '', $Line['body'])`. The CommonMark specs are very clear about this: don't assume anything; the indent must exactly match the list item's width. Any additional whitespace is treated the same way as whitespace at the beginning of any other line. Missing whitespace means the line is not recognized as part of the list item. The only exception is a multi-line paragraph.

I highly recommend working on the basis of the CommonMark specs. Don't just look at a failing example and try to fix it; read through the specs to find out why the result should be like this, and then implement it that way, completely independently of the failed tests you were looking at.

So, instead of your massive changes in 2db3199, 0a43799 and 6973302, you just need some minor tweaks to … I've opened a PR on your fork (see aidantwoods#2; here's the diff excluding the above three commits) to fix this. Simply merge my PR and the commits will automatically show up in this PR. Since I was working on it anyway, I've also fixed various other small issues.

The only "real" list-related CommonMark example that still fails is issue #427; everything else is actually related to something else (like code block or blockquote parsing) or to crazy exception rules which IMHO don't make much sense and appear to have been added to match their reference implementation.
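The "indent must exactly match the width" rule can be sketched as follows, assuming a hypothetical helper and a content indent W already computed from the item's first line. Unlike `preg_replace('/^[ ]{0,N}/', '', ...)`, which strips up to N spaces whether or not enough are present, this rejects an under-indented line and keeps any extra spaces, so they are interpreted as at the start of any other line. Lazy paragraph continuation, the one exception, is not handled here.

```php
<?php

/**
 * Hypothetical sketch: decide whether a following non-blank line belongs to a
 * list item whose content starts at column $contentIndent (W).
 * Returns the line with exactly W spaces removed, or null if it is indented
 * less than W. Lazy paragraph continuation is deliberately not handled.
 */
function stripListItemIndent(string $line, int $contentIndent): ?string
{
    $prefix = str_repeat(' ', $contentIndent);

    if (strncmp($line, $prefix, $contentIndent) !== 0) {
        return null; // not indented enough: not part of this item
    }

    // Remove exactly W spaces; any extra spaces stay (e.g. 4 more form a code block).
    return substr($line, $contentIndent);
}
```

For instance, with W = 2, `'      code'` becomes `'    code'` (still an indented code block inside the item), while `' bar'` is not part of the item at all.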
Use the list marker width to determine whether a list item is continued
Refer to #437 for context on the purpose of these changes.
The current changes echo out the following:
Given the following data:
@PhrozenByte
I may be wrong in assuming that a `return null;` statement is the correct thing to do in order to start a new list block at the current nesting depth. But I'm getting some strange behaviour, where lists of different depths end up having identical indentation (according to `$Block['indent'] === $Line['indent']`).

Also, don't worry too much about the ugly-looking number of variables under `$Block['data']`; they're really there for debugging, so I can see more info about the 'parent' list when the `elseif` is triggered. Most of that data shouldn't be necessary, and it can certainly be cleaned up when moving towards an actual solution here.