Remark mis-parses nested code blocks in list items #315

infotexture · 2017-12-13T13:19:15Z

When code blocks are nested within list items with CommonMark's 7-space (3+4=7) indentation rule, remark fails to recognize the code blocks and treats them as nested paragraphs in the list item rather than <code>.

Steps to reproduce

This CommonMark playground example shows how CommonMark handles nested code blocks within list items with various indentations.

TL;DR: CommonMark recognizes 7-space indents as code blocks in single-digit ordered list items.
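The 3+4=7 arithmetic can be sketched in a few lines. This is only an illustration of the indentation rule as described in this issue, not code from remark or micromark: `codeIndentThreshold` is a hypothetical helper, and it ignores edge cases such as thematic breaks or empty list items.

```javascript
// Hypothetical helper (not part of remark or micromark): given the first
// line of a list item, return the column at which an indented code block
// inside that item would have to start.
function codeIndentThreshold(firstLine) {
  // Up to 3 leading spaces, then an ordered marker ("1." or "1)") or a
  // bullet marker, then at least one space before the content.
  const match = /^( {0,3})(\d{1,9}[.)]|[-+*])( +)(?=\S)/.exec(firstLine)
  if (!match) return null

  const [, lead, marker, gap] = match
  // 1 to 4 spaces after the marker count toward the content offset;
  // with 5 or more, the content itself is indented code and only 1 counts.
  const spaces = gap.length <= 4 ? gap.length : 1
  const contentOffset = lead.length + marker.length + spaces

  // Indented code needs 4 more spaces than the item's content offset.
  return contentOffset + 4
}

console.log(codeIndentThreshold('1. foo'))  // 7  (3 + 4)
console.log(codeIndentThreshold('10. foo')) // 8  (4 + 4)
console.log(codeIndentThreshold('- foo'))   // 6  (2 + 4)
```

This is why the 7-space case is specific to single-digit ordered items: a two-digit marker pushes the threshold to 8, and a bullet marker lowers it to 6.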

Expected behaviour

The parsed remark AST should be the same as the result in the CommonMark playground.
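As a rough picture of what "the same as CommonMark" would mean in mdast terms, here is a hand-written sketch of the tree shape one would expect for a single-digit list item followed by a 7-space-indented line. The tree is typed out by hand for illustration (it is not parser output), fields such as `position` and `spread` are omitted, and `findNodes` is a hypothetical helper.

```javascript
// Input under discussion: "1. " is 3 columns wide, so 7 spaces = 3 + 4.
const markdown = '1. Step one\n\n' + ' '.repeat(7) + 'npm install\n'

// Hand-written sketch of the expected mdast (assumption: simplified,
// without `position` or `spread` fields).
const expected = {
  type: 'root',
  children: [
    {
      type: 'list',
      ordered: true,
      start: 1,
      children: [
        {
          type: 'listItem',
          children: [
            {type: 'paragraph', children: [{type: 'text', value: 'Step one'}]},
            // The indented line should become a `code` node, not a paragraph.
            {type: 'code', lang: null, meta: null, value: 'npm install'}
          ]
        }
      ]
    }
  ]
}

// Hypothetical helper: collect every node of a given type, depth-first.
function findNodes(node, type) {
  const found = node.type === type ? [node] : []
  for (const child of node.children || []) {
    found.push(...findNodes(child, type))
  }
  return found
}

console.log(findNodes(expected, 'code').length) // 1
```

With the behaviour reported in this issue, a check like `findNodes(tree, 'code')` on the parsed tree would come back empty, because the indented line ends up inside a paragraph instead.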

Actual behaviour

The remark parser fails to recognize CommonMark's (3+4=7) indentation for code blocks within list items and treats them as nested paragraphs in the list item rather than <code>.

This Prettier playground example shows how remark stumbles on the 7-space indentation.

(See prettier/prettier#3459 for the initial discussion and additional examples.)

wooorm · 2018-01-17T15:35:39Z

Thanks for the issue and sorry for the late reply!

That’s definitely a bug. The code for lists is pretty big and bug prone. It could use a rewrite!

wooorm · 2019-10-30T17:42:50Z

This is unresolved but we are working on an alternative: #439

MarquiseRosier · 2019-10-30T17:46:03Z

Sweet; exciting stuff; I'll keep my eyes peeled! :D

wooorm · 2020-08-22T15:01:45Z

Heya, just wanted to give an update about micromark, it’s sort-of a new motor that we’ll soon use in remark to parse markdown. It’s not yet 100% ready but will be relatively soon. The good news is, it fixes this issue! (P.S. see this twitter thread for some more info!)

MarquiseRosier · 2020-08-23T21:31:27Z

> Heya, just wanted to give an update about micromark, it’s sort-of a new motor that we’ll soon use in remark to parse markdown. It’s not yet 100% ready but will be relatively soon. The good news is, it fixes this issue! (P.S. see this twitter thread for some more info!)

That's wonderful news! Thanks a ton and we're super excited. :)

This is a giant change for remark. It replaces the 5+ year old internals with a new low-level parser: <https://github.com/micromark/micromark>

The old internals have served billions of users well over the years, but markdown has changed over that time. micromark comes with 100% CommonMark (and GFM as an extension) compliance, and (WIP) docs on parsing rules for how to tokenize markdown with a state machine: <https://github.com/micromark/common-markup-state-machine>. micromark, and micromark in remark, is a good base for the future.

`remark-parse` now defers its work to [`micromark`][micromark] and [`mdast-util-from-markdown`][from-markdown]. `micromark` is a new, small, complete, and CommonMark compliant low-level markdown parser. `from-markdown` turns its tokens into the previously (and still) used syntax tree: [mdast][]. Extensions to `remark-parse` work differently: they’re a two-part act. See for example [`micromark-extension-footnote`][micromark-footnote] and [`mdast-util-footnote`][from-markdown-footnote].

* change: `commonmark` is no longer an option; it’s the default
* move: `gfm` is no longer an option; moved to `remark-gfm`
* remove: `pedantic` is no longer an option; this legacy and buggy flavor of markdown is no longer widely used
* remove: `blocks` is no longer an option; it’s no longer suggested to change the internal list of HTML “block” tag names

`remark-stringify` now defers its work to [`mdast-util-to-markdown`][to-markdown]. It’s a new and better serializer with powerful features to ensure serialized markdown represents the syntax tree (mdast), no matter what plugins do. Extensions to it work differently: see for example [`mdast-util-footnote`][to-markdown-footnote].

* change: `commonmark` is no longer an option, it’s the default
* change: `emphasis` now defaults to `*`
* change: `bullet` now defaults to `*`
* move: `gfm` is no longer an option; moved to `remark-gfm`
* move: `tableCellPadding`, moved to `remark-gfm`
* move: `tablePipeAlign`, moved to `remark-gfm`
* move: `stringLength`, moved to `remark-gfm`
* remove: `pedantic` is no longer an option; this legacy and buggy flavor of markdown is no longer widely used
* remove: `entities` is no longer an option; with CommonMark there is almost never a need to use character references, as character escapes are preferred
* new: `quote`, you can now prefer single quotes (`'`) over double quotes (`"`) in titles

All of these are for CommonMark compatibility. Most of them are inconsequential.

* **notable**: references (as in, links `[text][id]` and images `![alt][id]`) are no longer present as such in the syntax tree if they don’t have a corresponding definition (`[id]: example.com`). The reason for this is that CommonMark requires `[text *emphasis start][undefined] emphasis end*` to be emphasis.
* **notable**: it is no longer possible to use two blank lines between two lists or a list and indented code. CommonMark prohibits it. For a solution, use an empty comment to end lists (`<!---->`)
* inconsequential: whitespace at the start and end of lines in paragraphs is now ignored
* inconsequential: `<mailto:foobarbaz>` are now correctly parsed, and the scheme is part of the tree
* inconsequential: indented code can now follow a block quote w/o blank line
* inconsequential: trailing indented blank lines after indented code are no longer part of that code
* inconsequential: character references and escapes are no longer present as separate text nodes
* inconsequential: character references which HTML allows but CommonMark doesn’t, such as `&copy` w/o the semicolon, are no longer recognized
* inconsequential: the `indent` field is no longer available on `position`
* fix: multiline setext headings
* fix: lazy lists
* fix: attention (emphasis, strong)
* fix: tabs
* fix: empty alt on images is now present as an empty string
* …plus a ton of other minor previous differences from CommonMark

* get folks to use this and report problems!
* make `remark-gfm`
* start making next branches for plugins
* get types into {from,to}-markdown and use them here

Closes GH-218.
Closes GH-306.
Closes GH-315.
Closes GH-324.
Closes GH-398.
Closes GH-402.
Closes GH-407.
Closes GH-439.
Closes GH-450.
Closes GH-459.
Closes GH-493.
Closes GH-494.
Closes GH-497.
Closes GH-504.
Closes GH-517.
Closes GH-521.
Closes GH-523.
Closes remarkjs/remark-lint#111.

[micromark]: https://github.com/micromark/micromark
[from-markdown]: https://github.com/syntax-tree/mdast-util-from-markdown
[to-markdown]: https://github.com/syntax-tree/mdast-util-to-markdown
[micromark-footnote]: https://github.com/micromark/micromark-extension-footnote/blob/main/index.js
[to-markdown-footnote]: https://github.com/syntax-tree/mdast-util-footnote/blob/main/to-markdown.js
[from-markdown-footnote]: https://github.com/syntax-tree/mdast-util-footnote/blob/main/from-markdown.js
[mdast]: https://github.com/syntax-tree/mdast

wooorm · 2020-10-01T15:17:50Z

Sorry for the wait! I just wanted to share that there’s now a PR that solves this issue: #536.

MarquiseRosier · 2020-10-01T18:08:57Z

> Sorry for the wait! I just wanted to share that there’s now a PR that solves this issue: #536.

Woot Woot Woot Woot! Wow that's really impressive; thanks a ton I'll share the news with the team!!!!

infotexture · 2020-10-04T15:49:45Z

Nice to see a solution for this on the horizon. Thanks to @wooorm for following up. 🙏

wooorm · 2020-10-14T08:56:23Z

This is now released in remark@13.0.0

wooorm added the 🐛 type/bug, remark-parse, and major labels on Jan 17, 2018

ikatyang mentioned this issue Jul 25, 2018

feat(markdown): only align lists if they're already aligned prettier/prettier#4893

Merged

wooorm added the 🧑 semver/major label and removed the needs pr label on Aug 12, 2019

MarquiseRosier mentioned this issue Oct 30, 2019

Wrong HTML rendering with lists adobe/helix-pipeline#458

Closed

lishid mentioned this issue Aug 19, 2020

Nested list item becomes single paragraph when indenting >=4 spaces #523

Closed

wooorm mentioned this issue Oct 1, 2020

Change to use micromark #536

Merged

wooorm closed this as completed in #536 Oct 13, 2020

wooorm added the ⛵️ status/released label Oct 14, 2020

wooorm added the 💪 phase/solved label Aug 4, 2021
