Code blocks with dashed header elements don't parse correctly #35

Open
schell opened this issue Nov 5, 2020 · 2 comments

Comments

@schell

schell commented Nov 5, 2020

First of all, thank you for this great crate!

It seems that if a code block contains a markdown element like a header, the code block is cut short. Here is an example - given this input markdown (with triple backticks "escaped" for this github comment):

Here is a test:

\```haskell
--------------------------------------------------------------------------------
-- Big crazy comment
--------------------------------------------------------------------------------
data MyType =
    Variant1
  | Variant2
  | Variant3
  deriving (Show)
\```

That was the test.

We would expect that the output tokens would be something like:

[
    Paragraph(
        [
            Text(
                "Here is a test:",
            ),
        ],
    ),
    CodeBlock(
        Some(
            "haskell",
        ),
        "----------------------------------------\n-- Big crazy comment\n-------------------------------------\ndata MyType =\n    Variant1\n  | Variant2\n  | Variant3\n  deriving (Show)",
    ),
    Paragraph(
        [
            Text(
                "That was the test.",
            ),
        ],
    ),
]

but instead we see:

[
    Paragraph(
        [
            Text(
                "Here is a test:",
            ),
        ],
    ),
    Header(
        [
            Code(
                "`",
            ),
            Text(
                "haskell",
            ),
        ],
        2,
    ),
    Header(
        [
            Text(
                "-- Big crazy comment",
            ),
        ],
        2,
    ),
    Paragraph(
        [
            Text(
                "data MyType =",
            ),
        ],
    ),
    CodeBlock(
        None,
        "Variant1",
    ),
    Paragraph(
        [
            Text(
                "| Variant2",
            ),
            Text(
                "\n",
            ),
            Text(
                "| Variant3",
            ),
            Text(
                "\n",
            ),
            Text(
                "deriving (Show)",
            ),
            Text(
                "\n",
            ),
            Code(
                "`",
            ),
        ],
    ),
    Paragraph(
        [
            Text(
                "That was the test.",
            ),
        ],
    ),
]
@gennyble
Collaborator

gennyble commented Nov 6, 2020

This happened because we check for a setext header before we check for a code fence, so the dashed line right after the opening fence was matched as a setext underline and the fence was never recognized. That is not right. The CommonMark spec says: "The lines of text must be such that, were they not followed by the setext heading underline, they would be interpreted as a paragraph: they cannot be interpretable as a code fence, ATX heading, block quote, thematic break, list item, or HTML block."

I've moved the setext match to be the last one we check and pushed it to the master branch. I can't update on crates.io, so I'll gently ping @johannhof. I'll leave this issue open until word from them.
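
For anyone reading along, here is a rough sketch of why the order of the checks matters. This is not the crate's actual code: `BlockKind`, `is_code_fence`, `is_setext_underline` and `classify` are hypothetical names made up for this illustration, and the fence check is simplified (backticks only, no `~~~` fences, no indentation limits).

```rust
/// Hypothetical block kinds, named for this sketch only.
#[derive(Debug, PartialEq)]
enum BlockKind {
    FencedCode,
    SetextHeader,
    Paragraph,
}

/// True if `line` opens a fenced code block (simplified: three backticks).
fn is_code_fence(line: &str) -> bool {
    line.trim_start().starts_with("```")
}

/// True if `line` looks like a setext underline: non-empty, only `-` or only `=`.
fn is_setext_underline(line: &str) -> bool {
    let t = line.trim();
    !t.is_empty() && (t.chars().all(|c| c == '-') || t.chars().all(|c| c == '='))
}

/// Classify the block that starts at `lines[0]`.
fn classify(lines: &[&str]) -> BlockKind {
    match lines {
        // The code fence is tried first, so a dashed line inside the
        // fence can never be mistaken for a header underline.
        [first, ..] if is_code_fence(first) => BlockKind::FencedCode,
        // Setext is tried last: it only applies when the first line
        // would otherwise have been paragraph text.
        [_, second, ..] if is_setext_underline(second) => BlockKind::SetextHeader,
        _ => BlockKind::Paragraph,
    }
}

fn main() {
    // The start of the input from this issue: a fence followed by dashes.
    let issue_input = ["```haskell", "--------", "-- Big crazy comment"];
    println!("{:?}", classify(&issue_input)); // prints FencedCode

    // An ordinary setext header still works.
    println!("{:?}", classify(&["Title", "-----"])); // prints SetextHeader
}
```

If the two guarded arms were swapped, the first call would return `SetextHeader`, which is the behavior reported above.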

While you're pinged I'd like to ask if your goal with the library has been to follow CommonMark? I don't believe the current implementation of ordered lists complies with the spec. I'd be happy to get it there if you think we should. (sorry for being freakishly inactive and thank you for adding me as a maintainer, I'll start doing that now)

@hoijui

hoijui commented Jun 4, 2021

@gennyble following CommonMark would be really great! We would need that too.
In the end, we would also need some features from GFM and pandoc's Markdown, but those are tiny additions we could hack on top, just for our case.
Can we help in any way?

Isn't there some parser description for CommonMark already, which would just have to be ported to the syntax of a Rust parser library?
