Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmark_resume panics when provided a crafted input #91

Closed
kdarkhan opened this issue Dec 10, 2024 · 8 comments · Fixed by #92
Closed

cmark_resume panics when provided a crafted input #91

kdarkhan opened this issue Dec 10, 2024 · 8 comments · Fixed by #92

Comments

@kdarkhan
Copy link

Here is a minimal test:

    #[test]
    fn cmark_resume_with_options_does_not_panic() {
        let events = [
            Event::Start(Tag::Heading {
                level: HeadingLevel::H2,
                id: None,
                classes: vec![],
                attrs: vec![],
            }),
            Event::Start(Tag::Heading {
                level: HeadingLevel::H2,
                id: None,
                classes: vec![],
                attrs: vec![],
            }),
            Event::Text(pulldown_cmark::CowStr::Borrowed("(")),
            Event::End(TagEnd::Heading(HeadingLevel::H2)),
            Event::End(TagEnd::Heading(HeadingLevel::H2)),
        ];
        let _ = cmark_resume(events.iter(), String::new(), None);
    }

It results in a panic:

---- count_consecutive::cmark_resume_with_options_does_not_panic stdout ----
thread 'count_consecutive::cmark_resume_with_options_does_not_panic' panicked at src/lib.rs:473:21:
assertion `left == right` failed
  left: Some(Heading { id: None, classes: [], attributes: [] })
 right: None
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

The failure was found by oss fuzzer in this run.

@Byron
Copy link
Owner

Byron commented Dec 11, 2024

Thanks a lot for pointing this out!

I have removed these assertions and unwraps in favor of doing nothing instead.
The alternative would be to return an error and bail out, instead of possibly generating an invalid output stream. But then again, garbage in, garbage out seems like a fair deal.

Here is the new release: https://github.com/Byron/pulldown-cmark-to-cmark/releases/tag/v19.0.1

@mgeisler
Copy link
Contributor

Thanks @kdarkhan for reporting this! Looking at the events, it seems odd that there are two start tags after each other?

The fuzz test that crashed is the gettext test, I believe. This works by throwing random mdbook projects at our text extraction code, so the input should ultimately be Markdown.

So in other words: is there perhaps a bug in pulldown-cmark since it generated two header start events right after each other? Or is this just an artifact of how you reduced the test case to trigger the panic in this crate?

@cip999
Copy link
Contributor

cip999 commented Dec 13, 2024

Hi all!

I had a look at the fuzzer crash. A de-obscured version of the minimized test case is:

book_items = [
  PartTitle("hello\n---")
]

translations = [
  ("hello", "world\n---")
]

Note that, by itself, hello\n--- is parsed as an H2 setext heading.

While the initial input is valid Markdown, it becomes an invalid stream of Events in-translation. The problem fundamentally lies here, where the translated message string is expanded (without context) into Events which replace the original Translate group. In this case, even if both the original text and translated message parse just fine, the result has two consecutive start heading tags — IOW: inserting a valid Event list into another can corrupt it.

I would say this isn't really a bug, more that the user did something nasty. So, in a way, panicking is probably more "correct" than failing silently? I assume that, if something like this happens, it would be a benign mistake made by the translator, and they'd rather know that something is wrong than get a book with inexplicably broken formatting. Still, panicking doesn't give much of an indication of what went wrong.

I propose to revert #92 and replace panics with more structrured error handling. For example:

pub enum ResumeError {
  FormatFailed(fmt::Error),
  UnexpectedEvent,
}

and returning a Result<(), ResumeError>. This way we can handle the failure more gracefully downstream, and display a meaningful error message to the user. @Byron @mgeisler WDYT?

@mgeisler
Copy link
Contributor

Hi @cip999,

Thank you so much for the investigation! This is very useful to me and my project!

IOW: inserting a valid Event list into another can corrupt it.

Oh, yes, that makes sense!

I assume that, if something like this happens, it would be a benign mistake made by the translator, and they'd rather know that something is wrong than get a book with inexplicably broken formatting.

Yes, in this context that is a great description: the mdbook-i18n-helpers system is very much relying on translators not inserting "extra" markup into their translations.

To recap, the situation that caused this to crash cannot occur in the wild since it represents an impossible document where

hello
---

is translated into

world
---
---

and where this "double header" is parsed as two headers. In reality, this would be parsed as a header followed by a <hr>. It only becomes two headers because the translation system mangles the events in weird ways.

I propose to revert #92 and replace panics with more structrured error handling.

That sounds like a good idea to 👍

@cip999
Copy link
Contributor

cip999 commented Dec 14, 2024

Thank you so much for the investigation! This is very useful to me and my project!

Happy to help, I find this perfect to get started on contributing to Rust projects. 🙂

In reality, this would be parsed as a header followed by a <hr>. It only becomes two headers because the translation system mangles the events in weird ways.

Precisely, this couldn't happen by means of pure text substitution.

That sounds like a good idea to 👍

Cool, I'll maybe send a PR here so the maintainers can form a better idea of the proposed change.

@Byron
Copy link
Owner

Byron commented Dec 14, 2024

Thanks everyone! I am absolutely looking forward to that PR!

cip999 added a commit to cip999/pulldown-cmark-to-cmark that referenced this issue Dec 14, 2024
@mgeisler
Copy link
Contributor

Thank you so much for the investigation! This is very useful to me and my project!

Happy to help, I find this perfect to get started on contributing to Rust projects. 🙂

I expect https://github.com/google/mdbook-i18n-helpers/ will need an update after this to handle the newly returned error. @kdarkhan or I will eventually get to it, but you're very welcome to try your new Rust knowledge there as well 😄

Byron added a commit that referenced this issue Dec 15, 2024
Alternative approach for #91 with custom error type
@cip999
Copy link
Contributor

cip999 commented Dec 15, 2024

Thanks @Byron for merging!

Thank you so much for the investigation! This is very useful to me and my project!

Happy to help, I find this perfect to get started on contributing to Rust projects. 🙂

I expect https://github.com/google/mdbook-i18n-helpers/ will need an update after this to handle the newly returned error. @kdarkhan or I will eventually get to it, but you're very welcome to try your new Rust knowledge there as well 😄

I think upgrading the dependency is non-breaking since right now it's always unwrap-ping calls to cmark_resume_with_options. But then sure, we need to make some tweaks to make the changes here actually useful. 🙃

I'll get to that soon! Let's move the conversation to the other repo. 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants