Front Matter Extraction "Wrong" #264

Open
NikoMix opened this issue Feb 7, 2023 · 1 comment
Labels
Discussion/Question Discussions or questions about the code

Comments

@NikoMix
Contributor

NikoMix commented Feb 7, 2023

This is partly a bug, partly a discussion about what the intended behavior should be.

The issue: I have a Markdown file with a "----" somewhere in the middle of the document, which never gets terminated. It's "just" a logical separation of content, not at all related to YAML syntax. However, the ExtractFrontMatter module kicks in and starts doing its thing, handing over to Statiq.Yaml, which crashes in ParseYaml with "Did not find expected <document end>." I understand this "issue" lies within the YamlDotNet library, but that library shouldn't be called in the first place, as the content is Markdown, not YAML.

Technically, there should be no such restriction: no content should be omitted from Markdown just because "---" appears somewhere in the middle of it. Jekyll's description says "[...] The front matter must be the first thing [...]", so before proposing a fix, I'd like to understand what the intended behavior should be, as this would be a potentially breaking change.

In the unit test project, however, I find many tests that expect front matter in the middle of the document and modify the output accordingly. Hence the confusion.

@daveaglick
Member

This question gets really tricky, and it's one of the areas I've probably spent the most time going back and forth on. Because there are no standards for delimiting front matter, all we're left with are the conventions that other generators follow. I've generally found two patterns among generators that accept YAML front matter. Both include a trailing ---, but the preceding first-line --- is less agreed on: some generators like Jekyll require it and others don't. I erred on the side of compatibility, so Statiq supports both styles, though as you noted that means a single-line --- elsewhere will mean "everything above this is front matter," even if that's not the intent. The Statiq case is even trickier because Statiq in theory supports any front matter format, in any kind of file, so the delimiter style has to be pluggable (e.g., JSON front matter in a C# script is going to require totally different kinds of delimiting).

All this is to say you've found a known edge case when a front matter delimiter is used further down in a file. This is indeed different than Jekyll, but intentionally so (while Statiq aims for some measure of compatibility to make porting easier, it's not a "Jekyll-compatible" generator and that's not a goal of the project).

The easiest way to handle this situation, if you know you're always going to be using Jekyll-style front matter delimiters that include a first-line ---, is to modify the FrontMatterRegexes so that a first-line delimiter is required.

The default FrontMatterRegexes setting includes this regex: \A(?:^\r*-+[^\S\n]*$\r?\n)?(.*?)(?:^\r*-+[^\S\n]*$(\r?\n)?) which you can see matches both with and without the initial ---:

[Screenshots: the default regex matching input with and without an initial ---]

So if you want to only match when a starting --- is present, you can adjust the regex to \A(?:^\r*-+[^\S\n]*$\r?\n)(.*?)(?:^\r*-+[^\S\n]*$(\r?\n)?):

[Screenshots: the adjusted regex tested against sample input]
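The difference between the two patterns can also be checked with a small console program. This is a minimal sketch, not Statiq's actual extraction code: the class name `FrontMatterDemo`, the helper `TryExtract`, the sample strings, and the `RegexOptions.Multiline | RegexOptions.Singleline` options are all assumptions made for illustration, and the options Statiq applies internally may differ.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FrontMatterDemo
{
    // Default pattern quoted above: the leading delimiter line is optional.
    public const string DefaultPattern =
        @"\A(?:^\r*-+[^\S\n]*$\r?\n)?(.*?)(?:^\r*-+[^\S\n]*$(\r?\n)?)";

    // Adjusted pattern quoted above: the leading delimiter line is required.
    public const string StrictPattern =
        @"\A(?:^\r*-+[^\S\n]*$\r?\n)(.*?)(?:^\r*-+[^\S\n]*$(\r?\n)?)";

    // Hypothetical helper: returns true and the first capture group on a match.
    // Assumption: Multiline | Singleline, so ^/$ work per line and . spans newlines.
    public static bool TryExtract(string pattern, string content, out string frontMatter)
    {
        Match match = Regex.Match(
            content, pattern, RegexOptions.Multiline | RegexOptions.Singleline);
        frontMatter = match.Success ? match.Groups[1].Value : string.Empty;
        return match.Success;
    }

    public static void Main()
    {
        // Jekyll-style document: front matter wrapped in two delimiter lines.
        string jekyllStyle = "---\ntitle: Hello\n---\n# Heading\n";

        // Plain Markdown with a separator line in the middle and no front matter.
        string separatorOnly = "# Heading\n\nSome text\n\n----\n\nMore text\n";

        foreach (string pattern in new[] { DefaultPattern, StrictPattern })
        {
            foreach (string content in new[] { jekyllStyle, separatorOnly })
            {
                bool matched = TryExtract(pattern, content, out string frontMatter);
                Console.WriteLine($"matched={matched}, captured {frontMatter.Length} chars");
            }
        }
    }
}
```

Under these assumptions, both patterns capture `title: Hello` from the Jekyll-style sample. For the separator-only sample, the default pattern captures everything above the `----` line as front matter, while the adjusted pattern does not match at all.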

The setting can be changed like this:

await Bootstrapper.Factory
    .CreateDefault(args)
    .AddSetting(
        WebKeys.FrontMatterRegexes,
        new[] { @"\A(?:^\r*-+[^\S\n]*$\r?\n)(.*?)(?:^\r*-+[^\S\n]*$(\r?\n)?)" })
    // ...
    .RunAsync();

Let me know if that resolves the issue for you.

@daveaglick daveaglick added the Discussion/Question Discussions or questions about the code label Feb 10, 2023