Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comments can end up displaying inside description meta #6108

Open
5 of 7 tasks
jmynes opened this issue Dec 15, 2021 · 12 comments
Open
5 of 7 tasks

Comments can end up displaying inside description meta #6108

jmynes opened this issue Dec 15, 2021 · 12 comments
Labels
bug An error in the Docusaurus core causing instability or issues with its execution domain: markdown Related to Markdown parsing or syntax

Comments

@jmynes
Copy link

jmynes commented Dec 15, 2021

Have you read the Contributing Guidelines on issues?

  • I have read the Contributing Guidelines on issues.

Prerequisites

  • I'm using the latest version of Docusaurus.
  • I have tried the npm run clear or yarn clear command.
  • I have tried rm -rf node_modules yarn.lock package-lock.json and re-installing packages.
  • I have tried creating a repro with https://new.docusaurus.io.
  • I have read the console error message carefully (if applicable).

Description

Issue: My .md or .mdx files are loading comments into the content of the meta property og:description
<meta property="og:description" content="<!-- This comment WILL NOT be ignored:" data-react-helmet="true">

Steps to reproduce

Example code structure to replicate this issue:

---
slug: /link_here
title: Title text here
---

<!--
  This comment will be ignored:
-->
<head>
  <meta name="description" content="This description will work correctly" />
</head>

<!-- This comment WILL NOT be ignored:
 - Point 1
 - Point 2
 -  etc -->

Page text here, lorem ipsum etc etc etc

## h2, etc etc etc

Expected behavior

og:description should not be pulling comments in to fill its content

Actual behavior

og:description seems to pull the first text after even if it's a comment?

No log messages output.

Your environment

  • Docusaurus version used: 2.0.0-beta.13, & 2.0.0-beta.9
  • Environment name and version (e.g. Chrome 89, Node.js 16.4): Edge Version 96.0.1054.57 (Official build) (64-bit)
  • Operating system and version (e.g. Ubuntu 20.04.2 LTS): Windows 10 Pro - 21H2

I have already fixed this issue in prod on our site's affected page, https://docs.domination.finance/liquidity_providers, by manually overriding the meta tag for og:description

Reproducible demo

No response

Self-service

  • I'd be willing to fix this bug myself.
@jmynes jmynes added bug An error in the Docusaurus core causing instability or issues with its execution status: needs triage This issue has not been triaged by maintainers labels Dec 15, 2021
@Josh-Cena Josh-Cena removed the status: needs triage This issue has not been triaged by maintainers label Dec 16, 2021
@Josh-Cena
Copy link
Collaborator

Can reproduce and confirmed it's not an MDX problem 👍 Will look into this

@Josh-Cena
Copy link
Collaborator

Josh-Cena commented Dec 16, 2021

Haha, @jmynes I know what's wrong.

  1. The default description generator just takes the first line from the text and tries to sanitize it into plain text. However, multiline comments that only contain the opening directive on this line aren't ignored (bug here!).
  2. When you manually specify the description meta, that property gets overridden successfully. However, it's not synced with og:description (for good reason!).
  3. Still, I'm unsure why you are getting content="<!-- This comment WILL NOT be ignored:". I'm getting content="<!--" because that's literally the first line in the text. Could you confirm that's the case and you just made a mistake in making the demo?

@Josh-Cena
Copy link
Collaborator

@slorber I'm quite tired of patching the Markdown parsing logic here and there. Could we just make a Remark plugin to do this? The same goes for #5659.

@Josh-Cena Josh-Cena changed the title Comments can end up displaying inside <meta>og:description Comments can end up displaying inside description meta Dec 16, 2021
@slorber
Copy link
Collaborator

slorber commented Dec 16, 2021

Hey,

I'm sorry but I don't really understand this issue, it seems a bit messy to me.

Please be aware that the meta description that we infer from your content is only a best-effort, and it's preferable to use an explicit description.

You can do that in 2 ways:

  • provide a description: my description frontmatter (sync to both description + og:description
  • provide your own metas to override default ones, but you'll have to declare all of them, not just description:
<head>
  <meta name="description" content="This description will work correctly" />
  <meta property="og:description" content="This description will work correctly" />
</head>

@Josh-Cena this isn't really worth it to invest fixing the default, we should mostly recommend to always use a description frontmatter for better SEO control. We try to provide good defaults but ultimately it will never be perfect.

@Josh-Cena
Copy link
Collaborator

No, the problem is:

  • We have a very complicated Markdown parsing logic for the file path -> URL linkification + generating description already.
  • This is ultimately very ad-hoc and never addresses all edge cases because we just use simple regex tests.
  • We already have run into some troubles that led to us essentially needing to re-implement remark (refactor(utils): use Remark to parse Markdown server-side #5670).

My proposal is that we get rid of the makeshift markdownParser entirely and build everything as part of the MDX loading process. e.g. we can simply export the plain description string as another MDX export and render it client-side.

@Josh-Cena
Copy link
Collaborator

Patching the description generation is easy: just try to ignore multi-line comments like we do for fenced code blocks, like a state machine. The problem is, where do we end?

@Josh-Cena
Copy link
Collaborator

But yes, @jmynes I failed to mention that you can set a front matter instead of using the <head> metadata tag for the description, which will be applied for both description and og:description. However, the issue that this bug report surfaces is legitimate.

@slorber
Copy link
Collaborator

slorber commented Dec 16, 2021

My proposal is that we get rid of the makeshift markdownParser entirely and build everything as part of the MDX loading process. e.g. we can simply export the plain description string as another MDX export and render it client-side.

That wouldn't really allow being more compatible with CommonMark. export const description wouldn't be usable by users that opt out of MDX on purpose.


In the end, using content as SEO description can't work in all situations. Even if we have the perfect code to skip comments, ultimately the first sentence may not be a good text for this use-case. We could as well totally disable this fallback and always ask users for an explicit description.

Not handling comments correctly can be considered as a bug, but it's a low priority one considering using description inference is not the best idea in the first place and we should encourage to use an explicit description. Still, if you find a way to improve the inference, why not, but for me the workarounds are good enough

@Josh-Cena
Copy link
Collaborator

That wouldn't really allow being more compatible with CommonMark. export const description wouldn't be usable by users that opt out of MDX on purpose.

It's not about CM compatibility. It's about ridding the pain to support a custom Markdown parser that falls apart on many edge cases.

I agree with two things:

  • The default impl doesn't have to be perfect, maybe not even have to make sense;
  • We should encourage users to provide SEO information explicitly.

However, we can easily refactor this into a remark plugin. I would probably raise a POC later

@slorber
Copy link
Collaborator

slorber commented Dec 16, 2021

However, we can easily refactor this into a remark plugin. I would probably raise a POC later

Agree that reading things from content with Regexp is not the best. A remark plugin can be nice to extract that description (and blog excerpts probably too).

In which case it's probably a nice idea to ensure that we don't parse the whole doc as AST and the visitor can bail-out asap, for perf reasons. I don't know if this remark plugin could run as part of mdx-loader (it means description wouldn't be part anymore of doc metadatas)

@Josh-Cena
Copy link
Collaborator

Mmm, we can certainly use Remark outside of MDX. But we'll see if we run into any trouble with the lack of description as metadata

@Josh-Cena Josh-Cena linked a pull request Dec 22, 2021 that will close this issue
@Josh-Cena
Copy link
Collaborator

@slorber I've spun up a PoC in #5670. This would solve both the linkification and excerpt problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An error in the Docusaurus core causing instability or issues with its execution domain: markdown Related to Markdown parsing or syntax
Projects
None yet
3 participants