Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async marked #2474

Merged
merged 7 commits into from
Aug 30, 2022
Merged

Async marked #2474

merged 7 commits into from
Aug 30, 2022

Conversation

UziTech
Copy link
Member

@UziTech UziTech commented May 22, 2022

Description

The walkTokens function returns the values of the function so someone can await any Promises.

The way I see this working is anyone who wants to do anything async will do it by awaiting marked.walkTokens(tokens, callback)

Something like:

const tokens = marked.lexer(markdown)
await Promise.all(marked.walkTokens(tokens, asynFunction))
const html = marked.parser(tokens)

To make the walkTokens option be able to be async we have to introduce an async option so users can instead write:

const walkTokens = asyncFunction
marked.use({walkTokens, async: true})
const html = await marked.parse(markdown)

or in an extension:

const importUrl = {
  extensions: [{
    name: 'importUrl',
    level: 'block',
    start(src) { return src.indexOf('\n:'); },
    tokenizer(src) {
      const rule = /^:(https?:\/\/.+?):/;
      const match = rule.exec(src);
      if (match) {
        return {
          type: 'importUrl',
          raw: match[0],
          url: match[1],
          html: '' // will be replaced in walkTokens
        };
      }
    },
    renderer(token) {
      return token.html;
    }
  }],
  async: true, // needed to tell marked to return a promise
  async walkTokens(token) {
    if (token.type === 'importUrl') {
      const res = await fetch(token.url);
      token.html = await res.text();
    }
  }
};

marked.use(importUrl);
const markdown = `
# example.com

:[https://example.com:](https://example.com/)
`;

const html = await marked.parse(markdown);

Contributor

  • Test(s) exist to ensure functionality and minimize regression

Committer

In most cases, this should be a different person than the contributor.

@vercel
Copy link

vercel bot commented May 22, 2022

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated
marked-website ✅ Ready (Inspect) Visit Preview Aug 25, 2022 at 2:01AM (UTC)

@UziTech UziTech changed the title move async to folder Async marked May 22, 2022
@UziTech
Copy link
Member Author

UziTech commented May 22, 2022

@calculuschild @styfle @joshbruce @davisjam what do you think about this approach? Is it too much to maintain?

We could probably move more of the duplicate code to helpers.

@joshbruce
Copy link
Member

@jimmywarting - Saw you had a PR about async and curious if you have thoughts.

@UziTech - Is the extra maintenance from this approach because there's the non-async and then the async here?

Is this a fairly standard approach for the JS community? (Making sure the Marked community can easily understand and update.)

@jimmywarting
Copy link
Contributor

jimmywarting commented May 22, 2022

Saw you had a PR about async and curious if you have thoughts.

it's a pity that we need to have two version of marked... one being async and the other sync. either way, i think i might prefer to only use the async b/c i'm not that concerned about the performances b/c i would only use it on the client side and process one small chunk of markdown that would resolve pretty quickly, but i could understand if this would have to run on a server and be pretty fast to process lots of markdown data at once

The way i did it in my PR was in my opinion more stream friendly with iterators that yields chunks bits at a time.
so you could pretty much write something like this to get out data faster so you can start see a result quicker

for (let chunk of iterable) {
  if (typeof chunk !== string) chunk = await chunk
  res.write(chunk)
}
res.end()

iterable have the posibility to be turned into a stream as well using node:stream.Readable.from(iterable) and possible also for whatwg streams in the feature if it lands ReadableStream.from(iterable)

with my PR we could keep much of the logic the same.
We could also add Symbol.iterator to easily get a hold of a iterator the yields chunks.
And if the user which to use async stuff then they could just use for await so that it would call Symbol.asyncIterator instead of Symbol.iterator and promises would be resolved instead.
so that you can write this instead

for await (let chunk of iterable) res.write(chunk)
res.end()

@UziTech
Copy link
Member Author

UziTech commented May 22, 2022

@joshbruce ya the reason for the extra maintenance is because we have to maintain files that are almost the same just not quite. For instance anytime we fix a bug in the tokenizer we will have to fix it in two files.

The async code is a little under half the speed of the sync version (bench sync 5000ms, bench async 11000ms)

@jimmywarting the iterable approach doesn't work for renderers that return false to fall back to original renderers. If that feature didn't exist the async code could be much easier.

@styfle
Copy link
Member

styfle commented May 23, 2022

we have to maintain files that are almost the same just not quite. For instance anytime we fix a bug in the tokenizer we will have to fix it in two files.

That sounds like it could be prone to error, especially for someone who is a new contributor to the project.

We could probably move more of the duplicate code to helpers.

That sounds desirable since it would help reduce the chance of needing to modify two files to fix a single bug.

Fixes #458

I'm still a little skeptical that this change is needed at all. It seems like the "fetch image from CMS" use case can be solved with the current lexer and parser. Something like this:

const tokens = marked.lexer(md);

for (const token of tokens) {
  if (token.type === 'image') {
    // mutate token
  }
}

const html = marked.parser(tokens);

@UziTech
Copy link
Member Author

UziTech commented May 23, 2022

I'm still a little skeptical that this change is needed at all. It seems like the "fetch image from CMS" use case can be solved with the current lexer and parser.

I hadn't thought of that approach. That actually seems better. Perhaps all that we have to make async is the walkTokens function.

@UziTech
Copy link
Member Author

UziTech commented May 28, 2022

after playing around with this a little bit it looks like we could just return the values from walkTokens and await those to do just about anything async.

The async code would look something like:

const tokens = marked.lexer(md);

await Promise.all(marked.walkTokens(tokens, (token) => {
  if (token.type === 'image') {
    return fetch(...)
      .then(res => res.json())
      .then(data => {
        token.href = ...
      });
  }
}));

const html = marked.parser(tokens);

We could do this inside marked to just allow an option. This will only slow down marked if they specify async: true.

const html = await marked(md, {
  async: true,
  walkTokens() {
    if (token.type === 'image') {
      return fetch(...)
        .then(res => res.json())
        .then(data => {
          token.href = ...
        });
    }
  },
})

@@ -316,8 +316,9 @@ export class Lexer {
return tokens;
}

inline(src, tokens) {

This comment was marked as spam.

This comment was marked as spam.

@vixalien
Copy link

vixalien commented Jul 4, 2022

updates?

@UziTech
Copy link
Member Author

UziTech commented Jul 4, 2022

@vixalien Thanks for the push! 😉👍I had this almost done for a while.

I changed this to use the suggestion by @styfle in #2474 (comment). I made the walkTokens function return the values of the function so someone can await any Promises.

The way I see this working is anyone who wants to do anything async will do it by awaiting marked.walkTokens(tokens, callback)

Something like:

const tokens = marked.lexer(markdown)
await Promise.all(marked.walkTokens(tokens, asynFunction))
const html = marked.parser(tokens)

To make the walkTokens option be able to be async we have to introduce an async option so users can instead write:

const walkTokens = asyncFunction
marked.use({walkTokens, async: true})
const html = await marked.parse(markdown)

or in an extension:

const importUrl = {
  extensions: [{
    name: 'importUrl',
    level: 'block',
    start(src) { return src.indexOf('\n:'); },
    tokenizer(src) {
      const rule = /^:(https?:\/\/.+?):/;
      const match = rule.exec(src);
      if (match) {
        return {
          type: 'importUrl',
          raw: match[0],
          url: match[1],
          html: '' // will be replaced in walkTokens
        };
      }
    },
    renderer(token) {
      return token.html;
    }
  }],
  async: true, // needed to tell marked to return a promise
  async walkTokens(token) {
    if (token.type === 'importUrl') {
      const res = await fetch(token.url);
      token.html = await res.text();
    }
  }
};

marked.use(importUrl);
const markdown = `
# example.com

:https://example.com:
`;

const html = await marked.parse(markdown);

TODO

  • Update docs
  • Add tests

@bebraw
Copy link

bebraw commented Aug 14, 2022

@UziTech I tried out your promising work and found a potential issue. It seems it breaks link (![]()) handling. I have steps to reproduce below:

  1. Check out https://github.com/gustwindjs/gustwind/tree/bug/async-links
  2. Run the project with deno task start (needs a recent version of Deno)
  3. Head to http://localhost:3000/breezewind/ - Note the links

I have the code at https://github.com/gustwindjs/gustwind/blob/bug/async-links/site/transforms/markdown.ts . The interesting point is that when you remove that marked.use section with the async code, the links get rendered properly again so it probably has to do with the code that was added.

@bebraw
Copy link

bebraw commented Aug 14, 2022

@UziTech I managed to solve it. I had the following piece of code as per your example:

      start(src: string) {
        return src.indexOf(":");
      },

After removing, it fixed the links. Probably what happened is that it was trying to apply the extension on links since they often have :.

My apologies for the noise. 😄

@UziTech UziTech marked this pull request as ready for review August 15, 2022 01:01
@bebraw
Copy link

bebraw commented Aug 15, 2022

@UziTech Is the start check actually needed? It is there just for optimization purposes? Maybe it would be nice to have short comments at the doc portion related to this (i.e., why something exists). I might also extract importUrl as a const to group it better (nicer for maintenance as it's less brittle when changing things around).

In any case, I think it's a great addition to the API as it allows a lot of imaginative uses. It could be even worth mentioning some use cases at the docs to inspire people.

@UziTech
Copy link
Member Author

UziTech commented Aug 15, 2022

@bebraw the start is needed to make sure the paragraph and text tokenizers doesn't consume too much.

hi
:https//example.com:

For example, without the start function the above markdown would be seen as one paragraph since there is no blank line between the lines and it doesn't know to stop at : to check the custom extension.

@UziTech
Copy link
Member Author

UziTech commented Aug 21, 2022

@styfle @joshbruce @calculuschild @davisjam Any objections to this?

I think it will help allow a whole new type of async extensions for marked without any breaking changes. The only down side is an added option (async) to prevent anyone not using async from slowing down.

see the added documentation at https://marked-website-git-fork-uzitech-async-marked-markedjs.vercel.app/using_pro#async

docs/USING_ADVANCED.md Outdated Show resolved Hide resolved
docs/USING_PRO.md Outdated Show resolved Hide resolved
test/unit/marked-spec.js Outdated Show resolved Hide resolved
Copy link
Contributor

@calculuschild calculuschild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks fine to me. I don't really have any feedback other than what @styfle has already mentioned.

Co-authored-by: Steven <steven@ceriously.com>
Co-authored-by: Steven <steven@ceriously.com>
@UziTech
Copy link
Member Author

UziTech commented Aug 25, 2022

What do you think about adding a parseAsync function to return the promise?

I know Jasmine ran into an issue when adding async expect where people would forget to await the expect call so they changed it to expectAsync for async expect calls so people wouldn't forget to add await before.

We could do something similar. If the async option is set (either by the user or an extension) than any calls to parse would fail with a message like Must use parseAsync for async marked so people don't forget the await.

@styfle
Copy link
Member

styfle commented Aug 28, 2022

Sounds good. Also the PR description no longer matches the content so we should probably update to avoid confusion to readers in the future.

@UziTech
Copy link
Member Author

UziTech commented Aug 28, 2022

After working on the parseAsync way it looks like that will require some other small tradeoffs so we can change it in the future if it is needed.

For now I think this is good to go once you approve @styfle. 😁👍

Copy link
Member

@styfle styfle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@UziTech UziTech merged commit 994b2e6 into markedjs:master Aug 30, 2022
github-actions bot pushed a commit that referenced this pull request Aug 30, 2022
# [4.1.0](v4.0.19...v4.1.0) (2022-08-30)

### Features

* add async option ([#2474](#2474)) ([994b2e6](994b2e6))
@github-actions
Copy link

🎉 This PR is included in version 4.1.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@vixalien
Copy link

yay!

@metaskills
Copy link

Thanks for the work on this. I was thinking of using Marked in my Express backend which streaming OpenAI responses. Those responses can be markdown but I'm not sure if this feature would support my use case. My goal is to simulate what OpenAI ChatGPT is doing in their client. I think I have two options as each OpenAI stream arrives.

  1. Use Marked parse on a continually updated string representing the current total of the completion. Change Vue client side code to replace the DOM message's inner HTML with the updated Markdown -> HTML.
  2. Use Marked in some async way that with each part of the stream from OpenAI. Honestly I would have no idea how I would handle these updates on the Vue side because parts will not be guaranteed to represent fully parsed HTML nodes.

So as I say this... I'm thinking #1 is my best option which means I do not need Async Marked, but thought I would put that question out for folks who may have wandered here like I did. Again, thanks for the work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

async renderer support
9 participants