-
Notifications
You must be signed in to change notification settings - Fork 29.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tools: update marked to 0.4.0 #21081
Conversation
@@ -1,19 +0,0 @@ | |||
Copyright (c) 2011-2014, Christopher Jeffrey (https://github.com/chjj/) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll have to fix the license-builder script:
Lines 74 to 75 in 7dca329
addlicense "marked" "tools/doc/node_modules/marked" \ | |
"$(cat ${rootdir}/tools/doc/node_modules/marked/LICENSE)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated license-builder.sh. Thanks.
@Trott Why is the |
I'm not 100% sure, but I imagine the problem it is supposed to solve is probably solved by package-lock.json. |
@Trott I shall try to check if there are any breaking changes in our HTML/JSON docs with this update. Last time I've checked, there were some. |
Is there a build step to generate all the HTML files? |
@styfle |
To build just any doc: node tools/doc/generate.js --format=json doc/api/all.md > all.json
node tools/doc/generate.js --format=html --node-version=11.0.0 --analytics= doc/api/all.md > all.html |
@@ -0,0 +1 @@ | |||
../marked/bin/marked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you do this to remove node modules?
cd tools/doc && echo 'node_modules' > .gitignore && rm -rf node_modules && git add -A
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have other tools
directory node_modules
subdirectories listed as ignored in the top-level .gitignore
so putting it there is probably the way to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we'd need to add an npm install
step for that directory in Makefile
and vcbuild.bat
similar to lint-md-build
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it going to cause a problem to run npm install
each time in the Makefile?
I'm still new to the build tools 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nodejs/build ^^^^^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not a big issue, but would obviously be better to avoid doing so (would cause longer build times)
I guess that might explain checking in the node_modules directory. :-D
I've started to check the diff in HTML and JSON output before and after this upgrade. There are many insignificant changes: white spaces (+/- line breaks), code blocks class names ( But here is the first detected really breaking change: Before: After: Cause: markdown links are not rendered as links inside HTML markup. Small test to check: 'use strict';
const marked = require('marked');
const md = `
<div>A [link][].</div>
[link]: #hash
`;
const tokens = marked.lexer(md);
console.log(tokens);
const html = marked.parser(tokens);
console.log(`\n${html}`); With the current version: [ { type: 'html', pre: false, text: '<div>A [link][].</div>\n\n' },
links: { link: { href: '#hash', title: undefined } } ]
<div>A <a href="#hash">link</a>.</div> With the new version: [ { type: 'html', pre: false, text: '<div>A [link][].</div>\n\n' },
links: { link: { href: '#hash', title: undefined } } ]
<div>A [link][].</div> |
This issue concerns not links only, but all Markdown markup inside HTML markup (code fragments in backticks are not rendered as code too, for example). |
Issue 2, more complicated.Before (5 links with right texts and URLs): After (3 links with wrong text and URLs + 2 code spans instead of links): Cause: some combination of links with backticks (some of them are in full form like Reduced test case with the same pattern as in the source above: 'use strict';
const marked = require('marked');
const md = `
Links: [\`link1\`] and [\`link2\`] and [\`link3\`] and
[\`link 4\`][\`link4\`] and [\`link 4\`][\`link4\`], end.
[\`link1\`]: #hash1
[\`link2\`]: #hash2
[\`link3\`]: #hash3
[\`link4\`]: #hash4
`;
const tokens = marked.lexer(md);
console.log(tokens);
const html = marked.parser(tokens);
console.log(`\n${html}`); With the current version: [ { type: 'paragraph',
text:
'Links: [`link1`] and [`link2`] and [`link3`] and\n[`link 4`][`link4`] and [`link 4`][`link4`], end.' },
links: { '`link1`': { href: '#hash1', title: undefined },
'`link2`': { href: '#hash2', title: undefined },
'`link3`': { href: '#hash3', title: undefined },
'`link4`': { href: '#hash4', title: undefined } } ]
<p>Links: <a href="#hash1"><code>link1</code></a> and <a href="#hash2"><code>link2</code></a> and <a href="#hash3"><code>link3</code></a> and
<a href="#hash4"><code>link 4</code></a> and <a href="#hash4"><code>link 4</code></a>, end.</p> With the new version: [ { type: 'paragraph',
text:
'Links: [`link1`] and [`link2`] and [`link3`] and\n[`link 4`][`link4`] and [`link 4`][`link4`], end.' },
{ type: 'space' },
links: { '`link1`': { href: '#hash1', title: undefined },
'`link2`': { href: '#hash2', title: undefined },
'`link3`': { href: '#hash3', title: undefined },
'`link4`': { href: '#hash4', title: undefined } } ]
<p>Links: <a href="#hash4"><code>link1</code>] and <a href="#hash4"><code>link2</code>] and <a href="#hash3"><code>link3</code></a> and
[<code>link 4</code></a> and [<code>link 4</code></a>, end.</p> |
Possibly related: markedjs/marked#1263 |
Issue 1.2. Most likely related to the first issue: text links inside HTML markup are not auto-linkified. Before: After: Small test: 'use strict';
const marked = require('marked');
const md = `
<table><tr><td>
See https://example.org for more info.
</td></tr></table>
`;
const tokens = marked.lexer(md);
console.log(tokens);
const html = marked.parser(tokens);
console.log(`\n${html}`); With the current version: [ { type: 'html',
pre: false,
text:
'<table><tr><td>\nSee https://example.org for more info.\n</td></tr></table>\n' },
links: {} ]
<table><tr><td>
See <a href="https://example.org">https://example.org</a> for more info.
</td></tr></table> With the new version: [ { type: 'html',
pre: false,
text:
'<table><tr><td>\nSee https://example.org for more info.\n</td></tr></table>\n' },
links: {} ]
<table><tr><td>
See https://example.org for more info.
</td></tr></table> |
Issue 3. Period is wrongly included in URL of auto-linkified text link inside parentheses. Before: After: Small test: 'use strict';
const marked = require('marked');
const md = `
(See https://github.com/nodejs/node/pull/12562.)
`;
const tokens = marked.lexer(md);
console.log(tokens);
const html = marked.parser(tokens);
console.log(`\n${html}`); With the current version: [ { type: 'paragraph',
text: '(See https://github.com/nodejs/node/pull/12562.)' },
links: {} ]
<p>(See <a href="https://github.com/nodejs/node/pull/12562">https://github.com/nodejs/node/pull/12562</a>.)</p> With the new version: [ { type: 'paragraph',
text: '(See https://github.com/nodejs/node/pull/12562.)' },
links: {} ]
<p>(See <a href="https://github.com/nodejs/node/pull/12562.">https://github.com/nodejs/node/pull/12562.</a>)</p> |
Issue 4. Markdown tables are not rendered. Before: After: Small test: 'use strict';
const marked = require('marked');
const md = `
| a | b |
|-------|
| c | d |
`;
const tokens = marked.lexer(md);
console.log(tokens);
const html = marked.parser(tokens);
console.log(`\n${html}`); With the current version: [ { type: 'space' },
{ type: 'table',
header: [ 'a', 'b' ],
align: [ null ],
cells: [ [Array] ] },
links: {} ]
<table>
<thead>
<tr>
<th>a</th>
<th>b</th>
</tr>
</thead>
<tbody>
<tr>
<td>c</td>
<td>d</td>
</tr>
</tbody>
</table> With the new version: [ { type: 'space' },
{ type: 'paragraph', text: '| a | b |\n|-------|\n| c | d |' },
{ type: 'space' },
links: {} ]
<p>| a | b |
|-------|
| c | d |</p> |
Issue 5. Tricky one. Before: After: Cause: Links with bottom references that clash with Small test: 'use strict';
const marked = require('marked');
const md = `
Link: [constructor][].
[constructor]: https://example.org/
`;
const tokens = marked.lexer(md);
console.log(tokens);
const html = marked.parser(tokens);
console.log(`\n${html}`); With the current version: [ { type: 'space' },
{ type: 'paragraph', text: 'Link: [constructor][].' },
links: { constructor: { href: 'https://example.org/', title: undefined } } ]
<p>Link: <a href="https://example.org/">constructor</a>.</p> With the new version: [ { type: 'space' },
{ type: 'paragraph', text: 'Link: [constructor][].' },
{ type: 'space' },
links: {} ]
<p>Link: [constructor][].</p> |
It seems this is all I could find. The new version also resolves conflicting references (see #20100) in a different way, but this is our mess and we can ignore this for now. However, we may need to wait for fixing above-enumerated issues before the upgrade. |
Thanks for finding and clearly detailing all the issues that need to be addressed, @vsemozhetbyt! Labeling |
514bea0
to
09cdca8
Compare
@vsemozhetbyt That appears to be a bug in our markdown and not a bug in the https://daringfireball.net/projects/markdown/syntax#html:
|
Solution is probably to have html.js process the markdown before adding the div tag. I'll see about doing that as part of this PR. |
Well, that was a bit long on the debugging front, but it turns out the solution was to remove a single unnecessary line from |
The source in crypto.md: <tr>
<td><code>RSA_PSS_SALTLEN_DIGEST</code></td>
<td>Sets the salt length for `RSA_PKCS1_PSS_PADDING` to the digest size
when signing or verifying.</td>
</tr>
<tr>
<td><code>RSA_PSS_SALTLEN_MAX_SIGN</code></td>
<td>Sets the salt length for `RSA_PKCS1_PSS_PADDING` to the maximum
permissible value when signing data.</td>
</tr>
<tr>
<td><code>RSA_PSS_SALTLEN_AUTO</code></td>
<td>Causes the salt length for `RSA_PKCS1_PSS_PADDING` to be determined
automatically when verifying a signature.</td>
</tr> Before: <tr>
<td><code>RSA_PSS_SALTLEN_DIGEST</code></td>
<td>Sets the salt length for <code>RSA_PKCS1_PSS_PADDING</code> to the digest size
when signing or verifying.</td>
</tr>
<tr>
<td><code>RSA_PSS_SALTLEN_MAX_SIGN</code></td>
<td>Sets the salt length for <code>RSA_PKCS1_PSS_PADDING</code> to the maximum
permissible value when signing data.</td>
</tr>
<tr>
<td><code>RSA_PSS_SALTLEN_AUTO</code></td>
<td>Causes the salt length for <code>RSA_PKCS1_PSS_PADDING</code> to be determined
automatically when verifying a signature.</td>
</tr> After (see preserved backticks instead of <tr>
<td><code>RSA_PSS_SALTLEN_DIGEST</code></td>
<td>Sets the salt length for `RSA_PKCS1_PSS_PADDING` to the digest size
when signing or verifying.</td>
</tr>
<tr>
<td><code>RSA_PSS_SALTLEN_MAX_SIGN</code></td>
<td>Sets the salt length for `RSA_PKCS1_PSS_PADDING` to the maximum
permissible value when signing data.</td>
</tr>
<tr>
<td><code>RSA_PSS_SALTLEN_AUTO</code></td>
<td>Causes the salt length for `RSA_PKCS1_PSS_PADDING` to be determined
automatically when verifying a signature.</td>
</tr> Also in <td>Flag indicating reading accesses to the file system will no longer
result in an update to the `atime` information associated with the file.
This flag is available on Linux operating systems only.</td> Also several fragments in Also |
Another example of issue 1: For *all* [`EventEmitter`][] objects, if an `'error'` event handler is not
provided, the error will be thrown, causing the Node.js process to report an
uncaught exception and crash unless either: The [`domain`][domains] module is
used appropriately or a handler has been registered for the
[`'uncaughtException'`][] event. Before: <p>For <em>all</em> <a href="events.html#events_class_eventemitter"><code>EventEmitter</code></a> objects, if an <code>'error'</code> event handler is not
provided, the error will be thrown, causing the Node.js process to report an
uncaught exception and crash unless either: The <a href="domain.html"><code>domain</code></a> module is
used appropriately or a handler has been registered for the
<a href="process.html#process_event_uncaughtexception"><code>'uncaughtException'</code></a> event.</p> After: <p>For <em>all</em> <a href="domain.html"><code>EventEmitter</code>][] objects, if an <code>'error'</code> event handler is not
provided, the error will be thrown, causing the Node.js process to report an
uncaught exception and crash unless either: The [<code>domain</code></a> module is
used appropriately or a handler has been registered for the
<a href="process.html#process_event_uncaughtexception"><code>'uncaughtException'</code></a> event.</p> Also in Source:### net.createConnection(path[, connectListener])
<!-- YAML
added: v0.1.90
-->
* `path` {string} Path the socket should connect to. Will be passed to
[`socket.connect(path[, connectListener])`][`socket.connect(path)`].
See [Identifying paths for IPC connections][].
* `connectListener` {Function} Common parameter of the
[`net.createConnection()`][] functions, an "once" listener for the
`'connect'` event on the initiating socket. Will be passed to
[`socket.connect(path[, connectListener])`][`socket.connect(path)`].
* Returns: {net.Socket} The newly created socket used to start the connection.
Initiates an [IPC][] connection.
This function creates a new [`net.Socket`][] with all options set to default,
immediately initiates connection with
[`socket.connect(path[, connectListener])`][`socket.connect(path)`],
then returns the `net.Socket` that starts the connection.
### net.createConnection(port[, host][, connectListener])
<!-- YAML
added: v0.1.90
-->
* `port` {number} Port the socket should connect to. Will be passed to
[`socket.connect(port[, host][, connectListener])`][`socket.connect(port, host)`].
* `host` {string} Host the socket should connect to. Will be passed to
[`socket.connect(port[, host][, connectListener])`][`socket.connect(port, host)`].
**Default:** `'localhost'`.
* `connectListener` {Function} Common parameter of the
[`net.createConnection()`][] functions, an "once" listener for the
`'connect'` event on the initiating socket. Will be passed to
[`socket.connect(path[, connectListener])`][`socket.connect(port, host)`].
* Returns: {net.Socket} The newly created socket used to start the connection.
Initiates a TCP connection.
This function creates a new [`net.Socket`][] with all options set to default,
immediately initiates connection with
[`socket.connect(port[, host][, connectListener])`][`socket.connect(port, host)`],
then returns the `net.Socket` that starts the connection. Also many problem fragments in The `writable.cork()` method forces all written data to be buffered in memory.
The buffered data will be flushed when either the [`stream.uncork()`][] or
[`stream.end()`][stream-end] methods are called. For backwards compatibility reasons, removing [`'data'`][] event handlers will
**not** automatically pause the stream. Also, if there are piped destinations,
then calling [`stream.pause()`][stream-pause] will not guarantee that the
stream will *remain* paused once those destinations drain and ask for more data. |
I've updated the last comments with more examples. Sum up: on this stage, we still have 2 issues with the new |
Remove unneeded line that will cause links to break when we update to marked 0.4.0.
Simplify the inline markdown in buffer.md. Admission: This is to work around an issue that arises when this markdown is processed by `marked` v0.4.0.
Links in crypto.md table had previously been autolinking, but correct Mardkwon processing will not autolink in block elements like tables. Upcoming marked 0.4.0 will not autolink in this instance, so add the links as anchor elements.
Update marked to 0.4.0 for use with the tools/doc/*.js tools.
Float a patch to deal with punctuation at the end of URLs in a more robust fashion. Refs: markedjs/marked#1293
Float a patch to deal with empty table cells. Refs: markedjs/marked#1262
Float a patch on marked-0.4.0 that fixes a bug that breaks links that are named the same as properties on the Object prototype. Refs: markedjs/marked#1299
Issue 1 is a bug in Issue 2 is a bug in our markdown and I wouldn't be surprised if other markdown processors also had the same issue. For those, I'll try to fix our markdown. |
In case we float any patches, would it be viable to note them as things to take care of when the nodejs website is redesigned, which is being thought about, according to this issue: nodejs/nodejs.org#1534 ? |
@Trott I'm curious as to the bug in our markdown? The following works with both marked and remark: 'use strict';
const marked = require('marked');
const unified = require('unified');
const markdown = require('remark-parse');
const remark2rehype = require('remark-rehype');
const html = require('rehype-stringify');
const raw = require('rehype-raw');
const testData = `
| h | h | h |
|-------|-----|-----------|
|<b>x<b>|**x**|[x](a.html)|
`
console.log('marked:');
console.log(marked(testData));
console.log('');
console.log('unified/remark/rehype:');
console.log(unified()
.use(markdown)
.use(remark2rehype, {allowDangerousHTML: true})
.use(raw)
.use(html)
.processSync(testData).toString().trim()); |
@rubys Yes, that example seems to work correctly in both. Markdown inside a markdown table is 👍. This code should show the issue. Markdown inside HTML block elements should not be expanded: <table>
<tr>
<td>**should not be bold**</td>
</tr>
</table> Some of our docs have markdown inside HTML table elements. According to markdown specs as I understand them, that markdown should not be expanded. It was expanded in marked 0.3.5 and that's a bug that's fixed in marked 0.4.0. Unfortunately, we're relying on that bug. |
@Trott gotcha. I verified that that marked 0.4 and remark produce the same output for that input. Note: given how unified pipelining works, it would not be difficult to write a processor that looks for text nodes within a table and expand them. It could even look for tables with a given |
Question: what is the status of this effort, and on evaluating the possible conversion to remark? I ask because it may impact other work. At the moment, it seems clear that remark is easier to extend cleanly, but runs slower. Moving the generation of all.html and all.json away from reprocessing the input (done) and generating both the html and json in one pass (planned) will mitigate the performance difference. The generation of JSON isn't complete, but a proof of concept has been posted. marked also seems less "spec compliant" and more prone to change from release to release, but that might be because we don't have enough experience with remark. Additionally, the build process for node already is using remark... it would be ideal to consolidate to one markdown engine. Am I missing anything? (And, yes, at this point I may be biased :-) ) |
As for the new toolchain with |
I think the effort of @rubys to get I'm leaving this open because I will indeed push it forward if something happens to the I'm also leaving it open as a to-do list of fixing the table markup in places where it is not compliant. I would encourage moving ahead with |
I was originally going to say something rather meek like "as someone new to both marked and remark, I find remark more approachable", but having just completed my first pass at updating the conversion to JSON, it is clear that current marked based code to produce JSON is essentially unmaintainable. I've surfaced a number of issues in #21697, but the full set of differences can be found by looking for As a byproduct of this effort, I compared the output of the conversions for every description, and will submit a separate pull request with changes to the markdown to make it work with both sets of toolchains. |
I'm feeling good about the progress @rubys has been making moving to |
The problem with doctools is that they are rather old and it seems we may have no active collaborators that created them. I've done some cleanup in doctools this spring and json.js was the most complicated one, sometimes I felt that I went in darkness or minefield) Moreover, I had no context about the use cases for JSON results or any criteria for required completeness and validness of the results. So if anybody wants to even completely recreate the json tools contacting with JSON users, this may be the best, if they can maintain the new scripts to some extent. |
If we can make progress landing the changes I've made to convert from marked to remark, I'll certainly support the code. If you are looking for somebody to own the current marked based implementation of JSON generation... sorry, but I would need to throw a ENOTME Error on that. :-) |
Any tool will do with a human being helping, I think) |
The RunKit team certainly has a vested interest in both the JSON generation
(which I guess we might be the sole known consumers of?) as well as the
documentation. We'd certainly be willing to help or take ownership or
provide assistance in whatever way would be most helpful.
…On Thu, Jul 12, 2018 at 11:50 AM Sam Ruby ***@***.***> wrote:
If we can make progress landing the changes I've made to convert from
marked to remark, I'll certainly support the code. If you are looking for
somebody to own the current marked based implementation of JSON
generation... sorry, but I would need to throw a ENOTME Error on that. :-)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#21081 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AABcyZKj4SVPPMhpgPYmbE49RF8c-mtTks5uF5qPgaJpZM4UWin->
.
--
Francisco Tolmasky
www.tolmasky.com
tolmasky@gmail.com
|
We state JSON output as experimental now, so it seems we have any wiggle room we want. |
Update marked to 0.4.0 for use with the tools/doc/*.js tools.
Ref: https://nodesecurity.io/advisories/531
(Yes, that is extraordinarily unlikely to impact us in this context.)
@styfle @vsemozhetbyt
Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passes