Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

misc: share markdown parsing in collect-strings and the report #9514

Merged
merged 5 commits into from
Aug 7, 2019

Conversation

brendankenny
Copy link
Member

@brendankenny brendankenny commented Aug 5, 2019

duplicated markdown parsing in dom.js for the report and collect-strings.js for adding placeholders for ctc files (which will eventually be reconstituted and then passed through dom.js methods) is a little risky, so the first commit here makes them share a splitting implementation in util.js.

There are no changes to the collected locale strings or the existing dom and collect-strings unit tests.

...but then I couldn't resist making an interface that's easier to understand (no more mysterious empty string for preambleText but undefined for linkText and linkHref). This is isolated to the second commit, so if it seems excessive I can drop it :)

@brendankenny
Copy link
Member Author

(a bit easier to see at https://github.com/GoogleChrome/lighthouse/pull/9514/files?w=1, most added lines are the expanded tests)

const parts = text.split(/\[([^\]]+?)\]\((https?:\/\/.*?)\)/g);
while (parts.length) {
// Pop off the same number of elements as there are capture groups.
const [preambleText, linkText, linkHref] = parts.splice(0, 3);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried rewriting this with i % 3 and have 0 and 1 cases, where the 1 case also pops off parts[i + 1], but it just got harder to follow, so stuck with this internally

/* global URL */

describe('util helpers', () => {
let origConsoleWarn;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needed for old displayValue handling (#5099) but no need after #6767 and missed for removal in #7628

if (message.match(/\[.*\] \(.*\)/)) {
throw Error(`Bad Link spacing in message "${message}"`);
}
// * [](empty link text)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@exterkamp and I discussed this empty linkText case and decided while maybe there's some vague possibility of a need for invisible links at some point, it's almost certainly an accident today, so let's alert the author

Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


const parts = text.split(/\[([^\]]+?)\]\((https?:\/\/.*?)\)/g);
while (parts.length) {
// Pop off the same number of elements as there are capture groups.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Pop off the same number of elements as there are capture groups.
// Shift off the same number of elements as there are capture groups.

;)

largely a pedantic suggestion and pre-existing, so can reject if you'd like

// Pop off the same number of elements as there are capture groups.
const [preambleText, linkText, linkHref] = parts.splice(0, 3);

if (preambleText) { // Empty plain text is an artifact of splitting, not meaningful.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the empty explanation feels a bit weird here, maybe invert it or at least say "We can skip empty b/c ..."?

]);
});

it('splits on backticked code at the emd of the string', () => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
it('splits on backticked code at the emd of the string', () => {
it('splits on backticked code at the end of the string', () => {

});
});

describe('#splitMarkdownLink', () => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love these ❤️

Copy link
Member

@exterkamp exterkamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Like the collection of concerns into 1 file and all the new tests!

* into segments that were enclosed in backticks (marked as `isCode === true`)
* and those that outside the backticks (`isCode === false`).
* @param {string} text
* @return {Array<{isCode: true, codeText: string}|{isCode: false, plainText: string}>}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems weird to me to include a boolean flag and use different text fields. Makes more sense to me to have {isCode: boolean, text: string}? Is this less canonical for js?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had a massive discussion about this very thing when discussing the shape of the proto and the hardened LHR API.

I 100% agree this feels anti-JS :)

OTOH, it's for internal use only, never goes over any wire, and we never really need to use just the text, so I don't have a strong reason to object yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha everyone lives with "preambleText" (does code typically come with a preamble?) including preambleText that occurs after the content and may not have any text in it, but I try to make self describing property names and everyone loses their minds :P

I think there is value in API safety like this (in this case tsc requires the isCode discriminator to be checked for the consuming code to be able to know if it can use plainText or codeText) but as you say it's internal only, for a minor feature, and I was really just playing around with the interface (not sure I really like it), so I can switch back to text or whatever :)

* `isLink === false`), and segments with text content and a URL that did make
* up a link (marked as `isLink === true`).
* @param {string} text
* @return {Array<{isLink: true, linkText: string, linkHref: string}|{isLink: false, plainText: string}>}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing about this. Seems like isLink with text would be sufficient.

]);
});

it('handles text only within backticks', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also curious for backticks separated by only a space if that would maintain the space e.g.

"`first code` `second code`"

...and that works as well. Feel free to add it as a test if that seems significant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to add it as a test if that seems significant.

sure, no problem

@patrickhulce
Copy link
Collaborator

grumble grumble feedback

😆🤣😆🤣😆🤣

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants