Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: unwanted linebreaks added when translating html blockquote to markdown #554

Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 26 additions & 9 deletions __tests__/ExpensiMark-Markdown-test.js
Original file line number Diff line number Diff line change
Expand Up @@ -140,9 +140,9 @@ test('Test HTML string with blockquote', () => {
+ '<blockquote>line1\nline2\n\nsome <em>lorem ipsum</em> into a blockquote in Slack, copy it to the\n\n\nbuffer </blockquote>'
+ '<blockquote style="color:red;" data-label="note">line1 <em>lorem ipsum</em></blockquote>';

const resultString = '\n> This GH seems to assume that there will be something in the paste\n> buffer when you copy block-quoted text out of slack. But when I dump\n> some _lorem ipsum_ into a blockquote in Slack, copy it to the\n> buffer, then dump it into terminal, there\'s nothing. And if I dump it\n'
+ '\n> line1\n> line2\n> \n> some _lorem ipsum_ into a blockquote in Slack, copy it to the\n> \n> \n> buffer\n'
+ '\n> line1 _lorem ipsum_\n';
const resultString = '> This GH seems to assume that there will be something in the paste\n> buffer when you copy block-quoted text out of slack. But when I dump\n> some _lorem ipsum_ into a blockquote in Slack, copy it to the\n> buffer, then dump it into terminal, there\'s nothing. And if I dump it\n'
+ '> line1\n> line2\n> \n> some _lorem ipsum_ into a blockquote in Slack, copy it to the\n> \n> \n> buffer\n'
+ '> line1 _lorem ipsum_';

expect(parser.htmlToMarkdown(testString)).toBe(resultString);
});
Expand Down Expand Up @@ -493,7 +493,7 @@ test('map div with encoded entities', () => {

test('map div with quotes', () => {
const testString = '<div><blockquote>line 1</blockquote></div>line 2</div>';
const resultString = '\n> line 1\nline 2';
const resultString = '> line 1\nline 2';
expect(parser.htmlToMarkdown(testString)).toBe(resultString);
});

Expand All @@ -511,7 +511,7 @@ test('map real message from app', () => {

test('map real message with quotes', () => {
const testString = '<div><div><div><div><div><div><div><div><div><div><div><div><comment><blockquote><div>hi</div></blockquote><br></comment></div></div></div></div></div></div><div><div><div><div><div><svg><path/><path/></svg></div></div></div><div><div><div><svg><path/><path/></svg></div></div></div><div><div><div><svg><path/></svg></div></div></div><div><div><div><svg><path/></svg></div></div></div></div></div></div></div></div></div></div><div><div><div><div><div><div><div><div><div><div><div><comment><blockquote><div>hi</div></blockquote><br></comment></div></div></div></div></div></div></div></div></div></div></div></div>';
const resultString = '\n> hi\n\n\n> hi\n\n';
const resultString = '> hi\n> hi';
expect(parser.htmlToMarkdown(testString)).toBe(resultString);
});

Expand Down Expand Up @@ -651,14 +651,31 @@ test('Test codeFence backticks occupying a separate line while not introducing r

test('Test blockquote with h1 inside', () => {
let testString = '<blockquote><h1>heading</h1></blockquote>';
expect(parser.htmlToMarkdown(testString)).toBe('\n> # heading\n');
expect(parser.htmlToMarkdown(testString)).toBe('> # heading');

testString = '<blockquote><h1>heading</h1>test</blockquote>';
expect(parser.htmlToMarkdown(testString)).toBe('\n> # heading\n> test\n');
expect(parser.htmlToMarkdown(testString)).toBe('> # heading\n> test');

testString = '<blockquote>test<h1>heading</h1>test</blockquote>';
expect(parser.htmlToMarkdown(testString)).toBe('\n> test\n> # heading\n> test\n');
expect(parser.htmlToMarkdown(testString)).toBe('> test\n> # heading\n> test');

testString = '<blockquote><h1>heading A</h1><h1>heading B</h1></blockquote>';
expect(parser.htmlToMarkdown(testString)).toBe('\n> # heading A\n> # heading B\n');
expect(parser.htmlToMarkdown(testString)).toBe('> # heading A\n> # heading B');
});

test('Test blockquote linebreak handling with text, block and inline elements', () => {
const testStringQuotes = '<blockquote>one line quote a</blockquote><br /><blockquote>two line quote b<br />two line quote b</blockquote><br /><blockquote>quote c with internal line break<br /><br />quote c with internal line break</blockquote>';
expect(parser.htmlToMarkdown(testStringQuotes)).toBe('> one line quote a\n\n> two line quote b\n> two line quote b\n\n> quote c with internal line break\n> \n> quote c with internal line break');

const testStringSurroundedByText = 'text a<br /><blockquote>quote a</blockquote><br /><blockquote>quote b</blockquote>text b<br /><br />text c<br /><blockquote>quote c</blockquote>text c';
expect(parser.htmlToMarkdown(testStringSurroundedByText)).toBe('text a\n> quote a\n\n> quote b\ntext b\n\ntext c\n> quote c\ntext c');

const testStringSurroundedByInlineElement = '<em>italic a</em><br /><blockquote>quote a</blockquote><br /><blockquote>quote b</blockquote><strong>bold b</strong><br /><br /><em>italic c</em><br /><blockquote>quote c</blockquote><strong>bold c</strong>';
expect(parser.htmlToMarkdown(testStringSurroundedByInlineElement)).toBe('_italic a_\n> quote a\n\n> quote b\n*bold b*\n\n_italic c_\n> quote c\n*bold c*');

const testStringSurroundedByBlockElementCodeFence = '<pre>code fence a</pre><blockquote>quote a</blockquote><br /><blockquote>quote b</blockquote><pre>code fence b</pre><br /><pre>code fence c</pre><blockquote>quote c</blockquote><pre>code fence c</pre>';
expect(parser.htmlToMarkdown(testStringSurroundedByBlockElementCodeFence)).toBe('```\ncode fence a\n```\n> quote a\n\n> quote b\n```\ncode fence b\n```\n\n```\ncode fence c\n```\n> quote c\n```\ncode fence c\n```');

const testStringSurroundedByBlockElementHeading = '<h1>h1 a</h1><blockquote>quote a</blockquote><br /><blockquote>quote b</blockquote><h1>h1 b</h1><br /><h1>h1 c</h1><blockquote>quote c</blockquote><h1>h1 c</h1>';
expect(parser.htmlToMarkdown(testStringSurroundedByBlockElementHeading)).toBe('# h1 a\n> quote a\n\n> quote b\n# h1 b\n\n# h1 c\n> quote c\n# h1 c');
});
11 changes: 5 additions & 6 deletions lib/ExpensiMark.js
Original file line number Diff line number Diff line change
Expand Up @@ -293,7 +293,7 @@ export default class ExpensiMark {
},
{
name: 'quote',
regex: /\n?<(blockquote|q)(?:"[^"]*"|'[^']*'|[^'">])*>([\s\S]*?)<\/\1>(?![^<]*(<\/pre>|<\/code>))/gi,
regex: /<(blockquote|q)(?:"[^"]*"|'[^']*'|[^'">])*>([\s\S]*?)<\/\1>(?![^<]*(<\/pre>|<\/code>))/gi,
replacement: (match, g1, g2) => {
// We remove the line break before heading inside quote to avoid adding extra line
const resultString = g2.replace(/\n?(<h1># )/g, '$1')
Expand All @@ -302,7 +302,9 @@ export default class ExpensiMark {
.split('\n')
.map(m => `> ${m}`)
.join('\n');
return `\n${resultString}\n`;

// We want to keep <blockquote> tag here and let method replaceBlockElementWithNewLine to handle the line break later
return `<blockquote>${resultString}</blockquote>`;
},
},
{
Expand Down Expand Up @@ -524,7 +526,7 @@ export default class ExpensiMark {
*/
replaceBlockElementWithNewLine(htmlString) {
// eslint-disable-next-line max-len
let splitText = htmlString.split(/<div.*?>|<\/div>|<comment.*?>|\n<\/comment>|<\/comment>|<h1>|<\/h1>|<h2>|<\/h2>|<h3>|<\/h3>|<h4>|<\/h4>|<h5>|<\/h5>|<h6>|<\/h6>|<p>|<\/p>|<li>|<\/li>/);
let splitText = htmlString.split(/<div.*?>|<\/div>|<comment.*?>|\n<\/comment>|<\/comment>|<h1>|<\/h1>|<h2>|<\/h2>|<h3>|<\/h3>|<h4>|<\/h4>|<h5>|<\/h5>|<h6>|<\/h6>|<p>|<\/p>|<li>|<\/li>|<blockquote>|<\/blockquote>/);
splitText = splitText.map(text => Str.stripHTML(text));
let joinedText = '';

Expand Down Expand Up @@ -654,9 +656,6 @@ export default class ExpensiMark {
if (textToFormat !== '') {
replacedText += this.formatTextForQuote(regex, textToFormat, replacement);
}

// Replace all the blank lines between quotes, if there are only blank lines between them
replacedText = replacedText.replace(/(<\/blockquote>((\s*)+)<blockquote>)/g, '</blockquote><blockquote>');
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's safe to remove this as it's not covered by any unit test.
I think we should retain line breaks between quotes because the App prefers to display user's input as it is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm can you look into when it was added and why please?

Copy link
Contributor Author

@eh2077 eh2077 Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iwiznia The change was added in this PR #385 which is quite old. This change was included at that time but no unit test case was added together in PR #385. From the corresponding original issue Expensify/App#2670, I found that it fixed retaining only one empty line within a quote if there’re more than one. But later we wanted to retain all empty rows inside a quote, see slack discussion https://expensify.slack.com/archives/C01GTK53T8Q/p1682331970891359.

I have fixed another issue #531 to avoid removing line break before and after heading. The App has been reported many issues about line break handling. And I found that we prefer to keep line breaks from user's input as it is instead of removing them, which is different from markdown rendering on Github.

So, I think, in this case, we can also remove it to keep line breaks between quotes. Strictly speaking, it can be out of the scope of this issue but given it's annoying when performing tests and is easy to fix without breaking existing tests, I prefer to fix it and add tests to cover it together in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Friendly bump @iwiznia

} else {
// If we doesn't have matches make sure the function will return the same textToCheck
replacedText = textToCheck;
Expand Down