Discussion: v2 performance tweaks #311

rtfb · 2016-10-05T18:34:45Z

Hi all,

I have recently pushed a branch, v2-perf-tweaks, with some initial performance improvements for v2. I'm opening this discussion thread for discussing whatever issues people might find reviewing that code and/or suggestions/observations for further improvement.

rtfb · 2016-10-05T18:36:55Z

As a seed for discussion I want to bring your attention to 993325d. It gets rid of the general purpose html.EscapeString-ing and rolls our own version. This gives a significant performance boost (~25% faster on CPU, ~45% less allocs), but is significantly narrower in scope. It only deals with &, ", < and >, like v1 did, and will leave anything else alone.

I want other people's opinions on whether this is a reasonable trade-off or not.

dmitshur · 2016-10-05T18:45:37Z

is significantly narrower in scope. It only deals with &, ", < and >, like v1 did, and will leave anything else alone.

In what way is the scope of html.EscapeCompare larger? What does it do that escapeHTML doesn't? I only see a difference in handling of the ' character.

rtfb · 2016-10-05T18:57:43Z

Hm. Looking at its code, you're right, the only difference is that it also handles the '. I must have mixed up html.EscapeString and html.UnescapeString in my head -- the latter is much heavier beast. Well, then this change is less controversial than I thought.

Give all of them some good scrutiny, though :-)

Ambrevar · 2016-10-06T13:29:30Z

Nice job! Runs twice as fast for me!

Try out the suggestions in my code review: they might save some more cycles.

Ambrevar · 2016-10-09T03:04:25Z

Note that you can use the same table trick for html.go/cellAlignment:

var cellAlignment = [4]byte{
    0: 'l',
    bf.TableAlignmentLeft:   'l',
    bf.TableAlignmentRight:  'r',
    bf.TableAlignmentCenter: 'c',
}

This will not save as much though since it is just a switch.

rtfb · 2016-10-09T09:44:03Z

Note that you can use the same table trick for html.go/cellAlignment:

I'll see to it, but it's very far from a critical path, so I don't expect it to have any performance influence and will pick the one that's more readable.

rtfb · 2016-11-10T20:18:27Z

120bb2f gets rid of the preprocess stage. Yay!

It's a bigish commit, most of it being a lot of tiny changes in block.go. I'm rather confident-ish in what they do due to our extensive tests, but knowing that some uncovered holes get reported from time to time, this certainly might introduce some regressions. All ideas how can we be proactive about catching those regressions are welcome.

Another point of concern (luckily, much smaller in magnitude) is that I had to tweak some tests. The changes all seem to be for the better, mostly fixing tab expansion behavior and one weird SkipHTML case. These at least seem to be possible to verify by eyeballing. Please do.

Performance-wise, this brings us into the territory of being only 10-15% slower than v1. Gives me hope that it can actually get faster than v1 some day :-)

Ambrevar · 2016-11-11T04:34:15Z

I've checked the impact on the LaTeX renderer: The Footnote test does not pass anymore, not sure why:

            input: `[^foo]
[^foo]: bar`,
            want: `\href{bar}{^foo}` + "\n",
            // Footnote extension is turned _off_.

Result: "[^foo]\n[^foo]: bar\n"

This suggests that it is not renderer related because the link rendering is not even triggered. Any clue?

I've re-rolled up my Markdown-generated website, and here are the differences I could spot in the output:

Tabs are no longer turned into space in blockquotes and codeblocks. Good thing.
... That's it!?!

rtfb · 2016-11-24T18:18:06Z

I've checked the impact on the LaTeX renderer: The Footnote test does not pass anymore, not sure why:

Could you please bisect to find out which commit does that?

Ambrevar · 2016-11-29T11:58:49Z

It is 120bb2f, the removal of the preprocessor.

I've narrowed the issue: it comes from the lack of new space in my test.
scanLinkRef seems to have a problem with that. If I remove

	if i == len(data) {
		return
	}

at the beginning, all tests pass (both blackfriday & blackfriday-latex).

The newline requirement issue might be more pervasive than that though.
What do you think of appending \n at the end of input when missing?

rtfb · 2016-12-06T19:38:45Z

Yeah, bugs due to hidden assumptions about the newlines is what I was mostly afraid of. I would like to avoid sticking the newline at the end of input though. Let's try fishing out those bugs case by case.

Ambrevar · 2016-12-10T11:09:02Z

Agreed, I don't like "fixing in the dark" either.

I'd assume the remaining false assumptions to involve >= or == comparisons to len(data) followed by a return. Here is an editor-friendly AWK one-liner:

awk '/= len\(data\)|len\(data\) >?=/ {getline x; if(index(x, "return")) printf("%s:%s::%s\t%s\n", FILENAME, FNR, $0, x)}' *.go

(Got 32 results, not counting the patch I previously inlined in the conversation.)

I had a quick glance, but all the conditions in those functions may be a tad subtle to the unwary eye. You are probably in a better position than me to check this ;)

Ambrevar · 2017-02-06T14:16:50Z

Any update?

rtfb · 2017-02-06T20:30:25Z

Ahh, sorry, it slipped my mind completely. I'll try to look into that some time this week.

rtfb · 2017-02-12T15:49:00Z

Well, that's funny. Turns out the three lines are probably the most contentious lines in the entire library.

The check was introduced with d28de22, when fixing #172 and #173.
Then I removed it with bcd6dd8 when fixing #180 (apparently, an identical case you're having now).
And then it was reintroduced with 232d06c when fixing regression.

It seems that they can be removed again. All these cases now have tests (including the one from 69f51af, which seems to have landed to v1 only) and they all pass. I'll submit a PR.

As to the other instances, I'll try giving them a critical look (thanks for the awk magic! :-)), but I don't have much hope in finding any bugs just by staring. And this story also suggests to review all v1 commits and migrate relevant stuff to v2.

Ambrevar · 2017-03-01T09:50:09Z

Agreed, but the more we wait with the v1/v2 divergence online, the worse it will get. Time for a merge?

rtfb · 2017-03-02T20:18:01Z

There's not much going on on v1, plus it's under our control, so it's not divergence that worries me. I'd like to announce v2 officially to let people know that the project is active (well, for a given value of active, that is :-)), and to effectively seal off v1 and open up v2 for feature development and fixes.

I feel like #328 xor #337 is the last thing to get merged before v2 announcement can be placed in the master README.

And #322 itself is already merged, so we can probably close this issue as well.

Ambrevar · 2017-03-03T12:00:57Z

Alright. I'd suggest we lood at those issues as well:

v2: Fix extensions passing to renderer and main Markdown func #277 matters for the API.
v2: Broken NoIntraEmpasis and other emph rules #316 is currently broken and albeit an extension, CommonMark specifies it and provides a parsing solution.

rtfb · 2017-03-03T17:54:51Z

#277 was done, but not closed. Just cleaned that up.
My understanding was that #316 does not influence a public interface, so it can be done at any later time?

Ambrevar · 2017-03-03T18:05:11Z

True, but it does influence the parser. Depends what kind of backward compatibility guarantee you want to give to the library.

rtfb mentioned this issue Oct 5, 2016

document v2 installation #292

Closed

rtfb added v2 discussion labels Oct 5, 2016

rtfb mentioned this issue Feb 12, 2017

Fix #180 again #331

Merged

rtfb closed this as completed Mar 16, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Discussion: v2 performance tweaks #311

Discussion: v2 performance tweaks #311

rtfb commented Oct 5, 2016

rtfb commented Oct 5, 2016

dmitshur commented Oct 5, 2016

rtfb commented Oct 5, 2016

Ambrevar commented Oct 6, 2016

Ambrevar commented Oct 9, 2016

rtfb commented Oct 9, 2016

rtfb commented Nov 10, 2016

Ambrevar commented Nov 11, 2016 •

edited

Loading

rtfb commented Nov 24, 2016

Ambrevar commented Nov 29, 2016

rtfb commented Dec 6, 2016

Ambrevar commented Dec 10, 2016

Ambrevar commented Feb 6, 2017

rtfb commented Feb 6, 2017

rtfb commented Feb 12, 2017

Ambrevar commented Mar 1, 2017

rtfb commented Mar 2, 2017

Ambrevar commented Mar 3, 2017

rtfb commented Mar 3, 2017

Ambrevar commented Mar 3, 2017

Discussion: v2 performance tweaks #311

Discussion: v2 performance tweaks #311

Comments

rtfb commented Oct 5, 2016

rtfb commented Oct 5, 2016

dmitshur commented Oct 5, 2016

rtfb commented Oct 5, 2016

Ambrevar commented Oct 6, 2016

Ambrevar commented Oct 9, 2016

rtfb commented Oct 9, 2016

rtfb commented Nov 10, 2016

Ambrevar commented Nov 11, 2016 • edited Loading

rtfb commented Nov 24, 2016

Ambrevar commented Nov 29, 2016

rtfb commented Dec 6, 2016

Ambrevar commented Dec 10, 2016

Ambrevar commented Feb 6, 2017

rtfb commented Feb 6, 2017

rtfb commented Feb 12, 2017

Ambrevar commented Mar 1, 2017

rtfb commented Mar 2, 2017

Ambrevar commented Mar 3, 2017

rtfb commented Mar 3, 2017

Ambrevar commented Mar 3, 2017

Ambrevar commented Nov 11, 2016 •

edited

Loading