Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: v2 performance tweaks #311

Closed
rtfb opened this issue Oct 5, 2016 · 20 comments
Closed

Discussion: v2 performance tweaks #311

rtfb opened this issue Oct 5, 2016 · 20 comments

Comments

@rtfb
Copy link
Collaborator

rtfb commented Oct 5, 2016

Hi all,

I have recently pushed a branch, v2-perf-tweaks, with some initial performance improvements for v2. I'm opening this discussion thread for discussing whatever issues people might find reviewing that code and/or suggestions/observations for further improvement.

@rtfb
Copy link
Collaborator Author

rtfb commented Oct 5, 2016

As a seed for discussion I want to bring your attention to 993325d. It gets rid of the general purpose html.EscapeString-ing and rolls our own version. This gives a significant performance boost (~25% faster on CPU, ~45% less allocs), but is significantly narrower in scope. It only deals with &, ", < and >, like v1 did, and will leave anything else alone.

I want other people's opinions on whether this is a reasonable trade-off or not.

@dmitshur
Copy link
Collaborator

dmitshur commented Oct 5, 2016

is significantly narrower in scope. It only deals with &, ", < and >, like v1 did, and will leave anything else alone.

In what way is the scope of html.EscapeCompare larger? What does it do that escapeHTML doesn't? I only see a difference in handling of the ' character.

@rtfb
Copy link
Collaborator Author

rtfb commented Oct 5, 2016

Hm. Looking at its code, you're right, the only difference is that it also handles the '. I must have mixed up html.EscapeString and html.UnescapeString in my head -- the latter is much heavier beast. Well, then this change is less controversial than I thought.

Give all of them some good scrutiny, though :-)

@Ambrevar
Copy link

Ambrevar commented Oct 6, 2016

Nice job! Runs twice as fast for me!

Try out the suggestions in my code review: they might save some more cycles.

@Ambrevar
Copy link

Ambrevar commented Oct 9, 2016

Note that you can use the same table trick for html.go/cellAlignment:

var cellAlignment = [4]byte{
    0: 'l',
    bf.TableAlignmentLeft:   'l',
    bf.TableAlignmentRight:  'r',
    bf.TableAlignmentCenter: 'c',
}

This will not save as much though since it is just a switch.

@rtfb
Copy link
Collaborator Author

rtfb commented Oct 9, 2016

Note that you can use the same table trick for html.go/cellAlignment:

I'll see to it, but it's very far from a critical path, so I don't expect it to have any performance influence and will pick the one that's more readable.

@rtfb
Copy link
Collaborator Author

rtfb commented Nov 10, 2016

120bb2f gets rid of the preprocess stage. Yay!

It's a bigish commit, most of it being a lot of tiny changes in block.go. I'm rather confident-ish in what they do due to our extensive tests, but knowing that some uncovered holes get reported from time to time, this certainly might introduce some regressions. All ideas how can we be proactive about catching those regressions are welcome.

Another point of concern (luckily, much smaller in magnitude) is that I had to tweak some tests. The changes all seem to be for the better, mostly fixing tab expansion behavior and one weird SkipHTML case. These at least seem to be possible to verify by eyeballing. Please do.

Performance-wise, this brings us into the territory of being only 10-15% slower than v1. Gives me hope that it can actually get faster than v1 some day :-)

@Ambrevar
Copy link

Ambrevar commented Nov 11, 2016

I've checked the impact on the LaTeX renderer: The Footnote test does not pass anymore, not sure why:

            input: `[^foo]
[^foo]: bar`,
            want: `\href{bar}{^foo}` + "\n",
            // Footnote extension is turned _off_.

Result: "[^foo]\n[^foo]: bar\n"

This suggests that it is not renderer related because the link rendering is not even triggered. Any clue?

I've re-rolled up my Markdown-generated website, and here are the differences I could spot in the output:

  • Tabs are no longer turned into space in blockquotes and codeblocks. Good thing.
  • ... That's it!?!

@rtfb
Copy link
Collaborator Author

rtfb commented Nov 24, 2016

I've checked the impact on the LaTeX renderer: The Footnote test does not pass anymore, not sure why:

Could you please bisect to find out which commit does that?

@Ambrevar
Copy link

It is 120bb2f, the removal of the preprocessor.

I've narrowed the issue: it comes from the lack of new space in my test.
scanLinkRef seems to have a problem with that. If I remove

	if i == len(data) {
		return
	}

at the beginning, all tests pass (both blackfriday & blackfriday-latex).

The newline requirement issue might be more pervasive than that though.
What do you think of appending \n at the end of input when missing?

@rtfb
Copy link
Collaborator Author

rtfb commented Dec 6, 2016

Yeah, bugs due to hidden assumptions about the newlines is what I was mostly afraid of. I would like to avoid sticking the newline at the end of input though. Let's try fishing out those bugs case by case.

@Ambrevar
Copy link

Agreed, I don't like "fixing in the dark" either.

I'd assume the remaining false assumptions to involve >= or == comparisons to len(data) followed by a return. Here is an editor-friendly AWK one-liner:

awk '/= len\(data\)|len\(data\) >?=/ {getline x; if(index(x, "return")) printf("%s:%s::%s\t%s\n", FILENAME, FNR, $0, x)}' *.go

(Got 32 results, not counting the patch I previously inlined in the conversation.)

I had a quick glance, but all the conditions in those functions may be a tad subtle to the unwary eye. You are probably in a better position than me to check this ;)

@Ambrevar
Copy link

Ambrevar commented Feb 6, 2017

Any update?

@rtfb
Copy link
Collaborator Author

rtfb commented Feb 6, 2017

Ahh, sorry, it slipped my mind completely. I'll try to look into that some time this week.

@rtfb
Copy link
Collaborator Author

rtfb commented Feb 12, 2017

Well, that's funny. Turns out the three lines are probably the most contentious lines in the entire library.

The check was introduced with d28de22, when fixing #172 and #173.
Then I removed it with bcd6dd8 when fixing #180 (apparently, an identical case you're having now).
And then it was reintroduced with 232d06c when fixing regression.

It seems that they can be removed again. All these cases now have tests (including the one from 69f51af, which seems to have landed to v1 only) and they all pass. I'll submit a PR.

As to the other instances, I'll try giving them a critical look (thanks for the awk magic! :-)), but I don't have much hope in finding any bugs just by staring. And this story also suggests to review all v1 commits and migrate relevant stuff to v2.

@rtfb rtfb mentioned this issue Feb 12, 2017
@Ambrevar
Copy link

Ambrevar commented Mar 1, 2017

Agreed, but the more we wait with the v1/v2 divergence online, the worse it will get. Time for a merge?

@rtfb
Copy link
Collaborator Author

rtfb commented Mar 2, 2017

There's not much going on on v1, plus it's under our control, so it's not divergence that worries me. I'd like to announce v2 officially to let people know that the project is active (well, for a given value of active, that is :-)), and to effectively seal off v1 and open up v2 for feature development and fixes.

I feel like #328 xor #337 is the last thing to get merged before v2 announcement can be placed in the master README.

And #322 itself is already merged, so we can probably close this issue as well.

@Ambrevar
Copy link

Ambrevar commented Mar 3, 2017

Alright. I'd suggest we lood at those issues as well:

@rtfb
Copy link
Collaborator Author

rtfb commented Mar 3, 2017

#277 was done, but not closed. Just cleaned that up.
My understanding was that #316 does not influence a public interface, so it can be done at any later time?

@Ambrevar
Copy link

Ambrevar commented Mar 3, 2017

True, but it does influence the parser. Depends what kind of backward compatibility guarantee you want to give to the library.

@rtfb rtfb closed this as completed Mar 16, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants