Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-write PDFHistory from scratch #8775

Merged
merged 6 commits into from
Sep 3, 2017
Merged

Re-write PDFHistory from scratch #8775

merged 6 commits into from
Sep 3, 2017

Conversation

Snuffleupagus
Copy link
Collaborator

@Snuffleupagus Snuffleupagus commented Aug 13, 2017

Please refer to the individual commit messages.

Supersedes PRs #6272 and #8753; hopefully third time's the charm :-)

Edit: Based on this PR, we should also be able to finally address issue #5927 properly; see WIP patches available here.

Edit2: Rebased to fix merge conflicts, no code changes were made.

@mozilla mozilla deleted a comment from pdfjsbot Aug 27, 2017
@mozilla mozilla deleted a comment from pdfjsbot Aug 27, 2017
@timvandermeij
Copy link
Contributor

/botio-linux preview

@timvandermeij
Copy link
Contributor

/botio unittest

Copy link
Contributor

@timvandermeij timvandermeij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phew, that was quite a bit of code to review :)

In general these patches look really good! As is to be expected with such a large pull request, I have some questions and suggestions for improvement, but most of them are minor. After these comments have been adressed, I'll do more testing (initial testing did not yield any unexpected behavior yet) and then we can ship it. With approximately one month to go before a new Firefox release, this should give us enough time in case we need to tweak things.

web/ui_utils.js Outdated
* @property {Object} target - The event target, can for example be:
* `window`, `document`, a DOM element, or an {EventBus} instance.
* @property {string} name - The name of the event.
* @property {number} delay - The delay, in milliseconds.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the following perhaps be a bit more descriptive to indicate what the delay is for?

The delay, in milliseconds, after which a timeout occurs.

}, done.fail);
});


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: remove duplicate empty line

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure; also we might want to change the ESLint rule no-multiple-empty-lines to only allow one empty to avoid these issues.

if (typeof first !== typeof second) {
return false;
}
if (first instanceof Array) {
Copy link
Contributor

@timvandermeij timvandermeij Aug 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it no problem if second is an array?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured that the typeof checks just above should handle that, but considering that typeof [] === 'object' you're obviously correct. There should be a check for second too, really good call!

if (first instanceof Array) {
return false;
}
if (typeof first === 'object' && first !== null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we swap these conditions? It turns out that typeof null === 'object' evaluates to true, which is not what I would expect. By using first !== null && typeof first === 'object' you avoid this confusion a bit by stopping early if first === null. Moreover, I find it a bit more readable.

Don't we also need to check these conditions for second? If e.g., second is not an object, using Object.keys below fails I think.

Copy link
Collaborator Author

@Snuffleupagus Snuffleupagus Aug 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we swap these conditions?

Yes, there's no particular reason for the ordering, so if you find that more readable that can be done.

Don't we also need to check these conditions for second?

Oh, we should probably check at least second !== null here too. That should be sufficient, since the initial typeof first !== typeof second check ought to handle the rest.

* Push an internal destination to the browser history.
* @param {PushParameters}
*/
push({ namedDest, explicitDest, pageNumber, }) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This signature does not match the interface. The interface most likely needs to get the options parameter.

_boundEvents.popState = this._popState.bind(this);
_boundEvents.pageHide = (evt) => {
// Attempt to push the `this._position` into the browser history when
// navigating away from the document, if the history is currently empty.
Copy link
Contributor

@timvandermeij timvandermeij Aug 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only if the history is empty? If there are history entries, I change the position and navigate away from the document, I would expect the last position to be retained when I navigate back.

Copy link
Collaborator Author

@Snuffleupagus Snuffleupagus Aug 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the time the pagehide event fires, it's too late to make (most) modifications of the history. That is also the whole reason for the existence of commit 281070b, please refer to the big comment starting at 281070b#diff-3b4ebbe223f93c9123a754b4845844e7R380.

At this point in time, since the viewer is already being unloaded, it's impossible to push a new entry to the browser history. What would happen instead here, is that the current browser history entry would be overwritten (which is definitely not what you want here).
The only case where that doesn't matter is if the history is empty, and for all other cases we'll instead rely on the temporary position instead here.

I'll try to amend this comment a bit, to make this clearer.

if (UPDATE_VIEWAREA_TIMEOUT > 0) {
// When closing the browser, a 'pagehide' event will be dispatched which
// *should* allow us to push the current position to the browser history.
// In practice, it seems that the event is arriving to late in order for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: too late


if (this._destination.temporary) {
// Always replace a previous *temporary* position, before moving back.
this._pushOrReplaceState(this._position);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we force-replace here, since it always needs to happen?

Copy link
Collaborator Author

@Snuffleupagus Snuffleupagus Aug 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing so would break things very badly, since it would overwrite the current history entry.

Since we get where when the user goes back in the history, the current browser history entry (window.history.state) now points to the new destination we should be navigating to.
By force-replacing in this case, you'd get stuck in the current position in the document, with all sorts of breakage as a result (history non-functional and so on).

Note also how we're not force-replacing in the case below either, for the same reasons.

This is quite complicated to wrap your head around at first (don't ask me how I know :-), so if you'd like I can try and explain it in more detail!


// Ensure that we don't miss a 'presentationmodechanged' event, by
// registering the listener immediately.
// Ensure that we don't miss either a 'presentationmodechanged' or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: or -> or a

// To prevent `this._tryPushCurrentPosition()` from effectively being
// reduced to a no-op in this case, we will assume that the position
// *did* in fact change if the 'updateviewarea' event was dispatched
// more than `POSITION_UPDATED_THRESHOLD` times.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure what this means or what this commit aims to achieve. Why is it bad if _tryPushCurrentPosition is reduced to a no-op if the hash is invalid, and where does the heuristic come from? Could you elaborate a bit on what problem this commit solves and how it works (approximately)?

Copy link
Collaborator Author

@Snuffleupagus Snuffleupagus Aug 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider loading the following document http://ftp.acc.umu.se/mirror/CTAN/info/lshort/english/lshort.pdf#pagemode=thumbs
In this case, the hash is defined, but it will obviously not cause a destination to be navigated to (which would set the page of its browser history entry). Furthermore, parseCurrentHash can obviously not extract a page either; #8775 (comment) might be somewhat relevant here too.

Given 1f41a59#diff-3b4ebbe223f93c9123a754b4845844e7R288, tryPushCurrentPosition won't ever update the browser history with the current position (unless push was called at some point). This could thus mean that despite commit 281070b, the browser history might not accurately capture the current position when the viewer is navigated away from or the browser closed.

Hence this heuristic, that attempts to "unlock" the adding of entries to the browser history when the position has updated "enough" times that it should be safe to assume that the position has in fact changed.
You could argue that this is a bit magical, especially since the value of the constant was basically pulled out of a hat, but it's my attempt to avoid permanently disabling certain parts of the implementation in e.g. the edge-case outlined above.
Note that in most cases, this heuristic won't even be necessary, but I figured that it shouldn't hurt to try and avoid this situation.

Does this help, if not ask again!

@Snuffleupagus
Copy link
Collaborator Author

Phew, that was quite a bit of code to review :)

Thank you for having the persistence, despite the size[1], to wade through all of it!!!

As is to be expected with such a large pull request, I have some questions and suggestions for improvement, but most of them are minor.

There's a number of good questions asked and I've tried my best to answer them[2], so please check if you're satisfied with the responses or if you think further clarifications are necessary!

In general, did you find that the existing comments helped, or were they to verbose?

I'll do more testing (initial testing did not yield any unexpected behavior yet)

That's very reassuring to hear, since the whole point of this re-write was to make the code more robust (and of course more maintainable too).

Again, thanks for taking the time to do such a thorough review :-)


[1] I really tried to break this up into somewhat smaller commits, but unfortunately that didn't work as well as I would have liked :-(

[2] It's sometimes quite difficult to know what parts will be confusing or difficult to understand for someone else, considering that you've got a lot of context that the reviewer might not have.

@timvandermeij
Copy link
Contributor

Thank you for the clear responses! In general the comments really helped to understand the code, in particular the edge cases you're addressing, so I'm really glad that it's documented so well! You're right that having the context is key here, so a review is a good way to find out which parts may need further clarification.

The responses look good so far. For some of them I'll need some more time to compare them with the code and to really understand, which I'll do this week including the testing, so we can hopefully merge it at the end of the week if all goes well. Feel free to amend the commits in the meantime if you want; I'll check each comment above in the amended version after that.

The current implementation of `PDFHistory` contains a number of smaller bugs, which are *very* difficult to address without breaking other parts of its code.
Possibly the main issue with the current implementation, is that I wrote it quite some time ago, and at the time my understanding of the various edge-cases the code has to deal with was quite limited.
Currently `PDFHistory` may, despite most of those cases being fixed, in certain edge-cases lock-up the browser history, essentially preventing the user from navigating back/forward.

Hence rather than trying to iterate on `PDFHistory` to make it better, the only viable approach is unfortunately rip it out in its entirety and re-write it from scratch.
…an event or a timeout, whichever occurs first
This patch completely re-implements `PDFHistory` to get rid of various bugs currently present, and to hopefully make maintenance slightly easier. Most of the interface is similar to the existing one, but it should be somewhat simplified.

The new implementation should be more robust against failure, compared to the old one. Previously, it was too easy to end up in a state which basically caused the browser history to lock-up, preventing the user from navigating back/forward. (In the new implementation, the browser history should not be updated rather than breaking if things go wrong.)

Given that the code has to deal with various edge-cases, it's still not as simple as I would have liked, but it should now be somewhat easier to deal with.
The main source of complication in the code is actually that we allow the user to change the hash of a already loaded document (we'll no longer try to navigate back-and-forth in this case, since the next commit contains a workaround).

In the new code, there's also *a lot* more comments (perhaps too many?) to attempt to explain the logic. This is something that the old implementation was serverly lacking, which is a one of the reasons why it was so difficult to maintain.

One particular thing to note is that the new code uses the `pagehide` event rather than `beforeunload`, since the latter seems to be a bad idea based on https://bugzilla.mozilla.org/show_bug.cgi?id=1336763.
…viewer is idle

This patch attempts to address an issue in the old `PDFHistory` implementation, where the current position wouldn't be correctly saved when the browser was closed.
In theory this *should* already be working, however as the discussion in https://bugzilla.mozilla.org/show_bug.cgi?id=1153393 showed, it seems that both `pagehide` and `beforeunload` arrive to late to successfully update the history during closing.

Hence a timeout is used to *temporarily* add the current position to the browser history when the viewer is idle.
Note that we need to take care not to update the browser history too often, since that would render the viewer more or unusable. Furthermore, if the timeout is *too* long it may end up effectively disable this whole functionality.

The `UPDATE_VIEWAREA_TIMEOUT` constant is thus a heuristic value, which we may need to tweak taking the above into account.
…ing a no-op if the document hash contains an invalid/non-existent destination

By using the (heuristic) `POSITION_UPDATED_THRESHOLD` constant, we can ensure that the current document position will be added to the browser history when a sufficiently "large" number of `updateviewarea` events have been dispatched.
@timvandermeij
Copy link
Contributor

/botio-linux preview

@pdfjsbot
Copy link

pdfjsbot commented Sep 3, 2017

From: Bot.io (Linux m4)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/6b6437d30606f31/output.txt

@mozilla mozilla deleted a comment from pdfjsbot Sep 3, 2017
@mozilla mozilla deleted a comment from pdfjsbot Sep 3, 2017
@mozilla mozilla deleted a comment from pdfjsbot Sep 3, 2017
@mozilla mozilla deleted a comment from pdfjsbot Sep 3, 2017
@mozilla mozilla deleted a comment from pdfjsbot Sep 3, 2017
@mozilla mozilla deleted a comment from pdfjsbot Sep 3, 2017
@pdfjsbot
Copy link

pdfjsbot commented Sep 3, 2017

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/6b6437d30606f31/output.txt

Total script time: 2.35 mins

Published

@timvandermeij
Copy link
Contributor

/botio unittest

@pdfjsbot
Copy link

pdfjsbot commented Sep 3, 2017

From: Bot.io (Linux m4)


Received

Command cmd_unittest from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/c3e152ce1da4ff8/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Sep 3, 2017

From: Bot.io (Windows)


Received

Command cmd_unittest from @timvandermeij received. Current queue size: 0

Live output at: http://54.215.176.217:8877/1038d17381090b9/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Sep 3, 2017

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/c3e152ce1da4ff8/output.txt

Total script time: 2.51 mins

  • Unit Tests: Passed

@pdfjsbot
Copy link

pdfjsbot commented Sep 3, 2017

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/1038d17381090b9/output.txt

Total script time: 12.63 mins

  • Unit Tests: Passed

@timvandermeij timvandermeij merged commit 1c9af00 into mozilla:master Sep 3, 2017
@timvandermeij
Copy link
Contributor

Nice work! Looks really good and it's a massive improvement over the old implementation.

@Snuffleupagus Snuffleupagus deleted the rewrite-PDFHistory-2 branch September 4, 2017 07:59
@Snuffleupagus
Copy link
Collaborator Author

Nice work! Looks really good and it's a massive improvement over the old implementation.

Thanks so much for helping with getting this reviewed (with lots of good points/questions) and landed! :-D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants