A new list diffing algorithm #1274

Rich-Harris · 2018-03-23T22:55:05Z

1.58.0 introduced a new list diffing algorithm, intended to solve the performance problems in #588. The original PR (#1249) was incredibly fast, but unfortunately had incorrect behaviour in some cases (where sequences of blocks were moving together). I modified the algorithm, thinking I was fixing the bug, but in fact I was also undoing the reason it was so fast, and everything actually became slower than it was before.

Oops.

I first tried to implement the classic list diffing algorithm described in this Medium post but... it's too complicated for my liberal arts major brain. I had a hunch that there might be a simpler approach with the same performance characteristics, that would avoid ping-ponging from the front to the back of the list. And here it is, in pseudo-code:

let o = length of old_list
let n = length of new_list

let deltas = Map<key, delta> where key is present in both lists,
                             and delta is abs(change in key position)

let did_move = Map<key, boolean>
let will_move = Map<key, boolean>

let insertion_point = null (i.e. the end)

while (o > 0 and n > 0)
  let old_key = old_list[o - 1];
  let new_key = new_list[n - 1];

  if (old_key === new_key)
    o--
    n--
    insertion_point = new_key

  else if (old_key is not in new_list)
    remove(old_key)
    o--

  else if (new_key is not in old_list)
    insert(new_key, insertion_point)
    insertion_point = new_key
    n--

  else
    if (did_move[old_key])
      o--

    else if (will_move[new_key])
      insert(new_key, insertion_point)
      insertion_point = new_key
      n--

    else if (deltas[new_key] > deltas[old_key])
      did_move[new_key] = true
      insert(new_key, insertion_point)
      insertion_point = new_key
      n--

    else
      will_move[old_key] = true
      o--

while (o-- > 0)
  let old_key = old_list[o]
  if (old_key is not in new_list)
    remove(old_key)

while (n-- > 0)
  let new_key = new_list[n]
  insert(new_key, insertion_point)
  insertion_point = new_key

It's likely that this isn't novel at all, and it's possible that it has negative performance characteristics in circumstances I haven't considered. But it seems to work pretty well.
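For anyone who wants to poke at it, here is a minimal runnable sketch of the pseudo-code above. It works on a plain array of keys that stands in for the DOM. The names `diffKeys`, `dom` and `ops` are made up for illustration and are not Svelte APIs. It assumes keys are unique within each list, that every insert also advances the insertion point, and that the final clean-up only removes old keys that are absent from the new list.

```javascript
// Sketch only: `diffKeys` is a hypothetical helper, not the Svelte implementation.
function diffKeys(oldList, newList) {
  const oldIndex = new Map(oldList.map((key, i) => [key, i]));
  const newIndex = new Map(newList.map((key, i) => [key, i]));

  // deltas: how far each key present in both lists has to travel
  const deltas = new Map();
  for (const [key, i] of newIndex) {
    if (oldIndex.has(key)) deltas.set(key, Math.abs(i - oldIndex.get(key)));
  }

  const didMove = new Set();
  const willMove = new Set();
  const dom = oldList.slice(); // simulated DOM order
  const ops = [];              // log of the operations performed
  let insertionPoint = null;   // null means "the end"

  function remove(key) {
    dom.splice(dom.indexOf(key), 1);
    ops.push(`remove ${key}`);
  }

  // Place `key` directly before the insertion point, moving it if it already exists.
  function insert(key) {
    const from = dom.indexOf(key);
    if (from !== -1) dom.splice(from, 1);
    const to = insertionPoint === null ? dom.length : dom.indexOf(insertionPoint);
    dom.splice(to, 0, key);
    ops.push(`${from === -1 ? 'create' : 'move'} ${key}`);
    insertionPoint = key;
  }

  let o = oldList.length;
  let n = newList.length;

  while (o > 0 && n > 0) {
    const oldKey = oldList[o - 1];
    const newKey = newList[n - 1];

    if (oldKey === newKey) {
      insertionPoint = newKey;
      o--;
      n--;
    } else if (!newIndex.has(oldKey)) {
      remove(oldKey);
      o--;
    } else if (!oldIndex.has(newKey)) {
      insert(newKey);
      n--;
    } else if (didMove.has(oldKey)) {
      o--; // already placed earlier, skip it
    } else if (willMove.has(newKey)) {
      insert(newKey);
      n--;
    } else if (deltas.get(newKey) > deltas.get(oldKey)) {
      didMove.add(newKey);
      insert(newKey);
      n--;
    } else {
      willMove.add(oldKey);
      o--;
    }
  }

  // Leftover old keys: remove only the ones that are gone from the new list.
  while (o-- > 0) {
    if (!newIndex.has(oldList[o])) remove(oldList[o]);
  }

  // Leftover new keys: insert them front-most last.
  while (n-- > 0) insert(newList[n]);

  return { dom, ops };
}
```

With this sketch, `diffKeys([...'ABCDE'], [...'EABCD']).ops` is `['move E']` and `diffKeys([...'ABCDE'], [...'BCDEA']).ops` is `['move A']`, i.e. one move in each direction.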

The secret sauce is deltas. When changing ABCDE to EABCD and working backwards (which is slightly more convenient), there are two possibilities...

move D to the end, then C (which is now in front of the E) in front of the D, B in front of the C, A in front of the B, or
move the E to the front

...one of which is obviously better — the second is one move, the first is four moves. But if you codify the rule that on encountering different keys you should always move the old key rather than the new key, then going from ABCDE to BCDEA would take four moves instead of one.

deltas allows us to make the right decision in both cases. In the first example, E would have to move four spaces to get to its new home, whereas D would only have to move one. In other words, moving the old key (E) is 'worth' four moves. In the second example, E would only have to move one space, whereas A would have to move four — so we move the new key (A), not the old key (E).
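To make the numbers concrete, here is a small sketch that computes the deltas for both examples. `computeDeltas` is a hypothetical helper name used only for this illustration, and it assumes keys are unique within each list.

```javascript
// Hypothetical helper: absolute change in position for every key present in both lists.
function computeDeltas(oldList, newList) {
  const oldIndex = new Map(oldList.map((key, i) => [key, i]));
  const deltas = {};
  newList.forEach((key, i) => {
    if (oldIndex.has(key)) deltas[key] = Math.abs(i - oldIndex.get(key));
  });
  return deltas;
}

// ABCDE -> EABCD: E travels 4 places, D travels 1, so the old key (E) is the one to move.
const first = computeDeltas([...'ABCDE'], [...'EABCD']);

// ABCDE -> BCDEA: A travels 4 places, E travels 1, so the new key (A) is the one to move.
const second = computeDeltas([...'ABCDE'], [...'BCDEA']);
```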

In the row-swapping case that motivated all this work (the js-framework-benchmark test), this algorithm is as fast as #1249, but also works for those tricky edge cases.

Would be grateful if anyone out there who is more knowledgeable about these sorts of things could sanity-check me though...

TODO

tidy things up a little bit
double check all the benchmarks this time
remove all the linked list stuff, which is no longer necessary (store the array of keys instead)

codecov-io · 2018-03-23T23:01:51Z

Codecov Report

Merging #1274 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1274      +/-   ##
==========================================
- Coverage   91.88%   91.87%   -0.02%     
==========================================
  Files         126      126              
  Lines        4548     4541       -7     
  Branches     1478     1478              
==========================================
- Hits         4179     4172       -7     
  Misses        153      153              
  Partials      216      216

Impacted Files	Coverage Δ
src/generators/nodes/EachBlock.ts	`97.95% <100%> (-0.1%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update bc416a5...99afa99.

jacwright · 2018-03-24T01:07:34Z

I borrowed an algorithm from Google's observe-js polyfill and implemented it here https://github.com/chip-js/differences-js/blob/master/src/diff.js#L199. (The original is too enmeshed into their other code and had to be extracted, so my version is the easier one to try and use)

I really don't know what performance characteristics are being targeted so this may not be helpful at all, but I thought I ought to share in case it was.

thysultan · 2018-03-24T02:11:04Z

If it also handles both carousel directions:

1, 2, 3, 4, 5, 6 -> 2, 3, 4, 5, 6, 1 = (append 1)
2, 3, 4, 5, 6, 1 -> 1, 2, 3, 4, 5, 6 = (prepend 1)

1, 2, 3, 4, 5, 6 -> 2, 3, 4, 5, 6, 7 (remove 1, create append 7)
2, 3, 4, 5, 6, 7 -> 1, 2, 3, 4, 5, 6 (remove 7, create prepend 1)

Then it should handle the common cases.

Having some tests to ensure the resulting DOM operations match expectations would help prevent a regression in the future. ivi, domvm, dio, etc. do this to some reasonable extent, since some DOM ops can affect the correctness of a UI's end state with regard to input and other stateful elements, beyond the supposed HTML representation.

Rich-Harris · 2018-03-24T16:10:12Z

Thanks @jacwright. The observe-js approach looks like it probably has significantly worse performance in cases where there are many edits; it has a loop inside a loop in calcEditDifferences meaning that the comparison is (I think?) O(n^2) in the worst case, compared to O(n) for this algo (though they're probably similar when it comes to the actual DOM moves, which is the expensive part).

@thysultan yep, it handles those cases as described. You're right about regressions. Not sure how to test how many move operations it performs in a given case without adding code specifically for testing, but perhaps we need a more robust approach to benchmarking generally. Will freely admit that I generally skip that part, because running decent benchmarks takes forever.

Have tidied everything up here and double checked the benchmarks, so I'll go ahead and merge this.

btakita · 2018-03-25T00:33:32Z

Perhaps one way to test performance regressions is to add a wrapper to the JSDOM api which records DOM operations. Then the test can assert n DOM operations were performed.
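A rough sketch of what such a wrapper could look like, under stated assumptions: `recordOperations` is a hypothetical helper (not part of JSDOM or Svelte), and the `parent` object is a stand-in so the example runs without JSDOM. With a real JSDOM element the same idea should apply, since `insertBefore`, `appendChild` and `removeChild` are standard Node methods, but that wiring is untested here.

```javascript
// Hypothetical helper: wrap mutating methods on a node-like object and count the calls.
function recordOperations(target, methods = ['insertBefore', 'appendChild', 'removeChild']) {
  const counts = {};
  for (const name of methods) {
    const original = target[name];
    if (typeof original !== 'function') continue;
    counts[name] = 0;
    target[name] = function (...args) {
      counts[name] += 1;
      return original.apply(this, args);
    };
  }
  return counts;
}

// Stand-in parent node with just enough behaviour for the example.
const parent = {
  children: [],
  appendChild(child) {
    this.children.push(child);
    return child;
  },
  insertBefore(child, ref) {
    const at = ref == null ? this.children.length : this.children.indexOf(ref);
    this.children.splice(at, 0, child);
    return child;
  },
  removeChild(child) {
    this.children.splice(this.children.indexOf(child), 1);
    return child;
  }
};

const counts = recordOperations(parent);
parent.appendChild('a');
parent.insertBefore('b', 'a');
parent.removeChild('a');
// counts now holds one call each for insertBefore, appendChild and removeChild
```

A test could then run an update and assert on `counts` (for example, that a single-row swap performs a fixed number of `insertBefore` calls) instead of timing anything.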

Also, I thought I covered all of the cases in the PR. Apologies for what I missed. It does look like you have something that works well though.

Rich-Harris · 2018-03-26T10:33:05Z

No apologies necessary! It was a real corner case. I only spotted it because one of the 'random permute' cases happened to fail in CI; to reproduce it locally I had to increase the count from 100 to 1000. And one of the reasons I'm keen on extracting stuff like this out into shared helpers is that it's much easier to understand and change the code when it's written in a JavaScript file, rather than in a giant string with interpolated variable names etc. #588 stood open so long because I didn't have the stomach to wade into that unreadable mess, but you did!

Rich-Harris added 8 commits March 21, 2018 11:31

reinstate previous code from before i ballsed it up

5a7f7a0

minor edits

2923e6a

more minor edits

5349184

am close...

7c953a6

holy shit i think i did it

a3e91eb

all tests passing

4b2a01f

tidy up

5f8f213

tidy up

fb84d72

Rich-Harris added 5 commits March 24, 2018 09:43

remove some unused stuff

174975c

remove linked list stuff

f414509

remove unused argument

0672e7b

simplify a bit

105ab41

simplify

99afa99

Rich-Harris changed the title ~~[WIP] A new list diffing algorithm~~ A new list diffing algorithm Mar 24, 2018

Rich-Harris merged commit 89c0864 into master Mar 24, 2018

Rich-Harris deleted the fix-perf-regression branch March 24, 2018 16:10

This was referenced Mar 26, 2018

update svelte to 1.58.5 krausest/js-framework-benchmark#369

Merged

Wrap JSDOM to count DOM mutations #1276

Closed
