Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement Meyers diffing algorithm #85

Open
yoshuawuyts opened this issue Oct 10, 2017 · 31 comments
Open

implement Meyers diffing algorithm #85

yoshuawuyts opened this issue Oct 10, 2017 · 31 comments

Comments

@yoshuawuyts
Copy link
Member

So we have a slight diffing problem — not only are we fairly slow at updating, reordering nodes is also suboptimal. This means that some reorders can be surprising at the worst case, and in a worst case reorders simply take longer than they should.

In computer science this is known as "diffing", and is a rather well-studied problem. Probably the best known algo for this is the Mayers Diffing algorithm. It'd be neat if we could make use of this for nanomorph.

references

@yoshuawuyts
Copy link
Member Author

@graforlock did you get a chance to look at this?

@graforlock
Copy link
Member

graforlock commented Oct 20, 2017

Yeah @yoshuawuyts made some steps to extract it along with pre-optimisations that can be applied to nanomorph, but the newborn put me off the track :) Will probably progress & finalise it this following weekend if this attempt is successful and performant enough

@yoshuawuyts
Copy link
Member Author

@graforlock ohhhhh, congratulations! 🎉 🎉 🎉 🎉

No rush at all; great to hear you've been successful, and like hope you're having a good time! 😁

@blahah
Copy link

blahah commented Oct 22, 2017

Myers diffing for the dom is totally tractable. @graforlock looking forward to your implementation.

@graforlock
Copy link
Member

graforlock commented Oct 22, 2017

Problems might be with numbers of pre-optimisations and we need quite a bit of them, that will appear in the forthcoming performance tests. DOM node getters and setters should be used sparsely/wisely as they are quite slow in bulk (vs comparing vals/keys on virtual objects), and when comparing real DOM trees - this needs special attention as any excessive and reckless use of aforementioned slows the computation down.

@blahah
Copy link

blahah commented Oct 22, 2017

@graforlock I've been thinking and reading about the problem and I actually think Myers is unlikely to be the best approach. It's good for the general diff problem, but for a DOM tree there are specialised algorithms.

screen shot 2017-10-23 at 00 03 38

http://referaat.cs.utwente.nl/conference/3/paper/7144/change-detection-in-xml-trees-a-survey.pdf

The DOM is a tree with ordered siblings, and we want the operations:

  • insert
  • delete
  • update
  • move

So from the list above, the DiffXML might be worth looking at.

@graforlock
Copy link
Member

graforlock commented Oct 23, 2017

Myers is a groundwork for a lot of the (performant) other algorithms including some tree based diffs. Its also proven to do good work in many libraries that operate on trees. Yes, it is not specialised by default but there are specialised and well-optimized implementations at hand.

@kristoferjoseph
Copy link
Collaborator

I say try them all, but ship as iterations pass the test suite.
Good news is we can try out different approaches in branches now that we have a baseline and make incremental improvements.

@blahah
Copy link

blahah commented Oct 23, 2017

@graforlock sounds good to me 👌

@graforlock
Copy link
Member

graforlock commented Oct 25, 2017

@blahah or anyone, I struggle to find a whole lot of time lately, so I'm progressing slowly. Although I have good suites and understanding how to make it work well, we could pair up on that can't we?

I realise there isn't immediate rush but it would be cool if this was done anyhow.

@yoshuawuyts
Copy link
Member Author

@graforlock maybe if you could create a repo, people could like dig in and help out? I'd love for this to make some progress — there's not a lot of pressure, but still, the sooner we'd get a version working the better ✨

@blahah
Copy link

blahah commented Oct 27, 2017

yeah, a repo for a pure myers-diff-dom or something would be ideal, then we can collab in there?

@graforlock
Copy link
Member

yeah will do, a good idea

@yoshuawuyts
Copy link
Member Author

@graforlock yay! — looking forward to it! ✨

@graforlock
Copy link
Member

graforlock commented Oct 30, 2017

should do tomorrow if anything goes well, need to polish some edges and README-this-stuff for directions

@graforlock
Copy link
Member

graforlock commented Nov 1, 2017

So, heres the repo @yoshuawuyts @blahah @kristoferjoseph

It is essentiallypetit-dom algorithm ported/adapted from VDOM to a DOM diffing methodology. A lot of code is similar.

A lot of stuff is also just opinionated result of the port (like input updates etc), but the algorithm itself (all the diff prefix methods for example) is totally extractable to the separate module. Sorry its all in one file atm! It has to be changed definitely!

It may be that I wasted huge amount of mine and your time and its all pile of crap! Or not... 😄

Preliminary results give me a huge performance advantage over nanomorph (as you can see in example)

https://github.com/graforlock/diffing-implementation

@graforlock
Copy link
Member

graforlock commented Dec 4, 2017

So my thoughts are, after dealing with this stuff for some time.

Firstly, traversing the actual DOM tree for a tree diff is costly. Myers is super slow (with all the diff traceback it does by default) and won't perform well over certain operations cap. petit-dom is bypassing it by switching to a generic LCS algorithm, which is faster, but in nanomorph's case still too slow. Only makes performance difference with equal number of nodes, other than that, seems to have no edge over what nanomorph is currently doing.

There is a need for super performant and greedy LCS algortihm. I myself found a possible candidate but this is not that easy on the actual DOM tree without breaking the API. Proxy nodes is another thing to consider - their swap index has to be keyed in a reorder map given they would change their position in the DOM tree, remaining the same in the end. Not much of a walk in the park.

@jongacnik
Copy link
Member

hey all! popped up in my dash today so just dropping this in, in case it is at all relevant → https://github.com/WebReflection/domdiff

@graforlock
Copy link
Member

Looks really cool, how's performance of it ? Did you try to measure it?

@jongacnik
Copy link
Member

@graforlock not yet, just a quick glance over source. Have a feeling you, @blahah, or @yoshuawuyts would be able to discern a bit more than myself!

@soyuka
Copy link

soyuka commented Dec 10, 2017

Good discussion about diffing: snabbdom/snabbdom#348
Think it's worth mentioning here.

@WebReflection
Copy link

not sure I'm adding anything to this discussion but for what I can tell with my tests, the petit-dom algo seems to be the fastest but there are too many operations that non virtual DOM wouldn't need, and compared to snabbdom is quite bigger, in terms of final size ( domdiff is 524 bytes in gzip and 471 bytes via brotli, as example ).

domdiff also wouldn't care less about patches and it works linearly with a list of nodes, it's not suitable for nested nodes too, just lists of the same parent node.

@tunnckoCore
Copy link

tunnckoCore commented Feb 2, 2018

Heya! 🎉 Following this long time. And as i once mentioned, i once wanted to rewrite it from scratch - for learning, playing and minimizing the size. So code name was "minmorph" - removed the "./events" list, the remaining is almost the same with few optimizations. Actually not sure why we need to list all the events?

Long story short, back then i tested it against nanomorph's tests and only 1 tests was failing. Recently was touching it more and following the need for benchmarking, i got the @graforlock repo and put it there so we can benchmark. Also added the nanomorphs tests.

And so, the result is all of the normal tests are passing and only 8 of the "keyed" tests are failing!
The benchmark results are phenomenal :) I'm going to create a clean repo with everything, plus @graforlock's code and benchmark. Then i'll PR the implementation so we can discuss and continue on fixing these 8 tests, because i'm not so familiar with the keyed things :D

The tests are green (almost)
2018-02-02-02 34 40_1280x1024_scrot

minmorph benchmark - numbers are the same as the @graforlock's not finished code, which in turn fails pretty much tests.
2018-02-02-02 35 04_1280x1024_scrot

Nanomorph benchmark
2018-02-02-02 34 53_1280x1024_scrot

@tunnckoCore
Copy link

tunnckoCore commented Feb 7, 2018

So, that's it https://github.com/tunnckoCore/demo-minmorph

yarn install
yarn dev

And open http://localhost:5000. It uses parcel-bundler to run on the browser btoh the tests and benchmarks.
Tests are at the src/app/test.js and not in the test/ at the root. For trying the tests against nanomorph go to the file and pass nanomorph to the abstractMorph and abstractKeyed instead of minmorph.

One interesting keyed diffing https://github.com/joshrtay/key-diff (similar to domdiff).

@graforlock
Copy link
Member

graforlock commented Feb 7, 2018

Cool stuff, I am going to have a look.

I myself also experimented with "mostly" accurate LCS algorithm (Sift4), and honestly I don't think we need proper 100% accurate diffing algorithms with backtrace provided we have keyed results reordering sorted out properly. Reason being that the time for calculating such accuracy is expensive and outweights simple removal / addition operations.

For example, I did extract basic Myers in agnostic fashion here: https://github.com/graforlock/myers-algorithm and this was literally super slow (expensive).

Sift seemed to be blazing fast and I may put some more work into that for greater good.

@tunnckoCore
Copy link

tunnckoCore commented Jun 23, 2018

Heya all! Just for the record of my work. Last night i started playing again on the above mentioned repo demo-minmorph. It's now a lot more organized and cleaned (reorganized and apply DRY on tests for #87), and deps are updated.
Please check it out and try running it.

Tests are increased as number, failing tests are decreased to only 2 (5 assertions). Related to this is tunnckoCoreLabs/demo-minmorph#14 ("no recursion step in domdiff") and thoughts from @WebReflection.
Why tests are increased? They now are 724, because i switched to #104, now they are around 98 events (on chrome) instead of 30-35 hardcoded.

Only 8 of them are failing and are excluded from the suite

function excludeEvents(name) {
  // failing event tests for Minmorph
  // Nanomorph fails for another ones and some of these
  const isFailing = [
    'onresume',
    'onfreeze',
    'onreadystatechange',
    'onpointerlockchange',
    'onpointerlockerror',
    'onselectionchange',
    'onvisibilitychange',
    'onsecuritypolicyviolation',
  ].includes(name);

  return !isFailing;
}

Most of the keyed tests are passing, except 2 (handle non-id elements and nested without id) which i marked as TODO: PRs welcome for easier search.

Benchmarks (yarn bench) are against Nanomorph@6 beta using the @graforlock's benchmark from ... hm, from where i got it (some repo). It's not something so special, but it shows that my implementation using a bit changed domdiff is around 3-5 times faster than Nanomorph. I'm still not sure for that numbers, because of the failing tests.

Please share some feedback. I'll try to PR things one by one.

Cheers 🥂

@brucou
Copy link

brucou commented Jan 14, 2019

Hi guys,

what is the status on that?

@yoshuawuyts
Copy link
Member Author

@brucou hey! -- I've casually started working on a WASM version, but there's no ETA when this might be done.

@brucou
Copy link

brucou commented Jan 16, 2019 via email

@tunnckoCore
Copy link

@brucou the demo-minmorph is there, you can see performance and issues there ;] I didn't looked on it for a while.

@brucou
Copy link

brucou commented Jan 17, 2019

thanks, will have a look

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants