Improve ContainExactly matcher speed when elements obey transitivity #1325

bclayman-sq · 2021-10-06T22:58:39Z

This addresses issues #1006, #1161.

The current implementation for ContainExactly runs in O(n!). The crux of the problem is that some elements don't obey transitivity. As a result, knowing that sorting actual and expected doesn't result in a match doesn't guarantee that expected and actual don't match.

This PR makes improvements that ensure any test with comparable elements in actual and expected runs in O(n log n) time.

There are two main updates:

Update the core logic in ContainExactly to:

1. Attempt to sort actual and expected.
2. If the sort succeeds, this tells us the objects are comparable, so use the result of the sort.

This means that when actual and expected contain comparable elements, we can immediately return false when the sorted arrays don't match. This speeds up passing tests where actual and expected don't match from O(n!) to O(n log n).
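A minimal sketch of the sort-first check described above, assuming hypothetical helper names (`try_sort` and `fast_match` are illustrative and are not the PR's actual methods). It returns `nil` when the elements can't be sorted, so a caller could fall back to the existing pairing logic in that case.

```ruby
# Hypothetical helper: returns a sorted copy, or nil when the elements
# are not mutually comparable (Array#sort raises in that case).
def try_sort(array)
  array.sort
rescue ArgumentError, NoMethodError
  nil
end

# Hypothetical helper: true/false when both arrays are sortable,
# nil when the question can't be settled by sorting.
def fast_match(expected, actual)
  sorted_expected = try_sort(expected)
  sorted_actual = try_sort(actual)
  return nil if sorted_expected.nil? || sorted_actual.nil?

  # Both sorts succeeded, so an O(n) comparison of the sorted arrays decides it.
  sorted_expected == sorted_actual
end
```

With integers, `fast_match([3, 1, 2], [2, 3, 1])` settles the match after two sorts, while a mixed array such as `[1, "a"]` yields `nil` and would need the slower path.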

Speed up determining extra and missing elements

Previously, the logic for determining extra and missing items between actual and expected relied on code in PairingsMaximizer that generated all possible element pairings. So even if you avoided comparing all possible pairings to determine if the matcher should return true, you would still incur that O(n!) cost when generating a failure message.

The code now calculates extra and missing in linear time. This speeds up failing tests from O(n!) to O(n log n).
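The linear-time calculation can be sketched as a single merge-style pass over the two sorted arrays. This is an illustration only: `missing_and_extra` is a hypothetical name, and the sketch assumes every element of one array can be compared with every element of the other via `<=>`.

```ruby
# Hypothetical helper: given two already-sorted arrays, returns
# [missing, extra], where missing holds items only in expected and
# extra holds items only in actual. Runs in O(n) after sorting.
def missing_and_extra(sorted_expected, sorted_actual)
  missing = []
  extra = []
  i = 0
  j = 0

  while i < sorted_expected.size && j < sorted_actual.size
    # Assumes <=> returns an integer here (elements are mutually comparable).
    cmp = sorted_expected[i] <=> sorted_actual[j]
    if cmp == 0
      i += 1
      j += 1
    elsif cmp < 0
      missing << sorted_expected[i]
      i += 1
    else
      extra << sorted_actual[j]
      j += 1
    end
  end

  # Whatever remains on either side has no counterpart.
  missing.concat(sorted_expected.drop(i))
  extra.concat(sorted_actual.drop(j))
  [missing, extra]
end
```

For example, `missing_and_extra([1, 2, 2, 5], [2, 3, 5, 5])` returns `[[1, 2], [3, 5]]`.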

More practically, this means that common use cases for ContainExactly will enjoy a massive speedup. Previously, users have examples where comparing arrays of 30 integers "never finishes." With this PR's update, comparing arrays of 10,000 integers runs in < 0.1s on my machine.

bclayman-sq · 2021-10-06T23:00:39Z

This is a small proof of concept to suggest a path forward for improving the performance of ContainExactly (and by extension, MatchArray) dramatically for common use cases. I'd love to hear what folks think!

bclayman-sq · 2021-10-08T15:16:58Z

@pirj Had to update a few things to pass CI for all ruby versions. If you think it's reasonable, I can try to put out a more comprehensive PR.

What do you think of this approach?

lib/rspec/matchers/built_in/contain_exactly.rb

pirj · 2021-10-08T21:00:57Z

So as not to litter the inline threads, I'll just post my findings here. Unfortunately, I have limited experience with practical algorithm application.

https://en.wikipedia.org/wiki/Stable_marriage_problem

problem of finding a stable matching between two equally sized sets of elements given an ordering of preferences for each element

https://en.wikipedia.org/wiki/Lattice_of_stable_matchings

https://brilliant.org/wiki/matching/

group of candidates and a set of jobs, and each candidate is qualified for at least one of the jobs

https://brilliant.org/wiki/matching-algorithms/
https://en.wikipedia.org/wiki/Matching_(graph_theory)#Maximum-cardinality_matching

In any case, O(n!) seems to be ridiculously expensive. Building a lattice is O(N^2), and shaking it has a chance to be less computationally expensive, though I suppose it will consume more memory.

Does the practical task boil down to finding the local minimum total of extra and missing elements?

bclayman-sq · 2021-10-09T02:28:30Z

👋 @pirj,

I'm gonna reply in-line just to keep a summary of where we're at on this PR. You had two recent bits of feedback that I've addressed below:

It would be ideal, though, not to expose transitive, as, frankly, I had to look it up to remind myself what it was.
And still, users (I do not assume stupidity, just human error) might use it wrong. Apart from it being yet another thing to keep in mind.

I think there are a few ways to address your concerns above:

1. Better naming: I'm not wedded to transitive. We could call it sortable, comparable, etc.
2. Good documentation: Explicit and detailed documentation will hopefully help users use it correctly.
3. Informative error messages when someone does misuse transitive: For example, if a user uses transitive with elements that aren't comparable, we can raise. That can happen here:

def safe_sort(array)
  array.sort
rescue Support::AllExceptionsExceptOnesWeMustNotRescue
  raise "Invalid use of `.transitive` with unsortable array #{array}" if @transitive
  array
end

Maybe we could calculate the missing and extra items separately

This sounds like a reasonable approach. It feels that for transitive we can improve pairing maximizer to be O(n) (on two sorted arrays, since the sorting was already performed).

Awesome, I think we're on the same page (and thanks for being so helpful in your responses).

I've just refactored my code to calculate extra and missing items in O(n) time when transitive is used. This addresses your earlier concern about generate_failure_message incurring O(n!) work by relying on PairingsMaximizer methods.

I think it'd be a really nice improvement to ContainExactly from O(n!) to O(n log n) when sorting doesn't result in a match and the elements are transitive. I'd be open to improving naming, updating documentation, and adding error-handling if that would get this across the finish line.

When you get a chance (you've already helped tons, thank you!), would you mind letting me know if you'd be open to merging this functionality into RSpec?

pirj · 2021-10-09T07:57:14Z

Just skimmed through your comments and I truly like the idea to detect if all items are transitive by making a check if the sort succeeded. Wondering if [be_positive, be_negative].sort blows up. It should for our purposes.
Will reply later to the rest. Thanks for taking this tricky task over!
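For reference, a small sketch of how plain Ruby behaves when `sort` meets elements that can't be compared. `sortable?` is a hypothetical helper, and whether RSpec matcher objects such as `be_positive` behave like the plain objects below is not verified here.

```ruby
# Hypothetical helper: reports whether Array#sort succeeds for this array.
def sortable?(array)
  array.sort
  true
rescue ArgumentError, NoMethodError
  false
end

sortable?([3, 1, 2])                 # integers compare with each other
sortable?([1, "a"])                  # Integer <=> String returns nil, so sort raises
sortable?([Object.new, Object.new])  # Object#<=> returns nil for distinct objects
```

The first call returns `true`; the other two return `false` because `sort` raises `ArgumentError` when `<=>` returns `nil`.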

…ents obey transitivity

This is a proof of concept approach for addressing issue rspec#1161.

The current implementation for ContainExactly runs in O(n!). In practice, it runs in O(n log n) when the elements are comparable and sorting results in a match. The crux of the problem is that some elements don't obey transitivity. As a result, knowing that sorting actual and expected doesn't result in a match *doesn't* guarantee that expected and actual don't match.

This proof of concept provides a way for the user to indicate that the elements in a particular example's expected and actual obey transitivity. That looks like this:

expect(a).to contain_exactly(*b).transitive

And it runs in O(n log n) time.

More practically, this means that common use cases for contain_exactly will enjoy a massive speedup. Previously, users have examples where comparing arrays of 30 integers "never finishes." Using `.transitive` here with arrays of 10,000 integers runs in < 0.1s on my machine.

lib/rspec/matchers/built_in/contain_exactly.rb

pirj · 2021-10-10T09:48:16Z

I dared to push some cosmetic changes, what do you think of such an auto-detection of transitivity?

spec/rspec/matchers/built_in/contain_exactly_spec.rb

lib/rspec/matchers/built_in/contain_exactly.rb

bclayman-sq · 2021-10-10T14:16:21Z

I dared to push some cosmetic changes, what do you think of such an auto-detection of transitivity?

@pirj,

Thanks for doing that! I think this is great and a clear improvement over .transitive; users will get the benefit without having to know about or remember anything about .transitive.

If you're happy with this, would you mind approving? If so, I'll update the PR title and message since it's no longer a proof of concept and doesn't use transitive and then I'll merge 😄

pirj · 2021-10-10T14:59:22Z

I'm all good with the change. The only thing I think needs addressing is the specs: there is a very similar example with 10 numbers and a Timeout check. It makes sense to combine them or at least put those two closer to each other. @JonRowe @benoittgt wdyt?

bclayman-sq · 2021-10-10T16:42:39Z

I'm all good with the change. The only thing I think needs addressing are specs - there is a very similar example with 10 numbers and Timeout check. It makes sense to combine or at least put those two closer to each other.

@pirj Yeah, that makes a lot of sense. I just refactored the tests so that my speed tests are near the 10 number + Timeout check. Cleaned a few things up with shared examples and timeout_if_not_debugging too.

How does it look?

pirj

Perfect! It's a huge win for this use case.
Thanks a lot for the contribution!

genehsu · 2021-10-25T03:46:22Z

Here's a different idea for the implementation.

What if in the pairings_maximizer method you pre-calculated pairs of comparable things between the expected and actual arrays? Then the amount of work the PairingsMaximizer class does will normally be small, because most values will be matched reciprocally, and only leftover values with matchers or other non-comparable items will need to be paired.

This implementation might be considered more straightforward because it doesn't introduce new branches to the code flow, but optimizes the existing implementation. Here's a prototype: #1328
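A rough sketch of the pre-pairing idea, not the code from #1328: `prune_equal_pairs` is a hypothetical name, and equality here is decided by Hash lookup (`hash`/`eql?`), which is an assumption made to keep the pass linear. Anything it can't pair off is returned as a leftover for the existing, more expensive pairing logic.

```ruby
# Hypothetical helper: pairs off elements that appear in both arrays and
# returns [leftover_expected, leftover_actual] for the slower pairing step.
def prune_equal_pairs(expected, actual)
  # Count how many copies of each actual element are still unpaired.
  available = Hash.new(0)
  actual.each { |item| available[item] += 1 }

  # Drop each expected element that can claim an unpaired actual copy.
  paired = Hash.new(0)
  leftover_expected = expected.reject do |item|
    next false if available[item].zero?
    available[item] -= 1
    paired[item] += 1
    true
  end

  # Drop the actual copies that were claimed above.
  leftover_actual = actual.reject do |item|
    next false if paired[item].zero?
    paired[item] -= 1
    true
  end

  [leftover_expected, leftover_actual]
end
```

For example, `prune_equal_pairs([1, 2, 2, :x], [2, 1, 3, 2])` returns `[[:x], [3]]`, so only one element on each side would still need pairing. Because Hash lookup uses `eql?`, values such as `1` and `1.0` are not paired here and are simply left for the slower step.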

@genehsu

Speed up the ContainExactly matcher by pre-emptively matching up corresponding elements in the expected and actual arrays. This addresses rspec#1006, rspec#1161. This PR is a collaboration between me and @genehsu based on a couple of our earlier PRs and discussion that resulted: 1) rspec#1325 2) rspec#1328 Co-authored-by: Gene Hsu (@genehsu)

bclayman-sq · 2021-10-29T18:40:58Z

Closing in favor of #1333

bclayman-sq force-pushed the bclayman/improve-contains-exactly-speed branch 4 times, most recently from c00b44c to 57ccbff Compare October 7, 2021 14:55

pirj reviewed Oct 8, 2021

bclayman-sq force-pushed the bclayman/improve-contains-exactly-speed branch 3 times, most recently from b78dd66 to 42dda47 Compare October 8, 2021 23:46

bclayman-sq force-pushed the bclayman/improve-contains-exactly-speed branch 3 times, most recently from 710aee6 to 5cc1ea7 Compare October 9, 2021 21:24

bclayman-sq force-pushed the bclayman/improve-contains-exactly-speed branch from 5cc1ea7 to 0dae38c Compare October 9, 2021 23:06

fixup! Split statement

3bd66a7

pirj reviewed Oct 10, 2021

pirj added 2 commits October 10, 2021 12:39

fixup! Auto-detect transitivity

fefb6da

fixup! Fix fast calculate extra/missing

6fd105e

pirj reviewed Oct 10, 2021

bclayman-sq commented Oct 10, 2021

bclayman-sq changed the title ~~Proof of concept for improving ContainExactly matcher speed when elements obey transitivity~~ Improve ContainExactly matcher speed when elements obey transitivity Oct 10, 2021

Group tests verifying speed together; use timeout_if_not_debugging

f0dee47

pirj approved these changes Oct 10, 2021

pirj requested review from benoittgt and JonRowe October 10, 2021 17:41

genehsu mentioned this pull request Oct 25, 2021

Speed up the ContainExactly matcher #1328

Closed

bclayman-sq mentioned this pull request Oct 29, 2021

Speed up the ContainExactly matcher #1333

Open

bclayman-sq closed this Oct 29, 2021