
unsorted method tables #35983

Merged 2 commits, Jul 7, 2020

Conversation

@vtjnash commented May 22, 2020

This experiment tries to see what happens if we completely un-sort the method tables. It turns out we can still do surprisingly well on performance without those caches (lookup costs seem to range from 2-10x slower across a small sample of queries tested). This lets us reconsider which types of caches and trees we want to use to store this data.
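For a rough sense of what such a lookup-cost comparison looks like, one could time an uncached full method search with BenchmarkTools (a hedged sketch; the signature below is only an example, not necessarily one of the queries actually measured):

julia> using BenchmarkTools

julia> @btime Base._methods_by_ftype(Tuple{typeof(+), Int, Int}, -1, Base.get_world_counter());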

[Ignore the commit history here, unless you really want to see all of my mistakes in handling a non-transitive sort. You might find them informative if you wonder how I arrived at the final algorithm currently here, but I imagine they'd be pretty hard to comprehend on their own.]

@vtjnash requested a review from JeffBezanson, May 22, 2020 02:33
@vtjnash force-pushed the jn/gf-lazy-sort branch 3 times, most recently from 73fab73 to ba39948, June 25, 2020 00:54
@vtjnash marked this pull request as ready for review, June 25, 2020 15:29
@vtjnash commented Jun 25, 2020

The most recent interesting commit history (before squashing) is ba39948, but I'm pretty happy with this now. Performance is harder to measure directly, since we expect most common cases to now hit the dispatch-tuple-types cache, but given that, I think this is now in a stable state to review/merge.
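As a hedged aside (not part of the PR text): the cache in question is keyed on concrete "dispatch tuple" call signatures, which can be checked with Base.isdispatchtuple, so signatures like the first one below are the common case expected to bypass the slower unsorted-table search:

julia> Base.isdispatchtuple(Tuple{typeof(sin), Float64})   # concrete call signature
true

julia> Base.isdispatchtuple(Tuple{typeof(sin), Real})      # abstract argument type
false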

@JeffBezanson:
We should look at the load times of various packages, and some other latencies. I'm building it now, and the first thing that jumped out at me is that loading LinearAlgebra during bootstrap is significantly faster --- presumably because it defines a large number of methods with complex signatures. Base itself is a hair slower. In any case, we should give this a thorough benchmarking.

@JeffBezanson:
This has come a long way and is now pretty close, but I still see some regressions vs. master.
master:

julia> @time using Plots
  8.111273 seconds (8.30 M allocations: 490.805 MiB, 4.19% gc time)

julia> @time display(plot(rand(10)))
 11.280785 seconds (9.40 M allocations: 524.971 MiB, 1.35% gc time)

PR:

julia> @time using Plots
 12.068010 seconds (13.65 M allocations: 788.507 MiB, 3.23% gc time)

julia> @time display(plot(rand(10)))
 12.045214 seconds (10.56 M allocations: 590.858 MiB, 3.34% gc time)

@vtjnash commented Jun 26, 2020

Any timing without #36436 probably isn't too useful, but yes, Plots hits the harder cases quite heavily (far too much compilation due to using Artifacts, and too many difficult backedges from a large precompile file), and I haven't worked as much on optimizing those yet. I'm surprised by how slow your computer is, though: were you also building with FORCE_ASSERTIONS=1? That would also potentially invalidate the measurements. By comparison, I see ApproxFun load time amazingly go from 28s to 12s!

@JeffBezanson:
My computer is just really old. I also see the 2x speedup in ApproxFun. Makes sense that different cases will have different perf characteristics here.

@vtjnash commented Jun 26, 2020

I'm also curious what you see if you apply #35384 on top. There are some ordering issues if we were to land that directly on master now, which could cause some rare correctness issues (that's solved here by simply not relying on ordering).

This PR (with #36436):

$ time ./julia -e '@time using Plots; @time display(@time plot(1:10))'
  5.386617 seconds (9.16 M allocations: 574.738 MiB, 2.18% gc time)
  3.231135 seconds (7.88 M allocations: 443.955 MiB, 7.36% gc time)
  8.201448 seconds (14.35 M allocations: 804.838 MiB, 3.84% gc time)

real	0m14.062s
user	0m12.790s
sys	0m0.387s

And then adding in #35384 also:

time ./julia -e '@time using Plots; @time display(@time plot(1:10))'
  4.436754 seconds (9.50 M allocations: 592.946 MiB, 2.44% gc time)
  2.962513 seconds (7.24 M allocations: 408.276 MiB, 8.35% gc time)
  6.658311 seconds (13.65 M allocations: 766.289 MiB, 4.79% gc time)

real	0m11.445s
user	0m11.367s
sys	0m0.354s

(Note that about 2 seconds there, almost half of that load time, shows up in the profile inside __init__ methods, i.e. Pkg.Artifacts, so there's nothing I can do about that.)

@vtjnash force-pushed the jn/gf-lazy-sort branch 2 times, most recently from 227fd8f to 4c75c70, July 1, 2020 19:04
src/gf.c Outdated
env.match.env = search.env;
env.match.issubty = 1;
env.match.fptr(ml, &env.match); // also matches all ambig and prior, as needed
jl_typemap_intersection_visitor(defs, offs, &env.match);
Member:
It would be good to figure out some sort of limit on this --- sometimes it returns a large number of items during inference, and we're unlikely to cut it down to 3. For example ==(::Any, ::Any). Maybe a larger arbitrary cutoff? Or maybe count how many methods had dispatchtuple signatures, since we know those are all disjoint, and early-out based on that?
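For illustration only, here is a rough Julia rendering of that early-out idea (the actual logic lives in C in ml_matches in src/gf.c; the function name and arguments below are hypothetical):

# Dispatch-tuple signatures are pairwise disjoint and maximally specific over
# their own intersection, so every one that intersects the query must appear
# in the final result. If that count alone already exceeds the caller's
# limit, the rest of the intersection/sorting work can be skipped.
function exceeds_limit(intersecting_sigs, lim::Int)
    lim < 0 && return false                          # -1 means "no limit"
    ndispatch = count(Base.isdispatchtuple, intersecting_sigs)
    return ndispatch > lim
end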

Author:
The most efficient approach would perhaps be to change lim to just mean "number of intersecting signatures". I was slightly surprised that I managed to preserve (and refine) the original meaning here. There's an early exit in the first O(n^2) loop, which helps considerably for that case. Counting dispatch-tuple signatures seems like a simple addition to that.

julia> @btime Base._methods_by_ftype(Tuple{typeof(==), Any, Any}, -1, Base.get_world_counter())
  76.585 μs (194 allocations: 12.88 KiB) # master
  17.504 ms (46063 allocations: 2.56 MiB) # now (258d04e8bfdb8189fd61199bccd7641650fec07b)
  17.386 ms (46063 allocations: 2.56 MiB) # with earlier limit from counting dispatch-tupletypes matches (ec8961c77614b80d21ee51d6353769c9ae57d899)

julia> @btime Base._methods_by_ftype(Tuple{typeof(==), Any, Any}, 4, Base.get_world_counter())
  3.824 μs (8 allocations: 528 bytes) # master
  280.331 μs (858 allocations: 76.42 KiB) # PR
  880.929 ns (9 allocations: 624 bytes) # with limit

Rather than stopping at the first subtyping match, this scans the entire
table of methods for intersections on every lookup. It then filters the
results to remove ambiguities and sorts them into specificity order.

Also exports the has_ambiguity computation from ml-matches, as it's cheap
to provide here (it's already computed) and required by Core.Compiler
(it's also more correct than what the optimizer had been doing, though
only in extreme hypothetical edge cases).
We know that dispatch-tuple signatures are always disjoint and
most-specific over their intersection, so they are always going to be
included in the result. We can use that knowledge to abort the search
quickly as soon as we discover that there will be too many specific matches.
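Taken together, the commit messages describe a lookup of roughly the following shape (a heavily simplified, hypothetical Julia sketch; the real implementation is ml_matches in src/gf.c and is considerably more careful, not least because specificity is not a total order):

function lookup_matches(query::Type, meths::Vector{Method})
    # every method whose signature intersects the query is a candidate
    hits = [m for m in meths if typeintersect(m.sig, query) !== Union{}]
    # order candidates from more to less specific (a simplification, since
    # morespecific is not transitive, as the PR description notes)
    sort!(hits; lt = (a, b) -> Base.morespecific(a.sig, b.sig))
    # drop a candidate when a more specific method already covers its entire
    # intersection with the query
    keep = Method[]
    for m in hits
        covered = any(k -> typeintersect(m.sig, query) <: k.sig, keep)
        covered || push!(keep, m)
    end
    return keep
end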
Labels: compiler:latency (Compiler latency)