Improve function performance for an increasing number of stops #31

mourner · 2016-12-14T21:53:13Z

As noted by @kronick, evaluation of functions with the regards to the number of stops is more expensive that it could be — we're looping through all stops for each value. Instead:

categorical functions could use a hash to improve from O(n) to O(1)
interval and exponential functions can use the fact that stops are supposed to be sorted —using binary search to improve from O(n) to O(log N). I used a similar trick to optimize feature-filter.

The text was updated successfully, but these errors were encountered:

kronick · 2016-12-16T01:22:05Z

I've started work on this issue with a binary search on interval and exponential in this branch: https://github.com/mapbox/mapbox-gl-function/blob/optimized-search/index.js#L118 .

Searching a hash mapcategorical is next up and looks like less of an algorithm question and more of an architectural issue of where to store the hash map.

kronick · 2016-12-16T02:08:20Z

Initial results:

master:

>>> Evaluating exponential  function for 100000 iterations with 10000 stops
Only include values within the domain of stops:
Time: 1398.098ms
Include values outside the domain of stops:
Time: 1387.106ms


>>> Evaluating exponential  function for 100000 iterations with 100 stops
Only include values within the domain of stops:
Time: 26.592ms
Include values outside the domain of stops:
Time: 21.098ms

>>> Evaluating exponential  function for 1000000 iterations with 10 stops
Only include values within the domain of stops:
Time: 132.602ms
Include values outside the domain of stops:
Time: 74.718ms

optimized-search:

>>> Evaluating exponential  function for 100000 iterations with 10000 stops
Only include values within the domain of stops:
Time: 51.151ms
Include values outside the domain of stops:
Time: 25.580ms


>>> Evaluating exponential  function for 100000 iterations with 100 stops
Only include values within the domain of stops:
Time: 26.645ms
Include values outside the domain of stops:
Time: 11.884ms

>>> Evaluating exponential  function for 1000000 iterations with 10 stops
Only include values within the domain of stops:
Time: 156.097ms
Include values outside the domain of stops:
Time: 80.232ms

So performance is slightly slower with 10 stops, but significantly faster with 10000. If the slow-for-10-stops case is a concern, my binary search code could probably use more optimization or we could have a conditional for which search method to apply depending on number of stops.

I pushed a profile.js to the tests directory that shows how I'm doing this timing, but please let me know if there is a more appropriate place for that.

kronick · 2016-12-16T03:06:57Z

Categorical functions are now in the optimized-search branch. Profile results are:

optimized-search:

>>> Evaluating categorical function for 10000 iterations with 10000 stops
Using strings as categories:
Time: 1.976ms
Using values not included in string  categories:
Time: 2.220ms
Using integers as categories:
Time: 4.148ms
Using values not included in integer categories:
Time: 1.310ms


>>> Evaluating categorical function for 10000 iterations with 100 stops
Using strings as categories:
Time: 0.987ms
Using values not included in string  categories:
Time: 1.750ms
Using integers as categories:
Time: 3.551ms
Using values not included in integer categories:
Time: 1.020ms


>>> Evaluating categorical function for 100000 iterations with 10 stops
Using strings as categories:
Time: 9.537ms
Using values not included in string  categories:
Time: 4.704ms
Using integers as categories:
Time: 3.324ms
Using values not included in integer categories:
Time: 4.091ms


>>> Evaluating categorical function for 10000 iterations with 100 stops
Using strings as categories:
Time: 0.420ms
Using values not included in string  categories:
Time: 0.468ms
Using integers as categories:
Time: 0.324ms
Using values not included in integer categories:
Time: 0.413ms

master:

>>> Evaluating categorical function for 10000 iterations with 10000 stops
Using strings as categories:
Time: 325.803ms
Using values not included in string  categories:
Time: 389.536ms
Using integers as categories:
Time: 1284.433ms
Using values not included in integer categories:
Time: 1825.276ms


>>> Evaluating categorical function for 10000 iterations with 100 stops
Using strings as categories:
Time: 18.475ms
Using values not included in string  categories:
Time: 24.328ms
Using integers as categories:
Time: 19.700ms
Using values not included in integer categories:
Time: 27.892ms


>>> Evaluating categorical function for 100000 iterations with 10 stops
Using strings as categories:
Time: 24.741ms
Using values not included in string  categories:
Time: 29.082ms
Using integers as categories:
Time: 18.549ms
Using values not included in integer categories:
Time: 25.753ms


>>> Evaluating categorical function for 10000 iterations with 100 stops
Using strings as categories:
Time: 16.612ms
Using values not included in string  categories:
Time: 22.535ms
Using integers as categories:
Time: 12.805ms
Using values not included in integer categories:
Time: 20.307ms

So optimized-search is faster across the board. I can't explain the variability in times for different types of categories (string vs integer) and it seems to change depending on the order I run the tests, so my guess is it could be some internal memory optimization stuff going on but I would welcome other opinions if that's a concern.

mourner · 2016-12-16T10:14:46Z

@kronick fantastic results! This is a huge improvement. The case of a small number of stops is a concern because most functions in our default styles are like that, but I think we can follow-up with another PR for that.

kronick · 2016-12-17T01:21:16Z

I made a quick conditional-search branch that chooses which search function to use depending on the number of stops, hard-coded to switch over at 100. It feels a little hacky but maybe someone wants to build on it if the small number of stops case is an ongoing issue.

The benchmarks are as you might expect-- same as the old master for < 100 stops, same as optimized-search for >= 100 stops on exponential and interval searches. The conditional check happens once when the function is created, not every time it is evaluated.

kronick · 2016-12-20T18:19:20Z

Currently investigating my code as the source of this bug: mapbox/mapbox-gl-js#3838

The first thing I notice is that the following expanded tests fail on input values 680, 681, 682, 690, but not on 7300, 7301, 7302, 7320 (note that I'm exporting interpolateNumber() directly from index.js, to just isolate the index binary search code):

     t.test('many elements', function(t) {
          var f = MapboxGLFunction({
            type: 'exponential',
            "stops": [
            [2, 100],
            [55, 200],
            [132, 300],
            [607, 400],
            [1287, 500],
            [1985, 600],
            [2650, 700],
            [3299, 800],
            [3995, 900],
            [4927, 1000],
            [7147, 10000],
            [10028, 100000],
            [12889, 1000000],
            [40000, 10000000]
            ]
          });

          t.equal(f(2), 100);
          t.equal(f(2), interpolateNumber(2, 1, 2, 55, 100, 200));
          t.equal(f(20), interpolateNumber(20, 1, 2, 55, 100, 200));
          t.equal(f(607), 400);
          t.equal(f(680), interpolateNumber(680, 1, 607, 1287, 400, 500));
          t.equal(f(681), interpolateNumber(681, 1, 607, 1287, 400, 500));
          t.equal(f(682), interpolateNumber(682, 1, 607, 1287, 400, 500));
          t.equal(f(690), interpolateNumber(690, 1, 607, 1287, 400, 500));
          t.equal(f(4927), 1000); //86
          t.equal(f(7300), interpolateNumber(7300, 1, 7147, 10028, 10000, 100000));
          t.equal(f(7301), interpolateNumber(7301, 1, 7147, 10028, 10000, 100000));
          t.equal(f(7302), interpolateNumber(7302, 1, 7147, 10028, 10000, 100000));
          t.equal(f(7320), interpolateNumber(7320, 1, 7147, 10028, 10000, 100000));
          t.equal(f(10000), interpolateNumber(10000, 1, 7147, 10028, 10000, 100000));
          t.equal(f(20000), interpolateNumber(20000, 1, 12889, 40000, 1000000, 10000000));
          t.equal(f(40000), 10000000);

          t.end();
        });

Results are:

  not ok 85 should be equal
    ---
      operator: equal
      expected: 410.7352941176471
      actual:   415.3684210526316
      at: Test.<anonymous> (/Users/kronick/Documents/repos/mapbox-gl-function/test/test.js:172:17)
    ...
  not ok 86 should be equal
    ---
      operator: equal
      expected: 410.88235294117646
      actual:   415.57894736842104
      at: Test.<anonymous> (/Users/kronick/Documents/repos/mapbox-gl-function/test/test.js:173:17)
    ...
  not ok 87 should be equal
    ---
      operator: equal
      expected: 411.02941176470586
      actual:   415.7894736842105
      at: Test.<anonymous> (/Users/kronick/Documents/repos/mapbox-gl-function/test/test.js:174:17)
    ...
  not ok 88 should be equal
    ---
      operator: equal
      expected: 412.2058823529412
      actual:   417.4736842105263
      at: Test.<anonymous> (/Users/kronick/Documents/repos/mapbox-gl-function/test/test.js:175:17)
    ...

It looks like the binary search is returning off-by-one index values sometimes and therefore interpolating between stops lower than it should. This is no good! I'll try to figure out what's wrong with more complete tests.

lucaswoj · 2016-12-21T18:40:38Z

👉 mapbox/mapbox-gl-style-spec#629

mourner self-assigned this Dec 14, 2016

mourner mentioned this issue Dec 14, 2016

Data-Driven Styling Slow Compared to Filters mapbox/mapbox-gl-js#3805

Closed

mourner added the enhancement label Dec 14, 2016

kronick self-assigned this Dec 15, 2016

kronick mentioned this issue Dec 16, 2016

Optimized search #32

Merged

This was referenced Dec 21, 2016

Improve function performance for an increasing number of stops mapbox/mapbox-gl-style-spec#628

Closed

Improve function performance with a small # of stops mapbox/mapbox-gl-style-spec#629

Closed

lucaswoj closed this as completed Dec 21, 2016

lucaswoj mentioned this issue Feb 1, 2017

Improve function performance with a small # of stops mapbox/mapbox-gl-js#4157

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve function performance for an increasing number of stops #31

Improve function performance for an increasing number of stops #31

mourner commented Dec 14, 2016

kronick commented Dec 16, 2016

kronick commented Dec 16, 2016 •

edited

Loading

kronick commented Dec 16, 2016

mourner commented Dec 16, 2016

kronick commented Dec 17, 2016 •

edited

Loading

kronick commented Dec 20, 2016

lucaswoj commented Dec 21, 2016

Improve function performance for an increasing number of stops #31

Improve function performance for an increasing number of stops #31

Comments

mourner commented Dec 14, 2016

kronick commented Dec 16, 2016

kronick commented Dec 16, 2016 • edited Loading

kronick commented Dec 16, 2016

mourner commented Dec 16, 2016

kronick commented Dec 17, 2016 • edited Loading

kronick commented Dec 20, 2016

lucaswoj commented Dec 21, 2016

kronick commented Dec 16, 2016 •

edited

Loading

kronick commented Dec 17, 2016 •

edited

Loading