Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve function performance for an increasing number of stops #31

Closed
mourner opened this issue Dec 14, 2016 · 7 comments
Closed

Improve function performance for an increasing number of stops #31

mourner opened this issue Dec 14, 2016 · 7 comments
Assignees

Comments

@mourner
Copy link
Member

mourner commented Dec 14, 2016

As noted by @kronick, evaluation of functions with the regards to the number of stops is more expensive that it could be — we're looping through all stops for each value. Instead:

  • categorical functions could use a hash to improve from O(n) to O(1)
  • interval and exponential functions can use the fact that stops are supposed to be sorted —using binary search to improve from O(n) to O(log N). I used a similar trick to optimize feature-filter.
@kronick
Copy link
Contributor

kronick commented Dec 16, 2016

I've started work on this issue with a binary search on interval and exponential in this branch: https://github.com/mapbox/mapbox-gl-function/blob/optimized-search/index.js#L118 .

Searching a hash mapcategorical is next up and looks like less of an algorithm question and more of an architectural issue of where to store the hash map.

@kronick
Copy link
Contributor

kronick commented Dec 16, 2016

Initial results:

  • master:

    >>> Evaluating exponential  function for 100000 iterations with 10000 stops
    Only include values within the domain of stops:
    Time: 1398.098ms
    Include values outside the domain of stops:
    Time: 1387.106ms
    
    
    >>> Evaluating exponential  function for 100000 iterations with 100 stops
    Only include values within the domain of stops:
    Time: 26.592ms
    Include values outside the domain of stops:
    Time: 21.098ms
    
    >>> Evaluating exponential  function for 1000000 iterations with 10 stops
    Only include values within the domain of stops:
    Time: 132.602ms
    Include values outside the domain of stops:
    Time: 74.718ms
    
  • optimized-search:

    >>> Evaluating exponential  function for 100000 iterations with 10000 stops
    Only include values within the domain of stops:
    Time: 51.151ms
    Include values outside the domain of stops:
    Time: 25.580ms
    
    
    >>> Evaluating exponential  function for 100000 iterations with 100 stops
    Only include values within the domain of stops:
    Time: 26.645ms
    Include values outside the domain of stops:
    Time: 11.884ms
    
    >>> Evaluating exponential  function for 1000000 iterations with 10 stops
    Only include values within the domain of stops:
    Time: 156.097ms
    Include values outside the domain of stops:
    Time: 80.232ms
    

So performance is slightly slower with 10 stops, but significantly faster with 10000. If the slow-for-10-stops case is a concern, my binary search code could probably use more optimization or we could have a conditional for which search method to apply depending on number of stops.

I pushed a profile.js to the tests directory that shows how I'm doing this timing, but please let me know if there is a more appropriate place for that.

@kronick
Copy link
Contributor

kronick commented Dec 16, 2016

Categorical functions are now in the optimized-search branch. Profile results are:

  • optimized-search:

    >>> Evaluating categorical function for 10000 iterations with 10000 stops
    Using strings as categories:
    Time: 1.976ms
    Using values not included in string  categories:
    Time: 2.220ms
    Using integers as categories:
    Time: 4.148ms
    Using values not included in integer categories:
    Time: 1.310ms
    
    
    >>> Evaluating categorical function for 10000 iterations with 100 stops
    Using strings as categories:
    Time: 0.987ms
    Using values not included in string  categories:
    Time: 1.750ms
    Using integers as categories:
    Time: 3.551ms
    Using values not included in integer categories:
    Time: 1.020ms
    
    
    >>> Evaluating categorical function for 100000 iterations with 10 stops
    Using strings as categories:
    Time: 9.537ms
    Using values not included in string  categories:
    Time: 4.704ms
    Using integers as categories:
    Time: 3.324ms
    Using values not included in integer categories:
    Time: 4.091ms
    
    
    >>> Evaluating categorical function for 10000 iterations with 100 stops
    Using strings as categories:
    Time: 0.420ms
    Using values not included in string  categories:
    Time: 0.468ms
    Using integers as categories:
    Time: 0.324ms
    Using values not included in integer categories:
    Time: 0.413ms
    
  • master:

    >>> Evaluating categorical function for 10000 iterations with 10000 stops
    Using strings as categories:
    Time: 325.803ms
    Using values not included in string  categories:
    Time: 389.536ms
    Using integers as categories:
    Time: 1284.433ms
    Using values not included in integer categories:
    Time: 1825.276ms
    
    
    >>> Evaluating categorical function for 10000 iterations with 100 stops
    Using strings as categories:
    Time: 18.475ms
    Using values not included in string  categories:
    Time: 24.328ms
    Using integers as categories:
    Time: 19.700ms
    Using values not included in integer categories:
    Time: 27.892ms
    
    
    >>> Evaluating categorical function for 100000 iterations with 10 stops
    Using strings as categories:
    Time: 24.741ms
    Using values not included in string  categories:
    Time: 29.082ms
    Using integers as categories:
    Time: 18.549ms
    Using values not included in integer categories:
    Time: 25.753ms
    
    
    >>> Evaluating categorical function for 10000 iterations with 100 stops
    Using strings as categories:
    Time: 16.612ms
    Using values not included in string  categories:
    Time: 22.535ms
    Using integers as categories:
    Time: 12.805ms
    Using values not included in integer categories:
    Time: 20.307ms
    

So optimized-search is faster across the board. I can't explain the variability in times for different types of categories (string vs integer) and it seems to change depending on the order I run the tests, so my guess is it could be some internal memory optimization stuff going on but I would welcome other opinions if that's a concern.

@mourner
Copy link
Member Author

mourner commented Dec 16, 2016

@kronick fantastic results! This is a huge improvement. The case of a small number of stops is a concern because most functions in our default styles are like that, but I think we can follow-up with another PR for that.

@kronick
Copy link
Contributor

kronick commented Dec 17, 2016

I made a quick conditional-search branch that chooses which search function to use depending on the number of stops, hard-coded to switch over at 100. It feels a little hacky but maybe someone wants to build on it if the small number of stops case is an ongoing issue.

The benchmarks are as you might expect-- same as the old master for < 100 stops, same as optimized-search for >= 100 stops on exponential and interval searches. The conditional check happens once when the function is created, not every time it is evaluated.

@kronick
Copy link
Contributor

kronick commented Dec 20, 2016

Currently investigating my code as the source of this bug: mapbox/mapbox-gl-js#3838

The first thing I notice is that the following expanded tests fail on input values 680, 681, 682, 690, but not on 7300, 7301, 7302, 7320 (note that I'm exporting interpolateNumber() directly from index.js, to just isolate the index binary search code):

     t.test('many elements', function(t) {
          var f = MapboxGLFunction({
            type: 'exponential',
            "stops": [
            [2, 100],
            [55, 200],
            [132, 300],
            [607, 400],
            [1287, 500],
            [1985, 600],
            [2650, 700],
            [3299, 800],
            [3995, 900],
            [4927, 1000],
            [7147, 10000],
            [10028, 100000],
            [12889, 1000000],
            [40000, 10000000]
            ]
          });

          t.equal(f(2), 100);
          t.equal(f(2), interpolateNumber(2, 1, 2, 55, 100, 200));
          t.equal(f(20), interpolateNumber(20, 1, 2, 55, 100, 200));
          t.equal(f(607), 400);
          t.equal(f(680), interpolateNumber(680, 1, 607, 1287, 400, 500));
          t.equal(f(681), interpolateNumber(681, 1, 607, 1287, 400, 500));
          t.equal(f(682), interpolateNumber(682, 1, 607, 1287, 400, 500));
          t.equal(f(690), interpolateNumber(690, 1, 607, 1287, 400, 500));
          t.equal(f(4927), 1000); //86
          t.equal(f(7300), interpolateNumber(7300, 1, 7147, 10028, 10000, 100000));
          t.equal(f(7301), interpolateNumber(7301, 1, 7147, 10028, 10000, 100000));
          t.equal(f(7302), interpolateNumber(7302, 1, 7147, 10028, 10000, 100000));
          t.equal(f(7320), interpolateNumber(7320, 1, 7147, 10028, 10000, 100000));
          t.equal(f(10000), interpolateNumber(10000, 1, 7147, 10028, 10000, 100000));
          t.equal(f(20000), interpolateNumber(20000, 1, 12889, 40000, 1000000, 10000000));
          t.equal(f(40000), 10000000);

          t.end();
        });

Results are:

  not ok 85 should be equal
    ---
      operator: equal
      expected: 410.7352941176471
      actual:   415.3684210526316
      at: Test.<anonymous> (/Users/kronick/Documents/repos/mapbox-gl-function/test/test.js:172:17)
    ...
  not ok 86 should be equal
    ---
      operator: equal
      expected: 410.88235294117646
      actual:   415.57894736842104
      at: Test.<anonymous> (/Users/kronick/Documents/repos/mapbox-gl-function/test/test.js:173:17)
    ...
  not ok 87 should be equal
    ---
      operator: equal
      expected: 411.02941176470586
      actual:   415.7894736842105
      at: Test.<anonymous> (/Users/kronick/Documents/repos/mapbox-gl-function/test/test.js:174:17)
    ...
  not ok 88 should be equal
    ---
      operator: equal
      expected: 412.2058823529412
      actual:   417.4736842105263
      at: Test.<anonymous> (/Users/kronick/Documents/repos/mapbox-gl-function/test/test.js:175:17)
    ...

It looks like the binary search is returning off-by-one index values sometimes and therefore interpolating between stops lower than it should. This is no good! I'll try to figure out what's wrong with more complete tests.

@lucaswoj
Copy link

👉 mapbox/mapbox-gl-style-spec#629

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants