-
Notifications
You must be signed in to change notification settings - Fork 15
Faster and simpler filter implementation #12
Conversation
Benchmark for simple filters (this branch is much faster than master): # master
create 5-item filter: 1.174ms
filter 1 million times: 89.382ms
# pr 9
create 5-item filter: 1.081ms
filter 1 million times: 89.812ms
# simple
create 5-item filter: 0.796ms
filter 1 million times: 30.522ms |
Benchmark for 2k: # master
create 2000-item filter: 10.961ms
filter 1 million times: 20915.215ms
# pr 9
create 2000-item filter: 6.585ms
filter 1 million times: 215.851ms
# simple
create 2000-item filter: 0.747ms
filter 1 million times: 1414.595ms So for filtering against a huge number of items, this branch is much faster than master but much slower than pr/9. I'll now try to implement binary search and this should change things dramatically though. |
Added binary search! It turns on for more than 200 items (picked by testing running benchmarks). Results are great: # pr 9
create 100000-item filter: 428.668ms
filter 1 million times: 499.984ms
# simple
create 100000-item filter: 71.494ms
filter 1 million times: 383.677ms |
@mourner Awesome! Way simpler. Let me quickly add a benchmark on randomly generated strings as the filter items instead of numbers. |
For random string data (which is closer to our use case) I get these results:
Given the simpler code, I'm totally OK with this until it proves to be a bottleneck (which I doubt it will). And for calibration: on my box for bench/big.js I get:
Here's my benchmark: 'use strict';
var filter = require('../');
var N = 64000;
var arr = ['in', 'foo'];
for (var i = 0; i < N; i++) arr.push(i + " " + Math.random());
console.time('create ' + N + '-item filter');
var f = filter(arr);
console.timeEnd('create ' + N + '-item filter');
var feature = {properties: {foo: 0}};
console.time('filter 1 million times');
for (var i = 0; i < 1000000; i++) {
feature.properties.foo = arr[Math.floor(Math.random() * N) + 2];
f(feature);
}
console.timeEnd('filter 1 million times'); Thanks! |
@@ -294,7 +294,7 @@ test('!in, $type', function(t) { | |||
|
|||
test('any', function(t) { | |||
var f1 = filter(['any']); | |||
t.equal(f1({properties: {foo: 1}}), false); | |||
t.equal(f1({properties: {foo: 1}}), true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't change; returning false
for an empty any
is standard mathematical / functional programming behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't know that, it's not very intuitive. Will fix.
@dcervelli great! I think it's still slower than it could be due to the fact that you have to evaluate the whole huge array on every filter function call. For future improvements, we can make an attempt at creating a closure that would contain all the big arrays referenced in filters so that the function call itself remains lightweight. |
Closes #10, ref #8. Alternative approach, still using eval.
Filters have the same performance, but filter creation is 40% faster, and there is 30% less code.
It also makes
in
with a lot of items faster due to usingindexOf
, and it'll be easier to significantly improve further on this basis.cc @jfirebaugh @dcervelli