Skip to content
This repository has been archived by the owner on Nov 21, 2024. It is now read-only.

Faster and simpler filter implementation #12

Merged
merged 5 commits into from
Jan 29, 2016
Merged

Faster and simpler filter implementation #12

merged 5 commits into from
Jan 29, 2016

Conversation

mourner
Copy link
Member

@mourner mourner commented Jan 29, 2016

Closes #10, ref #8. Alternative approach, still using eval.

Filters have the same performance, but filter creation is 40% faster, and there is 30% less code.

It also makes in with a lot of items faster due to using indexOf, and it'll be easier to significantly improve further on this basis.

cc @jfirebaugh @dcervelli

@mourner
Copy link
Member Author

mourner commented Jan 29, 2016

Benchmark for simple filters (this branch is much faster than master):

# master
create 5-item filter: 1.174ms
filter 1 million times: 89.382ms

# pr 9
create 5-item filter: 1.081ms
filter 1 million times: 89.812ms

# simple 
create 5-item filter: 0.796ms
filter 1 million times: 30.522ms

@mourner
Copy link
Member Author

mourner commented Jan 29, 2016

Benchmark for 2k:

# master 
create 2000-item filter: 10.961ms
filter 1 million times: 20915.215ms

# pr 9
create 2000-item filter: 6.585ms
filter 1 million times: 215.851ms

# simple
create 2000-item filter: 0.747ms
filter 1 million times: 1414.595ms

So for filtering against a huge number of items, this branch is much faster than master but much slower than pr/9. I'll now try to implement binary search and this should change things dramatically though.

@mourner
Copy link
Member Author

mourner commented Jan 29, 2016

Added binary search! It turns on for more than 200 items (picked by testing running benchmarks). Results are great:

# pr 9
create 100000-item filter: 428.668ms
filter 1 million times: 499.984ms

# simple
create 100000-item filter: 71.494ms
filter 1 million times: 383.677ms

@dcervelli
Copy link

@mourner Awesome! Way simpler. Let me quickly add a benchmark on randomly generated strings as the filter items instead of numbers.

@dcervelli
Copy link

For random string data (which is closer to our use case) I get these results:

# pr 9
create 64000-item filter: 240.739ms
filter 1 million times: 757.678ms

# simple
create 64000-item filter: 161.316ms
filter 1 million times: 1879.914ms

Given the simpler code, I'm totally OK with this until it proves to be a bottleneck (which I doubt it will).

And for calibration: on my box for bench/big.js I get:

# simple
create 64000-item filter: 38.474ms
filter 1 million times: 324.140ms

Here's my benchmark:

'use strict';

var filter = require('../');

var N = 64000;

var arr = ['in', 'foo'];
for (var i = 0; i < N; i++) arr.push(i + " " + Math.random());

console.time('create ' + N + '-item filter');
var f = filter(arr);
console.timeEnd('create ' + N + '-item filter');

var feature = {properties: {foo: 0}};

console.time('filter 1 million times');
for (var i = 0; i < 1000000; i++) {
    feature.properties.foo = arr[Math.floor(Math.random() * N) + 2];
    f(feature);
}
console.timeEnd('filter 1 million times');

Thanks!

@@ -294,7 +294,7 @@ test('!in, $type', function(t) {

test('any', function(t) {
var f1 = filter(['any']);
t.equal(f1({properties: {foo: 1}}), false);
t.equal(f1({properties: {foo: 1}}), true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't change; returning false for an empty any is standard mathematical / functional programming behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't know that, it's not very intuitive. Will fix.

@mourner
Copy link
Member Author

mourner commented Jan 29, 2016

@dcervelli great! I think it's still slower than it could be due to the fact that you have to evaluate the whole huge array on every filter function call. For future improvements, we can make an attempt at creating a closure that would contain all the big arrays referenced in filters so that the function call itself remains lightweight.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants