Change 'in' (and '!in') filter to work with a hash for performance. #9
Conversation
var values = {};
Array.prototype.slice.call(arguments, 3).map(function(value) {
    if (key === '$type') {
        values[VectorTileFeatureTypes.indexOf(value)] = true;
This should be a .forEach if it doesn't return anything, or a for loop. Or a reduce if you want it to be functional.
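For illustration only, the suggested tweak might look roughly like the sketch below. The else branch is an assumption, since the quoted snippet is cut off, and key and VectorTileFeatureTypes are taken from the surrounding code under review.

    // Sketch: build the lookup hash with .forEach instead of .map,
    // since the callback's return value is never used.
    var values = {};
    Array.prototype.slice.call(arguments, 3).forEach(function(value) {
        if (key === '$type') {
            values[VectorTileFeatureTypes.indexOf(value)] = true;
        } else {
            values[value] = true; // assumed: plain values keyed directly
        }
    });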
Thanks for the contribution! This is a change that needs to be benchmarked for both simple and complex filters. Simple filters are extremely common, and we need to make sure not to deoptimize the performance of a short, simple filter.
The 'in' filter was previously chaining if statements, leading to O(n*m) running time when using the returned filter function to iterate over a list of n features with m items in the filter. This change puts the 'in' arguments into a plain JavaScript hash, bringing that down to O(n). To prevent perf regressions for small filters, we keep the original code path below a hard-coded threshold (currently 30, based on rudimentary perf testing).
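A rough, self-contained sketch of that idea is below. The real module generates filter source code as a string, so this closure-based version, and the names compileInFilter and HASH_THRESHOLD, are only illustrative.

    // Illustrative sketch of an 'in' filter with a size threshold.
    // Below the threshold, fall back to simple chained comparisons;
    // at or above it, build a hash once and do constant-time lookups.
    var HASH_THRESHOLD = 30; // assumed constant, matching the value mentioned above

    function compileInFilter(key, values) {
        if (values.length < HASH_THRESHOLD) {
            // small filter: linear scan over a handful of values
            return function(feature) {
                for (var i = 0; i < values.length; i++) {
                    if (feature.properties[key] === values[i]) return true;
                }
                return false;
            };
        }
        // large filter: one-time hash construction, O(1) lookup per feature
        var lookup = {};
        values.forEach(function(value) {
            lookup[value] = true;
        });
        return function(feature) {
            return lookup[feature.properties[key]] === true;
        };
    }

For example, compileInFilter('class', ['park', 'wood']) would take the unrolled path, while a thousand-value filter would build the hash once and reuse it for every feature.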
I changed the .map to a .forEach as suggested.

Agreed on avoiding perf regressions. I added a hard-coded constant (currently 30, see below) below which we keep the old unrolled if statements. I benchmarked with the following script:

var ff = require('./index.js');

var n = 1000000;
var classes = [];
var features = [];
for (var i = 0; i < n; i++) {
    var c = "c" + i;
    classes.push(c);
    var feature = {
        type: 1,
        properties: {
            class: c
        }
    };
    features.push(feature);
}

var runs = 50;
var testFilterFunction = function(text, filter) {
    console.time(text);
    for (var r = 0; r < runs; r++) {
        var testFilter = ff(filter);
        var matchCount = 0;
        for (var i = 0; i < n; i++) {
            if (testFilter(features[i])) {
                matchCount++;
            }
        }
    }
    console.timeEnd(text);
    console.log(matchCount);
};

var test = function(subN) {
    var filter = ["in", "class"].concat(classes.slice(0, subN));
    testFilterFunction(subN + "/" + n, filter);
};

test(1);
test(29); // Unrolled
test(30); // Map lookup
test(100);
test(250);

It produced the following results (29 is unrolled, 30 is map lookup):
I did further perf testing around the 1-5 range and found neither implementation to be consistently faster than the other.
@dcervelli @jfirebaugh Suddenly got an idea: what do you think about keeping the array form, BUT sorting it when generating code and doing a simple binary search instead?
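For comparison, a sorted-array membership test would look roughly like the sketch below. This is only an illustration of the idea, not the eventual PR, and it assumes all values are strings so the default sort order matches the comparisons.

    // Sketch: sort the filter values once at compile time,
    // then do an O(log m) binary search per feature.
    function compileSortedInFilter(key, values) {
        var sorted = values.slice().sort();
        return function(feature) {
            var value = feature.properties[key];
            var lo = 0, hi = sorted.length - 1;
            while (lo <= hi) {
                var mid = (lo + hi) >> 1;
                if (sorted[mid] === value) return true;
                if (sorted[mid] < value) lo = mid + 1;
                else hi = mid - 1;
            }
            return false;
        };
    }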
That's definitely an improvement over the original code (although we'd need to worry about perf regression in the filter construction case; not sure if that's important). I don't think there are any net memory savings between the two methods because all of the values are in memory somewhere (code is memory too!); my hunch is that the map uses less memory than the unrolled loop. Of course, the map gets you O(n) instead of O(n lg n) with the binary search. On the whole I still prefer the map option. The biggest argument against taking this pull request is that it increases the complexity significantly, which I acknowledge. That said, this module does one thing at a low level and the complexity doesn't leak out, so I don't worry about it. Also, I'll file a separate issue on this, but there's another problem that we will likely need to fix: this function errors out with more than 64k elements in the set (stack overflow because they are all passed as args).
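The 64k figure comes from the engine's cap on how many arguments a single call can receive. A minimal sketch of that failure mode follows; the exact limit and error message vary by engine, so treat the numbers as assumptions.

    // Sketch: calling a function with a huge argument list via .apply()
    // eventually throws (e.g. "Maximum call stack size exceeded" in V8).
    var manyValues = [];
    for (var i = 0; i < 200000; i++) manyValues.push("c" + i);

    function count() { return arguments.length; }

    try {
        console.log(count.apply(null, manyValues));
    } catch (e) {
        console.log(e.message); // the argument limit is hit before count() ever runs
    }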
The same article says "Hash tables in V8 are large arrays containing keys and values." An array would need just the keys, so it would have half as many elements. So an array would probably take slightly less memory.
Makes sense.
Oops, of course this makes sense. For some reason I was assuming that you were suggesting it would work similarly to how it worked before with unrolled if statements. An unrolled binary search generator sounds awful to write, but I suppose it could be done :-)
@dcervelli Yeah, unrolling is really bad with many items in every sense. I'll make a binary search PR and we'll compare.
@dcervelli Just a quick note: compilation time is not nearly as important as filter time, because compilation happens only once for each filter on a style change, while the filter runs hundreds of thousands of times (once for each map feature). So compilation shouldn't be part of the main benchmark.
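Applied to the benchmark script above, that would mean compiling the filter once outside the timed section, roughly as in the sketch below (it reuses ff, runs, n, and features from that script).

    // Variant of testFilterFunction that times only filter execution,
    // not filter compilation.
    var testFilterRunOnly = function(text, filter) {
        var testFilter = ff(filter); // compile once, outside the timer
        var matchCount = 0;
        console.time(text);
        for (var r = 0; r < runs; r++) {
            matchCount = 0;
            for (var i = 0; i < n; i++) {
                if (testFilter(features[i])) {
                    matchCount++;
                }
            }
        }
        console.timeEnd(text);
        console.log(matchCount);
    };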
Closing in favor of #12.