Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use more efficient crossfilter range/value filters #478

Closed
mtraynham opened this issue Dec 12, 2013 · 7 comments
Closed

Use more efficient crossfilter range/value filters #478

mtraynham opened this issue Dec 12, 2013 · 7 comments
Milestone

Comments

@mtraynham
Copy link
Contributor

Even though Crossfilter does give us the dimension.filterFunction capabilities, the code should still try and use dimension.filter whenever possible. In the case of brush ranges, crossfilter can use it's bisect function (which binary searches the values) and processes add/remove on portions of the index, rather than the whole index.

Adding the filters.length === 1 conditional to dc.baseMixin's _filterHandler would work accordingly:

    var _filterHandler = function (dimension, filters) {
        dimension.filter(null);

        if (filters.length === 0)
            dimension.filter(null);
        else if (filters.length === 1)
            dimension.filter(filters[0]);
        else
            dimension.filterFunction(function (d) {
                for(var i = 0; i < filters.length; i++) {
                    var filter = filters[i];
                    if (filter.isFiltered && filter.isFiltered(d)) {
                        return true;
                    } else if (filter == d) {
                        return true;
                    }
                }
                return false;
            });

        return filters;
    };
@jrideout
Copy link
Contributor

Good optimization. This will need some testing prior to acceptance.

@gordonwoodhull
Copy link
Contributor

@mtraynham, have you looked into this further? I believe we've run into performance problems because dc.js is always using filterFunction. It's especially a problem when brushing and you want the fastest response time there.

@mtraynham
Copy link
Contributor Author

@gordonwoodhull, I took it one step further actually. I forked Crossfilter and added functionality that would let me better utilize filtering. My fork of Crossfilter included Jason Davie's union filter as well as a patch to identify what kind of objects are in the dimension (obj, array, prim). This allowed me to offload to Crossfilter as much as I could in the base filterHandler of dc.js:

I also removed any references to the RangeFilter in dc.js as Crossfilter supports simple ranges.

The code below has three cases:

  • Object type dimensions ( not common; just checks for filter property existence )
  • Array type dimensions. Since you can have what I call composite dimensions ( array valued ), filters may be arrays (row filtering). They could also be single valued (heatmap). This does the correct comparison for each of those (also added a Array.protoype.compare)
  • Single valued or primitive. If none of the filters had a isFiltered function attached, this just calls 'filterUnion' which does the combination of crossfilter's filter/filterRange. If isFiltered did exist, this just uses a Array.prototype.some to see if either 'isFiltered' passed or if the dimension value was equal to the filter, using the fix from Pull Request Fixes bug 497 - Filtering date dimensions for row/pie #555

I'm actually not using this code anymore, since I've found other means to do filtering on server, but here it is. I don't want to say it, but Crossfilter is somewhat dead... so it might be worth considering forking for dc.js.

    // Create Array.compare(array)
    Array.prototype.compare = function (array) {
        // if the other array is a falsy value, return
        if (!array || !(array instanceof Array)) {
            return false;
        }
        // compare lengths - can save a lot of time
        if (this.length !== array.length) {
            return false;
        }

        for (var i = 0; i < this.length; i++) {
            // Check if the items are not equal, or if they are arrays, we will recurse
            if (!(this[i] <= array[i] && this[i] >= array[i]) || (this[i] instanceof Array &&
                array[i] instanceof Array && !this[i].compare(array[i]))) {
                return false;
            }
        }
        return true;
    };

    var baseMixin = dc.baseMixin;
    dc.baseMixin = function (_chart) {
        // Create the base mixin
        _chart = baseMixin(_chart);

        // Set our custom _chart filterHandler
        _chart.filterHandler(function (dimension, filters) {
            if (filters.length === 0) {
                dimension.filter(null);
            } else {
                var dimensionType = dimension.getType();
                if (dimensionType === crossfilter.OBJ) {
                    dimension.filterFunction(function (d) {
                        return filters.some(function (filter) { return d[filter]; });
                    });
                } else if (dimensionType === crossfilter.ARRAY) {
                    dimension.filterFunction(function (d) {
                        return filters.some(function (filter) {
                            if (d instanceof Array) {
                                if (filter instanceof Array) {
                                    if (d.compare(filter)) {
                                        return true;
                                    }
                                } else if (d.indexOf(filter) > -1) {
                                    return true;
                                }
                            } else if (filter instanceof Array) {
                                if (filter.indexOf(d) > -1) {
                                    return true;
                                }
                            }
                            return false;
                        });
                    });
                } else {
                    if (filters.some(function (filter) { return filter.isFiltered; })) {
                        dimension.filterFunction(function (d) {
                            return filters.some(function (filter) {
                                if (filter.isFiltered) {
                                    if (filter.isFiltered(d)) {
                                        return true;
                                    }
                                } else if (filter <= d && filter >= d) {
                                    return true;
                                }
                                return false;
                            });
                        });
                    } else {
                        dimension.filterUnion(filters);
                    }
                }
            }
            return filters;
        });

        return _chart;
    };

My patches for Crossfilter was only two commits, https://github.com/mtraynham/crossfilter

@gordonwoodhull
Copy link
Contributor

Thanks @mtraynham, that is very intriguing. Yes, it does seem that crossfilter is not actively maintained - to be kind, you could call it "complete" rather than "dead". ;-)

Forking it to the dc-js organization is something to seriously consider. That would be taking on some responsibility though.

Another direction to consider is that dc.js is not tied very tightly to crossfilter. (I think there are not many calls into its API beyond all and... filterFunction.) My guess is that using e.g. PourOver would not be less efficient than using crossfilter with filterFunction.

Wow, you did composite dimensions as well. Can't wait to look into that!

@mtraynham
Copy link
Contributor Author

Yeah, it was pretty interesting. The biggest thing was trying to figure out what was performant. Crossfilter's heap sort it uses can be slow for items that are similar in comparison, such as lengthy arrays. But how exactly do you sort an array of arrays, atleast from Crossfilter's point of view? From some performance testing, it should be avoided and use of the filterFunction was key. Crossfilter sorting can be avoided if you pass in objects...they don't recognize coercion and sorting is almost ignored.

When building charts, we designed a scheme that had dimensions, non-filter dimensions, and groups. When it came to non-filtered dimensions, we created a generic reducer that would bubble up. Basically wrapping reducers with other reducers...

The dimensional reducers would cache the values, so there was only one crossfilter lookup. But using some tricks we just created these crazy reducer objects built from a chain of reducers and flattened them out in a overrided .data() function. That way you could just assign accessors to the 'columns'...

Even though it all worked, our long pole was the original data download.. we were just pushing too much over the wire and it caused long load times.

For instance:

function(cf, dimension, innerReducer) {
        return {
            reduceInitial : function() {
                var p = {};
                dimension.getValues(cf).forEach(function(val) {
                    p[val] = innerReducer.reduceInitial();
                });
                return p;
            },
            reduceAdd : function(p, v) {
                var val = v[dimension.column];
                p[val] = innerReducer.reduceAdd(p[val], v);
                return p;
            },
            reduceRemove : function(p, v) {
                var val = v[dimension.column];
                p[val] = innerReducer.reduceRemove(p[val], v);
                return p;
            }
        };
    };

PourOver would be cool. Our server side solution is very similar in nature.

@gordonwoodhull gordonwoodhull added this to the v2.1 milestone Jun 5, 2014
@gordonwoodhull gordonwoodhull changed the title baseMixin _filterHandler should call dimension.filter(filters[0]) if filters.length is 1 Use more efficient crossfilter range/value filters Jun 13, 2014
@gordonwoodhull
Copy link
Contributor

Retitling because I keep trying to find this.

@gordonwoodhull
Copy link
Contributor

one idea in #988, devolving the chart registry, is to sort of "render" the filter into a query rather than always generating a filter function as we're doing now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants