RFgroove - Importance Measure and Selection for Groups of Variables with Random Forests

Implement an importance measure for groups of features and the Recursive Feature Elimination algorithm (from Guyon et al. 2002).

Based on the two following articles:

B. Gregorutti, B. Michel, P. Saint-Pierre (2017). Correlation and variable importance in random forests. arXiv link
B. Gregorutti, B. Michel, P. Saint-Pierre (2015). Grouped variable importance with random forests and application to multiple functional data analysis. arXiv link
I. Guyon, J. Weston, S. Barnhill, & V. Vapnik (2002). Gene selection for cancer classification using support vector machines, Mach. Learn., 46(1-3), 389–422.

REQUIREMENTS

numpy
joblib
scikit-learn

INSTALLATION

git clone git@github.com:bgregorutti/rfgroove-py.git
cd rfgroove-py/
pip install .

CODE EXAMPLES

Feature importance:

from sklearn.ensemble import RandomForestRegressor
from rfgroove.dataset_generation import gaussian_multidimensional
from rfgroove.importance import grouped_importance

# Build a dataset
X, y = gaussian_multidimensional(c=.5, size=1000, n_features_per_groups=5)

# Fit a RF model
regr = RandomForestRegressor(n_estimators=100, oob_score=True, max_samples=.1)
regr.fit(X, y)

# Compute the feature importance measure
groups = [list(range(5)), list(range(5, 10))] + [[k] for k in range(10, 15)]
imp = grouped_importance(regr, X, y, groups)

see test/test_selection.py.

Feature selection:

from sklearn.ensemble import RandomForestRegressor
from rfgroove.dataset_generation import gaussian_multidimensional
from rfgroove.selection import RFE
    
# Build a dataset
X, y = gaussian_multidimensional(c=.5, size=1000, n_features_per_groups=5)

# Instanciate a RandomForestRegressor object, as base model
base = RandomForestRegressor(n_estimators=1000, bootstrap=True, oob_score=True, max_samples=.1)

# Run the selection algorithm
groups = [list(range(5)), list(range(5, 10))] + [[k] for k in range(10, 15)]
selector = RFE(base, groups, n_jobs=-1)
selector.fit(X, y)

see test/test_importance.py.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
rfgroove		rfgroove
test		test
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
setup.cfg		setup.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RFgroove - Importance Measure and Selection for Groups of Variables with Random Forests

REQUIREMENTS

INSTALLATION

CODE EXAMPLES

About

Releases

Packages

Languages

License

bgregorutti/rfgroove-py

Folders and files

Latest commit

History

Repository files navigation

RFgroove - Importance Measure and Selection for Groups of Variables with Random Forests

REQUIREMENTS

INSTALLATION

CODE EXAMPLES

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages