moonfolk/MiFM

Multi-way Interacting Regression via Factorization Machines

This is a Python 2 implementation of the MiFM algorithm for interaction discovery and prediction (M. Yurochkin, X. Nguyen, N. Vasiloglou, to appear in NIPS 2017). Code written by Mikhail Yurochkin.

Overview

This is a demonstration of MiFM on Abalone data.

First, compile the Cython code in the cython folder. On Ubuntu, run:

cython g_c_sampler.pyx
python setup.py build_ext --inplace

The Cython code implements the Gibbs sampling updates and the prediction function.

prediction/predict_f_all.py: Python wrapper for Cython code to aggregate MCMC samples for prediction

py_scripts/train_f.py: Python wrapper for Cython code to run Gibbs sampling

py_scripts/py_functions.py: Gibbs sampling for hyperpriors and initialization

mifm_class.py: implements the MiFM class; data preprocessing; posterior analysis of interactions

abalone_example.py: downloads the Abalone dataset and shows how to use MiFM and extract interactions

The implementation is designed for interactive use (e.g. in a Python IDE such as Spyder).

Usage guide

MiFM(K=5, J=50, it=700, lin_model=True, alpha=1., verbose=False, restart=5, restart_iter=50, thr=300, rate=25, ncores=1, use_mape=False)

Parameters:

K: rank of matrix of coefficients V

J: number of interactions (columns) in Z

it: number of Gibbs sampling iterations

lin_model: whether to include linear effects (w_1,...,w_D)

alpha: FFM_alpha parameter. Smaller values encourage deeper interactions

verbose: whether to print intermediate RMSE train scores

restart and restart_iter: number of initializations to try (restart), each run for restart_iter iterations. The initialization with the best training RMSE is then used for fitting

ncores: how many cores to use for initialization with restarts

use_mape: whether to use AMAPE instead of RMSE to select best initialization

thr: number of MCMC iterations after which samples are collected (i.e. burn-in)

rate: thinning interval; every rate-th iteration is saved
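
To make the interplay of it, thr and rate concrete, the sketch below lists which iterations would be kept under the default settings. It assumes iterations are numbered from zero and that the first sample is taken at iteration thr; the actual indexing in the Cython sampler may differ by one. kept_iterations is a hypothetical helper for illustration and is not part of MiFM.

```python
def kept_iterations(it=700, thr=300, rate=25):
    """Return the iteration indices kept after burn-in and thinning.

    Assumption: iterations are numbered 0..it-1 and the first kept
    sample is iteration thr; the real sampler may be off by one.
    """
    return list(range(thr, it, rate))


kept = kept_iterations()
print(len(kept), kept[0], kept[-1])
```

Under these assumptions the defaults yield 16 saved samples, the first at iteration 300 and the last at iteration 675.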

Methods:

fit(X, y, cat_to_v, v_to_cat)

X: training data after one-hot encoding

y: response

cat_to_v: list of category to value after one-hot encoding (see example with Abalone data)

v_to_cat: dictionary of category to values before one-hot encoding (see example with Abalone data)

Returns a list of MCMC samples. Each sample is a list [bias, linear coefficients, V, Z]
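
As a rough illustration of what "after one-hot encoding" means for X, the sketch below encodes one categorical column and appends a numeric one. The helper one_hot_column, the toy rows and the column_origin list are hypothetical; they do not reproduce the exact cat_to_v and v_to_cat structures, which are built in abalone_example.py.

```python
def one_hot_column(values):
    """One-hot encode a categorical column.

    Returns the encoded rows and the sorted list of categories,
    one category per output column.
    """
    categories = sorted(set(values))
    index = {c: j for j, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0.0] * len(categories)
        row[index[v]] = 1.0
        rows.append(row)
    return rows, categories


# Toy rows in the spirit of Abalone: one categorical and one numeric variable.
sex = ["M", "F", "I", "M"]
length = [0.455, 0.530, 0.330, 0.440]

sex_rows, sex_categories = one_hot_column(sex)

# Design matrix: one-hot columns for Sex followed by the numeric column.
X = [row + [val] for row, val in zip(sex_rows, length)]

# Hypothetical bookkeeping: the original variable behind each column of X.
column_origin = ["Sex"] * len(sex_categories) + ["Length"]
```

Keeping track of which encoded columns belong to the same original variable is what lets interactions be reported in terms of the original variables rather than individual indicator columns.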

predict(self, X)

Note: can only be used on a fitted object. Returns predicted values for test data X using a Monte Carlo estimator of the mean response.
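
The Monte Carlo estimator amounts to averaging the per-sample predictions over the saved MCMC samples. The sketch below shows that averaging step only. As a simplifying assumption it uses just the bias and linear coefficients of each sample and ignores the interaction terms encoded by V and Z, so it is not the prediction function in the Cython code; predict_linear and monte_carlo_mean are hypothetical names.

```python
def predict_linear(sample, x):
    """Prediction from one MCMC sample, linear part only.

    Assumption: sample follows the [bias, linear coefficients, V, Z]
    layout; the interaction terms from V and Z are omitted here.
    """
    bias, w = sample[0], sample[1]
    return bias + sum(wi * xi for wi, xi in zip(w, x))


def monte_carlo_mean(samples, X):
    """Average the per-sample predictions over all MCMC samples."""
    n = float(len(samples))
    return [sum(predict_linear(s, x) for s in samples) / n for x in X]


# Two toy samples; V and Z are placeholders because they are unused here.
samples = [[1.0, [2.0, 0.0], None, None],
           [3.0, [0.0, 2.0], None, None]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_hat = monte_carlo_mean(samples, X)
```

Averaging predictions, rather than averaging the parameters and predicting once, matters once the interaction terms are included, because the interaction structure Z changes from sample to sample.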

score(self, X, y)

Makes predictions and computes RMSE or AMAPE
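
For reference, RMSE can be computed as in the minimal sketch below. The function name rmse is hypothetical and the code is independent of the MiFM class. AMAPE is left out because its exact definition is not stated here.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / float(len(squared)))


error = rmse([0.0, 0.0], [3.0, 4.0])
```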
