We provide both a simple Python interface and a scikit-learn wrapper interface. Before using either Python interface, you must build ThunderSVM.
Please refer to Installation for building ThunderSVM.
Then, if you want to install the Python package, go to the project root directory and run:
cd python && python setup.py install
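To verify the installation, a quick import check like the one below can be used; this is just a sanity test that the wrapper classes (and the native library they load) can be found:

```python
# sanity check: the scikit-learn-style wrappers should be importable
from thundersvm import SVC, SVR

print(SVC, SVR)
```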
However, you don't need to install the Python package in order to use ThunderSVM from Python. After building, you should be able to see the ThunderSVM library (e.g., libthundersvm.so on Linux machines) under ./build/lib/ of the ThunderSVM root directory; the Python wrappers load this library directly, as sketched below.
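If you skip the install step, a minimal way to make the wrappers importable is to add the python directory of your checkout to sys.path and run from a directory where ../build/lib resolves to the built library (see the note on the default library location further down). The path below is hypothetical; adjust it to your own checkout:

```python
import sys

# hypothetical path to your ThunderSVM checkout; adjust as needed
sys.path.append("/path/to/thundersvm/python")

# importing the wrapper also loads the native library (e.g., libthundersvm.so),
# which by default is looked up under ../build/lib of the working directory
from thundersvm import SVC
```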
After you have completed the above steps, you are ready to start using the Python interfaces. They depend on the following Python packages:
- numpy
- scipy
- sklearn
The usage of the ThunderSVM scikit-learn interface is similar to that of sklearn.svm.
class SVC(kernel = 'rbf', degree = 3, gamma = 'auto', coef0 = 0.0, C = 1.0, tol = 0.001, probability = False, class_weight = None, shrinking = False, cache_size = None, verbose = False, max_iter = -1, n_jobs = -1, max_mem_size = -1, random_state = None, decision_function_shape = 'ovo')
class NuSVC(kernel = 'rbf', degree = 3, gamma = 'auto', coef0 = 0.0, nu = 0.5, tol = 0.001, probability = False, shrinking = False, cache_size = None, verbose = False, max_iter = -1, n_jobs = -1, max_mem_size = -1, random_state = None, decision_function_shape = 'ovo')
class OneClassSVM(kernel = 'rbf', degree = 3, gamma = 'auto', coef0 = 0.0, nu = 0.5, tol = 0.001, shrinking = False, cache_size = None, verbose = False, max_iter = -1, n_jobs = -1, max_mem_size = -1, random_state = None)
class SVR(kernel = 'rbf', degree = 3, gamma = 'auto', coef0 = 0.0, C = 1.0, epsilon = 0.1, tol = 0.001, probability = False, shrinking = False, cache_size = None, verbose = False, max_iter = -1, n_jobs = -1, max_mem_size = -1)
class NuSVR(kernel = 'rbf', degree = 3, gamma = 'auto', coef0 = 0.0, nu = 0.5, C = 1.0, tol = 0.001, probability = False, shrinking = False, cache_size = None, verbose = False, max_iter = -1, n_jobs = -1, max_mem_size = -1)
kernel: string, optional(default='rbf')
set type of kernel function
'linear': u'*v
'polynomial': (gamma*u'*v + coef0)^degree
'rbf': exp(-gamma*|u-v|^2)
'sigmoid': tanh(gamma*u'*v + coef0)
'precomputed' -- precomputed kernel (kernel values in training_set_file)
degree: int, optional(default=3)
set degree in kernel function
gamma: float, optional(default='auto')
set gamma in kernel function (auto:1/num_features)
coef0: float, optional(default=0.0)
set coef0 in kernel function
C: float, optional(default=1.0)
set the parameter C of C-SVC, epsilon-SVR, and nu-SVR
nu: float, optional(default=0.5)
set the parameter nu of nu-SVC, one-class SVM, and nu-SVR
epsilon: float, optional(default=0.1)
set the epsilon in loss function of epsilon-SVR
tol: float, optional(default=0.001)
set tolerance of termination criterion
probability: boolean, optional(default=False)
whether to train an SVC or SVR model for probability estimates (True or False)
class_weight: {dict, 'balanced'}, optional(default=None)
set the parameter C of class i to weight*C for C-SVC. If not given, all classes are assumed to have weight one. The 'balanced' mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data, as n_samples / (n_classes * np.bincount(y))
shrinking: boolean, optional (default=False; True is not supported yet)
whether to use the shrinking heuristic.
cache_size: float, optional, not supported yet.
specify the size of the kernel cache (in MB).
verbose: bool, optional (default=False)
enable verbose output. Note that this setting takes advantage of a per-process runtime setting; if enabled, ThunderSVM may not work properly in a multithreaded context.
max_iter: int, optional (default=-1)
hard limit on the number of iterations within the solver, or -1 for no limit.
n_jobs: int, optional (default=-1)
set the number of CPU cores to use, or -1 to use all available cores.
max_mem_size: int, optional (default=-1)
set the maximum memory size (MB) that thundersvm uses, or -1 for no limit.
gpu_id: int, optional (default=0)
set which GPU to use for training.
decision_function_shape: 'ovo', optional (default='ovo'); 'ovr' is not supported yet
only for classifiers. Whether to return a one-vs-rest ('ovr') decision function of shape (n_samples, n_classes) as all other classifiers, or the original one-vs-one ('ovo') decision function of libsvm, which has shape (n_samples, n_classes * (n_classes - 1) / 2).
random_state: int, RandomState instance or None, optional (default=None), not supported yet
The seed of the pseudo random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
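To illustrate how a few of these parameters can be combined, here is a sketch of constructing a classifier; the values are arbitrary and purely illustrative, so tune C and gamma for your own data:

```python
from thundersvm import SVC

# illustrative settings only, not recommended defaults
clf = SVC(kernel='rbf',
          C=10.0,                   # regularization strength
          gamma=0.125,              # RBF kernel width
          class_weight='balanced',  # reweight classes by inverse frequency
          max_mem_size=4096,        # cap ThunderSVM memory usage at roughly 4 GB
          verbose=True)
```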
A fitted estimator exposes the following attributes:
support_: array-like, shape = [n_SV]
indices of support vectors.
support_vectors_: array-like, shape = [n_SV, n_features]
support vectors.
n_support_: array-like, dtype=int32, shape = [n_class]
number of support vectors for each class.
dual_coef_: array, shape = [n_class-1, n_SV]
coefficients of the support vectors in the decision function. For multiclass, these are the coefficients for all one-vs-one classifiers. The layout of the coefficients in the multiclass case is somewhat non-trivial.
coef_: array, shape = [n_class * (n_class-1)/2, n_features]
Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.
intercept_: array, shape = [n_class * (n_class-1) / 2]
constants in decision function.
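For example, after fitting a classifier these attributes can be inspected as below. This is a sketch on a tiny synthetic problem and assumes, as with scikit-learn estimators, that dense NumPy arrays are accepted as input:

```python
import numpy as np
from thundersvm import SVC

# tiny synthetic two-class problem, purely for illustration
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])

clf = SVC(kernel='rbf', gamma=0.5, C=100)
clf.fit(X, y)

print(clf.n_support_)              # number of support vectors per class
print(clf.support_vectors_.shape)  # (n_SV, n_features)
print(clf.dual_coef_.shape)        # (n_class - 1, n_SV)
print(clf.intercept_)              # constants in the decision function
```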
By default, the ThunderSVM library (e.g., libthundersvm.so) is stored in ../build/lib relative to the current working directory.
The estimators provide the following methods:
fit(X, y):
Fit the SVM model according to the given training data.
get_params([deep]):
Get parameters for this estimator.
predict(X):
Perform classification on samples in X.
score(X, y):
Returns the mean accuracy on the given test data and labels.
set_params(**params):
Set the parameters of this estimator.
decision_function(X):
Return the distance of the samples in X to the separating hyperplane. Only for SVC, NuSVC, and OneClassSVM.
save_to_file(path):
Save the model to the file path.
load_from_file(path):
Load the model from the file path.
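The snippet below sketches how the persistence and decision-function methods fit together. It assumes the script is run from the python/ directory (as in the example that follows) and that a model reloaded with load_from_file supports prediction and decision values just like a freshly trained one; the file name svm.model is arbitrary:

```python
from sklearn.datasets import load_svmlight_file
from thundersvm import SVC

# load the bundled demo dataset (run from the python/ directory)
X, y = load_svmlight_file("../dataset/test_dataset.txt")

clf = SVC(gamma=0.5, C=100)
clf.fit(X, y)

# persist the trained model and reload it into a fresh estimator
clf.save_to_file("./svm.model")   # arbitrary file name
clf2 = SVC()
clf2.load_from_file("./svm.model")

# distances of the samples to the separating hyperplane (classifiers only)
print(clf2.decision_function(X)[:5])
```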
- Step 1: go to the Python interface directory.
# in thundersvm root directory
cd python
- Step 2: create a file called sk_test.py with the following content.
from thundersvm import SVC
from sklearn.datasets import load_svmlight_file
# load the training data in libsvm format
x, y = load_svmlight_file("../dataset/test_dataset.txt")
# train an SVC with an RBF kernel
clf = SVC(verbose=True, gamma=0.5, C=100)
clf.fit(x, y)
# predict on the same data and compute the mean accuracy
x2, y2 = load_svmlight_file("../dataset/test_dataset.txt")
y_predict = clf.predict(x2)
score = clf.score(x2, y2)
# save the trained model and report the score
clf.save_to_file('./model')
print("test score is", score)
- Step 3: run the Python script.
python sk_test.py
By default, the directory for storing the training data and results is the working directory; the ThunderSVM library (e.g., libthundersvm.so) is stored in ../build/lib relative to the current working directory.
The simple Python interface provides the following functions:
svm_read_problem('file_name'):
read data from file_name.
return: (labels, instances)
svm_train(labels, instances, 'model_file_name', parameters):
train the SVM model and save the result to model_file_name.
svm_predict(labels, instances, 'model_file_name', 'output_file_name', parameters):
use the SVM model saved in model_file_name to predict the labels of the given instances and store the results in output_file_name.
- Step 1: go to the Python interface directory.
# in thundersvm root directory
cd python
- Step 2: create a file called test.py with the following content.
from svm import svm_read_problem, svm_train, svm_predict
# read the training data in libsvm format
y, x = svm_read_problem('../dataset/test_dataset.txt')
# train a model with C=100 and gamma=0.5, saving it to test_dataset.txt.model
svm_train(y, x, 'test_dataset.txt.model', '-c 100 -g 0.5')
# predict on the same data; results are written to test_dataset.predict
y, x = svm_read_problem('../dataset/test_dataset.txt')
svm_predict(y, x, 'test_dataset.txt.model', 'test_dataset.predict')
- Step 3: run the Python script.
python test.py
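The predicted labels are written to test_dataset.predict. Assuming the file follows the usual libsvm convention of one predicted label per line (inspect the file to confirm this on your setup), it can be read back like this:

```python
# read the predictions written by svm_predict; assumes one label per line
with open('test_dataset.predict') as f:
    predictions = [float(line) for line in f if line.strip()]
print(predictions[:5])
```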