Skip to content

Latest commit

 

History

History
176 lines (131 loc) · 10.9 KB

README.md

File metadata and controls

176 lines (131 loc) · 10.9 KB

daal4py - A Convenient Python API to the Intel(R) oneAPI Data Analytics Library

Build Status Join the community on GitHub Discussions PyPI Version Conda Version

A simplified API to Intel(R) oneAPI Data Analytics Library that allows for fast usage of the framework suited for Data Scientists or Machine Learning users. Built to help provide an abstraction to Intel(R) oneAPI Data Analytics Library for either direct usage or integration into one's own framework and extending this beyond by providing drop-in paching for scikit-learn.

Running full scikit-learn test suite with daal4py optimization patches:

  • CircleCI when applied to scikit-learn from PyPi
  • CircleCI when applied to build from master branch

👀 Follow us on Medium

We publish blogs on Medium, so follow us to learn tips and tricks for more efficient data analysis the help of daal4py. Here are our latest blogs:

🔗 Important links

💬 Support

Report issues, ask questions, and provide suggestions using:

You may reach out to project maintainers privately at onedal.maintainers@intel.com

🛠 Installation

daal4py is available at the Python Package Index, on Anaconda Cloud in Conda-Forge channel and in Intel channel.

# PyPi
pip install daal4py
# Anaconda Cloud from Conda-Forge channel (recommended for conda users by default)
conda install daal4py -c conda-forge
# Anaconda Cloud from Intel channel (recommended for Intel® Distribution for Python)
conda install daal4py -c intel
[Click to expand] ℹ️ Supported configurations

📦 PyPi channel

OS / Python version Python 3.6 Python 3.7 Python 3.8 Python 3.9
Linux [CPU, GPU] [CPU, GPU] [CPU, GPU] [CPU, GPU]
Windows [CPU, GPU] [CPU, GPU] [CPU, GPU] [CPU, GPU]
OsX [CPU] [CPU] [CPU]

📦 Anaconda Cloud: Conda-Forge channel

OS / Python version Python 3.6 Python 3.7 Python 3.8 Python 3.9
Linux [CPU] [CPU] [CPU] [CPU]
Windows [CPU] [CPU] [CPU] [CPU]
OsX

📦 Anaconda Cloud: Intel channel

OS / Python version Python 3.6 Python 3.7 Python 3.8 Python 3.9
Linux [CPU, GPU]
Windows [CPU, GPU]
OsX [CPU]

You can build daal4py from sources as well.

⚡️ Get Started

Accelerate scikit-learn with the core functionality of daal4py without changing the code.

Intel CPU optimizations patching

import numpy as np
from daal4py.sklearn import patch_sklearn
patch_sklearn()

from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
clustering = DBSCAN(eps=3, min_samples=2).fit(X)

Intel CPU/GPU optimizations patching

import numpy as np
from daal4py.sklearn import patch_sklearn
from daal4py.oneapi import sycl_context
patch_sklearn()

from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
with sycl_context("gpu"):
    clustering = DBSCAN(eps=3, min_samples=2).fit(X)

🚀 Scikit-learn patching

Speedups of daal4py-powered Scikit-learn over the original Scikit-learn
Technical details: float type: float64; HW: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz, 2 sockets, 28 cores per socket; SW: scikit-learn 0.23.1, Intel® oneDAl (2021.1 Beta 10)

daal4py patching affects performance of specific Scikit-learn functionality listed below. In cases when unsupported parameters are used, daal4py fallbacks into stock Scikit-learn. These limitations described below. If the patching does not cover your scenarios, submit an issue on GitHub.

⚠️ We support optimizations for the last four versions of scikit-learn. The latest release of daal4py-2021.1 supports scikit-learn 0.21.X, 0.22.X, 0.23.X and 0.24.X.

[Click to expand] 🔥 Applying the daal4py patch will impact the following existing scikit-learn algorithms:
Task Functionality Parameters support Data support
Classification SVC All parameters except kernel = 'poly' and 'sigmoid'. No limitations.
RandomForestClassifier All parameters except warmstart = True and cpp_alpha != 0, criterion != 'gini'. Multi-output and sparse data is not supported.
KNeighborsClassifier All parameters except metric != 'euclidean' or minkowski with p = 2. Multi-output and sparse data is not supported.
LogisticRegression / LogisticRegressionCV All parameters except solver != 'lbfgs' or 'newton-cg', class_weight != None, sample_weight != None. Only dense data is supported.
Regression RandomForestRegressor All parameters except warmstart = True and cpp_alpha != 0, criterion != 'mse'. Multi-output and sparse data is not supported.
KNeighborsRegressor All parameters except metric != 'euclidean' or minkowski with p = 2. Sparse data is not supported.
LinearRegression All parameters except normalize != False and sample_weight != None. Only dense data is supported, #observations should be >= #features.
Ridge All parameters except normalize != False, solver != 'auto' and sample_weight != None. Only dense data is supported, #observations should be >= #features.
ElasticNet All parameters except sample_weight != None. Multi-output and sparse data is not supported, #observations should be >= #features.
Lasso All parameters except sample_weight != None. Multi-output and sparse data is not supported, #observations should be >= #features.
Clustering KMeans All parameters except precompute_distances and sample_weight != None. No limitations.
DBSCAN All parameters except metric != 'euclidean' or minkowski with p = 2. Only dense data is supported.
Dimensionality reduction PCA All parameters except svd_solver != 'full'. No limitations.
TSNE All parameters except metric != 'euclidean' or minkowski with p = 2. Sparse data is not supported.
Unsupervised NearestNeighbors All parameters except metric != 'euclidean' or minkowski with p = 2. Sparse data is not supported.
Other train_test_split All parameters are supported. Only dense data is supported.
assert_all_finite All parameters are supported. Only dense data is supported.
pairwise_distance With metric='cosine' and 'correlation'. Only dense data is supported.

Scenarios that are only available in the master branch (not released yet):

Task Functionality Parameters support Data support
Other roc_auc_score Parameters average, sample_weight, max_fpr and multi_class are not supported. No limitations.

📜 scikit-learn verbose

To find out which implementation of the algorithm is currently used (daal4py or stock Scikit-learn), set the environment variable:

  • On Linux and Mac OS: export IDP_SKLEARN_VERBOSE=INFO
  • On Windows: set IDP_SKLEARN_VERBOSE=INFO

For example, for DBSCAN you get one of these print statements depending on which implementation is used:

  • INFO: sklearn.cluster.DBSCAN.fit: uses Intel(R) oneAPI Data Analytics Library solver
  • INFO: sklearn.cluster.DBSCAN.fit: uses original Scikit-learn solver

Read more in the documentation.