Intel® Scalable Dataframe Compiler (Intel® SDC) is an extension of Numba* that enables compilation of Pandas* operations. It automatically vectorizes and parallelizes the code by leveraging modern hardware instructions and by utilizing all available cores.
Intel® SDC documentation can be found here.
Note
For maximum performance and stability, please use numba from the intel/label/beta channel.
Intel® SDC is available on the Anaconda Cloud intel/label/beta channel.
The distribution includes Intel® SDC for Python 3.6 and Python 3.7 on Windows and Linux platforms.
Intel® SDC conda package can be installed using the steps below:
> conda create -n sdc-env python=<3.7 or 3.6> pyarrow=0.17.0 pandas=1.0.5 -c anaconda -c conda-forge
> conda activate sdc-env
> conda install sdc -c intel/label/beta -c intel -c defaults -c conda-forge --override-channels
Intel® SDC wheel package can be installed using the steps below:
> conda create -n sdc-env python=<3.7 or 3.6> pip pyarrow=0.17.0 pandas=1.0.5 -c anaconda -c conda-forge
> conda activate sdc-env
> pip install --index-url https://pypi.anaconda.org/intel/label/beta/simple --extra-index-url https://pypi.anaconda.org/intel/simple --extra-index-url https://pypi.org/simple sdc
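After either installation route, a quick sanity check is to confirm that the interpreter in the active environment can locate the packages named in the steps above. The following is a minimal sketch using only the standard library. The helper name `find_missing` is hypothetical, and the list of module names (`sdc`, `numba`, `pandas`, `pyarrow`) is an assumption taken from the commands above, not an official checklist.

```python
import importlib.util


def find_missing(module_names):
    """Return the names from module_names that cannot be located for import."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed list of packages, based on the install commands above.
    required = ["sdc", "numba", "pandas", "pyarrow"]
    missing = find_missing(required)
    if missing:
        print("Not found in this environment:", ", ".join(missing))
    else:
        print("All packages found.")
```

Run the script inside the activated sdc-env environment. It only checks that the modules can be found; it does not import them or verify their versions.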
We use the Anaconda distribution of Python to set up the Intel® SDC build environment.
If you do not have conda, we recommend using Miniconda3:
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
chmod +x miniconda.sh
./miniconda.sh -b
export PATH=$HOME/miniconda3/bin:$PATH
Note
For maximum performance and stability, please use numba from the intel/label/beta channel.
It is possible to build Intel® SDC via conda-build or setuptools. Follow one of the cases below to install Intel® SDC and its dependencies on Linux.
To build Intel® SDC with conda-build:
PYVER=<3.6 or 3.7>
NUMPYVER=<1.16 or 1.17>
conda create -n conda-build-env python=$PYVER conda-build
source activate conda-build-env
git clone https://github.com/IntelPython/sdc.git
cd sdc
conda build --python $PYVER --numpy $NUMPYVER --output-folder=<output_folder> -c intel/label/beta -c defaults -c intel -c conda-forge --override-channels conda-recipe
To build Intel® SDC with setuptools:
export PYVER=<3.6 or 3.7>
export NUMPYVER=<1.16 or 1.17>
conda create -n sdc-env -q -y -c intel/label/beta -c defaults -c intel -c conda-forge python=$PYVER numpy=$NUMPYVER tbb-devel tbb4py numba=0.49 pandas=1.0.5 pyarrow=0.17.0 gcc_linux-64 gxx_linux-64
source activate sdc-env
git clone https://github.com/IntelPython/sdc.git
cd sdc
python setup.py install
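Because both build routes depend on the PYVER and NUMPYVER values, it can help to validate them before starting a long build. The sketch below assumes the only accepted values are the ones listed in the commands above (Python 3.6 or 3.7, NumPy 1.16 or 1.17). The function name `check_build_versions` is hypothetical, and the script reads the variables from the environment, so they must have been exported.

```python
import os

# Assumed accepted values, copied from the build commands above.
SUPPORTED_PYTHON = {"3.6", "3.7"}
SUPPORTED_NUMPY = {"1.16", "1.17"}


def check_build_versions(pyver, numpyver):
    """Return a list of problems with the version strings (empty if none)."""
    problems = []
    if pyver not in SUPPORTED_PYTHON:
        problems.append(
            f"PYVER={pyver!r} is not one of {sorted(SUPPORTED_PYTHON)}")
    if numpyver not in SUPPORTED_NUMPY:
        problems.append(
            f"NUMPYVER={numpyver!r} is not one of {sorted(SUPPORTED_NUMPY)}")
    return problems


if __name__ == "__main__":
    # os.environ.get returns None for variables that were never exported.
    found = check_build_versions(os.environ.get("PYVER"),
                                 os.environ.get("NUMPYVER"))
    for problem in found:
        print(problem)
    if not found:
        print("PYVER and NUMPYVER look consistent with the build steps.")
```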
In case of issues, reinstalling in a new conda environment is recommended.
Building Intel® SDC on Windows requires Build Tools for Visual Studio 2019 (with component MSVC v140 - VS 2015 C++ build tools (v14.00)):
- Install Build Tools for Visual Studio 2019 (with component MSVC v140 - VS 2015 C++ build tools (v14.00)).
- Install Miniconda for Windows.
- Start 'Anaconda prompt'
It is possible to build Intel® SDC via conda-build or setuptools. Follow one of the cases below to install Intel® SDC and its dependencies on Windows.
To build Intel® SDC with conda-build:
set PYVER=<3.6 or 3.7>
set NUMPYVER=<1.16 or 1.17>
conda create -n conda-build-env -q -y python=%PYVER% conda-build conda-verify vc vs2015_runtime vs2015_win-64
conda activate conda-build-env
git clone https://github.com/IntelPython/sdc.git
cd sdc
conda build --python %PYVER% --numpy %NUMPYVER% --output-folder=<output_folder> -c intel/label/beta -c defaults -c intel -c conda-forge --override-channels conda-recipe
To build Intel® SDC with setuptools:
set PYVER=<3.6 or 3.7>
set NUMPYVER=<1.16 or 1.17>
conda create -n sdc-env -c intel/label/beta -c defaults -c intel -c conda-forge python=%PYVER% numpy=%NUMPYVER% tbb-devel tbb4py numba=0.49 pandas=1.0.5 pyarrow=0.17.0
conda activate sdc-env
set INCLUDE=%INCLUDE%;%CONDA_PREFIX%\Library\include
set LIB=%LIB%;%CONDA_PREFIX%\Library\lib
git clone https://github.com/IntelPython/sdc.git
cd sdc
python setup.py install
- If the cl compiler throws the error fatal error LNK1158: cannot run 'rc.exe', add Windows Kits to your PATH (e.g. C:\Program Files (x86)\Windows Kits\8.0\bin\x86).
- Some errors can be mitigated by set DISTUTILS_USE_SDK=1.
- To set up Visual Studio, you might need to go to the registry at HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\VisualStudio\SxS\VS7 and add a string value named 14.0 whose data is C:\Program Files (x86)\Microsoft Visual Studio 14.0\.
- If the conda version or Visual Studio version in use is not the latest, building Intel® SDC can fail with a vague error about a keyword used in a file. Make sure you are using the latest versions.
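The first two troubleshooting items above concern tools on PATH and an environment variable, both of which can be inspected from Python before retrying the build. The following is a minimal diagnostic sketch, not part of Intel® SDC. The function name `diagnose_build_tools` is hypothetical, and the tool names `cl` and `rc` are assumptions based on the error message quoted above.

```python
import os
import shutil


def diagnose_build_tools(tools=("cl", "rc")):
    """Map each tool name to its resolved path on PATH, or None if not found."""
    # On Windows, shutil.which also tries the extensions listed in PATHEXT,
    # so "rc" matches rc.exe.
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in diagnose_build_tools().items():
        print(f"{tool}: {path or 'not found on PATH'}")
    print("DISTUTILS_USE_SDK =",
          os.environ.get("DISTUTILS_USE_SDK", "<unset>"))
```

Run it from the same 'Anaconda prompt' session used for the build, since PATH and DISTUTILS_USE_SDK are specific to that session.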
Building the Intel® SDC User's Guide documentation requires a pre-installed Intel® SDC package, a compatible Pandas* version, and Sphinx* 2.2.1 or later.
The documentation includes the output of Intel® SDC examples, which is pasted into the function descriptions in the API Reference.
Use pip to install Sphinx* and extensions:
pip install sphinx sphinxcontrib-programoutput
Currently the build procedure is based on make, located in the ./sdc/docs/ folder.
While it is not generally required, we recommend that you clean up the results of any previous documentation build by running:
make clean
To build HTML documentation you will need to run:
make html
The built documentation will be located in the ./sdc/docs/build/html directory.
To preview the documentation, open the index.html file.
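The preview step can also be scripted. The sketch below builds the expected location of index.html from the output directory named above and opens it in the default browser if it exists. The function name `docs_index` and its `base` parameter are hypothetical; `base` stands for whatever directory the ./sdc/docs/build/html path is relative to on your machine.

```python
from pathlib import Path
import webbrowser


def docs_index(base="."):
    """Return the expected path of the built index.html under base."""
    return Path(base) / "sdc" / "docs" / "build" / "html" / "index.html"


if __name__ == "__main__":
    index = docs_index()
    if index.is_file():
        # as_uri needs an absolute path, hence resolve().
        webbrowser.open(index.resolve().as_uri())
    else:
        print("No built documentation at", index, "- run 'make html' first.")
```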
More information about building and adding documentation can be found here.
To run the unit tests, generate the test data first:
python sdc/tests/gen_test_data.py
python -m unittest
Intel® SDC follows the ideas and initial code base of the High-Performance Analytics Toolkit (HPAT). These academic papers describe the ideas and methods behind HPAT: