There exist many array libraries that implement objects for storing data in allocated memory areas. In the Python ecosystem alone, the number of such libraries is more than just a few (see below), and some of them are designed to reference memory both in host RAM and in accelerator devices such as GPUs. These packages also implement various computational algorithms, and one often wishes to apply such an algorithm to data stored in a different array object than the one the algorithm expects.
Many of these array object implementations support the Python Buffer Protocol (PEP 3118), which makes it possible to create an array object on top of another implementation's array object without actually copying the memory - this is called creating an array view.
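As a minimal illustration of such a zero-copy view, using only numpy and the standard library (independent of the `arrayviews` package):

```python
import numpy as np

# A bytearray exposes its memory via the buffer protocol (PEP 3118).
buf = bytearray(5 * 8)                     # room for five 64-bit integers
view = np.frombuffer(buf, dtype=np.int64)  # zero-copy view of the same memory

view[2] = 999        # writing through the view ...
print(buf[16:24])    # ... changes the underlying bytearray bytes
```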
As a side note, the Python Buffer Protocol is unfortunately incomplete when it comes to data stored in device memory. PEP 3118 lacks a device concept, which makes it almost impossible to use existing array storage implementations to hold pointers to such device memory. This has resulted in the emergence of a number of new array libraries designed specifically for holding device memory pointers. However, reimplementing array storage objects from scratch for each different device does not scale well: the only essential difference is the interpretation of a memory pointer, namely whether or not the pointer value can be dereferenced in a (host or device) process to access the data. The rest of the array object implementation remains the same. Instead, the Python Buffer Protocol should be extended with a device concept. Hopefully we'll see this happen in the future. Meanwhile...
The aim of this project is to establish connections between different data storage object implementations while avoiding copying the data in host or device memory. The following packages are supported:
Package | Tested versions | Storage on host | Storage on CUDA device |
---|---|---|---|
numpy | 1.16.1 | ndarray | N/A |
pandas | 0.24.1 | Series | N/A |
pyarrow | 0.12.1.dev120+g7f9... | Array | CudaBuffer |
xnd | 0.2.0dev3 | xnd | xnd |
numba | 0.41.0 | N/A | DeviceNDArray |
cupy | 5.2.0 | N/A | ndarray, cuda.MemoryPointer |
cudf | 0.6-branch | N/A | Series |
To use the `arrayviews` package for host memory, import the needed data storage support modules, for instance:
```python
from arrayviews import (
    numpy_ndarray_as,
    pandas_series_as,
    pyarrow_array_as,
    xnd_xnd_as
)
```
For CUDA-based device memory, one can use the following import statement:
```python
from arrayviews.cuda import (
    cupy_ndarray_as,
    numba_cuda_DeviceNDArray_as,
    pyarrow_cuda_buffer_as,
    xnd_xnd_cuda_as,
    cudf_Series_as,
)
```
...
The general pattern of creating a specific view of another storage object is:
```
data_view = <data storage object>_as.<view data storage object>(data)
```
For example,
```python
>>> import numpy as np
>>> import pyarrow as pa
>>> from arrayviews import numpy_ndarray_as
>>> np_arr = np.arange(5)
>>> pa_arr = numpy_ndarray_as.pyarrow_array(np_arr)
>>> print(pa_arr)
[
  0,
  1,
  2,
  3,
  4
]
>>> np_arr[2] = 999  # change the numpy array
>>> print(pa_arr)
[
  0,
  1,
  999,
  3,
  4
]
```
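The same pattern applies to CUDA device memory. The sketch below is only an illustration: it assumes a CUDA-enabled pyarrow build and that the `pyarrow_cuda_buffer_as` module exposes a `numba_cuda_DeviceNDArray` view function named after the pattern above (check the `arrayviews.cuda` source for the exact function names):

```python
import numpy as np
import pyarrow.cuda as cuda
from arrayviews.cuda import pyarrow_cuda_buffer_as

ctx = cuda.Context(0)                                      # CUDA context for device 0
cbuf = ctx.buffer_from_data(np.arange(5, dtype=np.uint8))  # copy host data to a CudaBuffer
# Assumed view function name, following the <object>_as.<view>() pattern:
d_arr = pyarrow_cuda_buffer_as.numba_cuda_DeviceNDArray(cbuf)
print(d_arr.copy_to_host())                                # -> [0 1 2 3 4]
```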
The following table summarizes the support for creating a specific array view (top row) of a given array storage object (left-hand column).
Objects \ Views | numpy.ndarray | pandas.Series | pyarrow.Array | xnd.xnd |
---|---|---|---|---|
numpy.ndarray | | OPTIMAL, FULL | GENBITMAP, FULL | OPTIMAL, PARTIAL |
pandas.Series | OPTIMAL, FULL | | GENBITMAP, FULL | OPTIMAL, PARTIAL |
pyarrow.Array | OPTIMAL, PARTIAL | OPTIMAL, PARTIAL | | OPTIMAL, PARTIAL |
xnd.xnd | OPTIMAL, PARTIAL | OPTIMAL, PARTIAL | OPTIMAL, PARTIAL | |
- In `numpy.ndarray` and `pandas.Series`, the `numpy.nan` value is interpreted as a null value.
- `OPTIMAL` means that view creation does not require processing of the array data.
- `GENBITMAP` means that view creation requires processing of the array data when null or nan values are present (see the sketch after this list). By default, such processing is disabled.
- `FULL` means that view creation supports inputs with null values.
- `PARTIAL` means that view creation does not support inputs with null values.
- For the implementation of the view constructions, hover over a table cell or click the links to the `arrayviews` package source code.
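To illustrate why `GENBITMAP` views cannot avoid touching the data, here is a sketch using plain numpy and pyarrow (not the arrayviews internals): mapping nans to nulls requires at least one full pass over the data to build the validity information.

```python
import numpy as np
import pyarrow as pa

data = np.array([0.0, np.nan, 2.0, np.nan, 4.0])
mask = np.isnan(data)            # a full scan over the data is unavoidable
arr = pa.array(data, mask=mask)  # nan positions become Arrow nulls
print(arr.null_count)            # -> 2
```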
The following table gives view creation times for host memory objects, relative to a dummy function call (see the notes below):

Objects \ Views | numpy.ndarray | pandas.Series | pyarrow.Array | xnd.xnd |
---|---|---|---|---|
numpy.ndarray | 0.99(0.98) | 304.57(304.49) | 54.38(54.58) | 14.97(14.93) |
pandas.Series | 29.86(29.68) | 1.01(1.0) | 110.25(110.86) | 48.47(48.37) |
pyarrow.Array | 17.61(N/A) | 350.51(N/A) | 1.0(1.0) | 25.71(N/A) |
xnd.xnd | 14.22(N/A) | 331.47(N/A) | 80.88(N/A) | 1.0(1.0) |
- The numbers in the table are the ratios `<elapsed time to create a view of an obj> / <elapsed time to call 'def dummy(obj): return obj'>` (see the sketch below).
- Results in parentheses correspond to objects containing nulls or nans. No attempt is made to convert nans to nulls.
- Test arrays are 64-bit float arrays of size 51200.
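Such ratios can be reproduced in spirit with a plain `timeit` comparison; the sketch below is not the project's benchmark harness, only an illustration of the measured quantity:

```python
import timeit
import numpy as np
from arrayviews import numpy_ndarray_as

def dummy(obj):
    return obj

np_arr = np.random.rand(51200)

t_view = timeit.timeit(lambda: numpy_ndarray_as.pyarrow_array(np_arr), number=10000)
t_dummy = timeit.timeit(lambda: dummy(np_arr), number=10000)
print(t_view / t_dummy)  # relative cost of creating a pyarrow.Array view
```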
The following table summarizes the support for creating a specific array view (top row) of the given CUDA device storage objects (left-hand column):

Objects \ Views | pyarrow CudaBuffer | numba DeviceNDArray | cupy.ndarray | cupy MemoryPointer | xnd.xnd CUDA | cudf Series |
---|---|---|---|---|---|---|
pyarrow CudaBuffer | | OPTIMAL, FULL | OPTIMAL, FULL | OPTIMAL, FULL | OPTIMAL, FULL | NOT IMPL |
numba DeviceNDArray | OPTIMAL, FULL | | OPTIMAL, FULL | OPTIMAL, FULL | DERIVED, FULL | OPTIMAL, FULL |
cupy.ndarray | OPTIMAL, FULL | OPTIMAL, FULL | | OPTIMAL, FULL | NOT IMPL | NOT IMPL |
cupy MemoryPointer | NOT IMPL | NOT IMPL | NOT IMPL | | NOT IMPL | NOT IMPL |
xnd.xnd CUDA | OPTIMAL, FULL | DERIVED, FULL | OPTIMAL, FULL | OPTIMAL, FULL | | NOT IMPL |
cudf Series | NOT IMPL | OPTIMAL, FULL | NOT IMPL | NOT IMPL | NOT IMPL | |
The following table gives view creation times for CUDA device memory objects, relative to a dummy function call:

Objects \ Views | pyarrow CudaBuffer | numba DeviceNDArray | cupy.ndarray | cupy MemoryPointer | xnd.xnd CUDA | cudf Series |
---|---|---|---|---|---|---|
pyarrow CudaBuffer | 0.99 | 381.57 | 25.89 | 14.32 | 22.34 | NOT IMPL |
numba DeviceNDArray | 53.68 | 0.99 | 37.55 | 22.68 | 92.66 | 154.96 |
cupy.ndarray | 34.96 | 356.44 | 1.0 | 1.17 | NOT IMPL | NOT IMPL |
cupy MemoryPointer | NOT IMPL | NOT IMPL | NOT IMPL | 1.0 | NOT IMPL | NOT IMPL |
xnd.xnd CUDA | 45.92 | 452.21 | 42.64 | 29.79 | 0.99 | NOT IMPL |
cudf Series | NOT IMPL | 3.39 | NOT IMPL | NOT IMPL | NOT IMPL | 0.99 |
- Test arrays are 8-bit unsigned integer arrays of size 51200.