Common APIs across array libraries #6

Closed
kgryte opened this issue Jun 15, 2020 · 6 comments

@kgryte
Contributor

kgryte commented Jun 15, 2020

Overview

To help further the discussion of what array APIs should be included in the standard, I've compiled a (WIP) list of common APIs across various array libraries.

This list should give some indication of API importance from the library development perspective, based on API curation and need, and should summarize existing practice.

Goal

To standardize a common set of core APIs and minimal signatures (i.e., argument order and keyword arguments) that every array library should implement in order to be compliant with the array specification.

Method

I compiled the list by doing the following:

  1. Generating a list of APIs based on publicly documented array APIs (e.g., by scraping website documentation).
  2. Computing the intersection across the individual datasets.

The following libraries were analyzed:

  • numpy
  • cupy
  • dask.array
  • jax
  • mxnet
  • pytorch
  • tensorflow
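
The two method steps above can be sketched roughly as follows. This is a minimal illustration, not the tooling actually used: the `documented_apis` mapping is a hand-written toy stand-in for the scraped per-library lists, and `common_apis` is a hypothetical helper name.

```python
# Toy stand-in for step 1: in practice each set would come from scraping
# a library's public documentation. The names below are illustrative only.
documented_apis = {
    "lib_a": {"arange", "abs", "sum", "matmul", "pad"},
    "lib_b": {"arange", "abs", "sum", "matmul", "cat"},
    "lib_c": {"range", "abs", "reduce_sum", "matmul"},
}


def common_apis(api_sets):
    """Step 2: return the sorted names present in every library's set."""
    return sorted(set.intersection(*api_sets.values()))


if __name__ == "__main__":
    print(common_apis(documented_apis))
```

Note that a plain set intersection only matches identical names, so functions that exist everywhere under different names (here `sum` vs `reduce_sum`) drop out unless names are normalized first.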

APIs

The following APIs were found to be common across the above libraries (using NumPy's naming conventions):

angle
arange
arccos
arcsin
arctan
arctan2
argmax
argmin
array
ceil
concatenate
conj
cos
cosh
cumprod
cumsum
einsum
exp
expm1
eye
flip
floor
full
imag
linalg.cholesky
linalg.inv
linalg.norm
linalg.qr
linalg.solve
linalg.svd
linspace
log
log1p
logaddexp
matmul
maximum
mean
meshgrid
minimum
ones
ones_like
prod
real
reshape
roll
sign
sin
sinh
sqrt
square
squeeze
stack
std
sum
tan
tanh
tensordot
trace
transpose
trunc
var
where
zeros
zeros_like

We can split these APIs into various categories as follows...

Array Creation

arange
array
eye
full
linspace
meshgrid
ones
ones_like
zeros
zeros_like

Array Manipulation

concatenate
flip
reshape
roll
squeeze
stack

Special Functions

ceil
exp
expm1
floor
log
log1p
logaddexp
maximum
minimum
sign
square
sqrt
trunc

Trigonometry

arccos
arcsin
arctan
arctan2
cos
cosh
sin
sinh
tan
tanh

Complex Numbers

angle
conj
imag
real

Reductions

cumprod
cumsum
mean
prod
std
sum
var

Linear Algebra

einsum
linalg.cholesky
linalg.inv
linalg.norm
linalg.qr
linalg.solve
linalg.svd
matmul
tensordot
trace
transpose

Indexing

argmax
argmin
where

Next Steps

  1. Provide the intersection of keyword arguments for each of the above APIs.
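
One possible way to approach this step is to compare signatures programmatically. The sketch below uses Python's standard `inspect` module on two hypothetical stand-in functions (`lib_a_sum`, `lib_b_sum`); they are not the real signatures of any library, and `keyword_names` is an assumed helper name.

```python
import inspect


# Hypothetical stand-ins for the same reduction in two libraries.
def lib_a_sum(a, axis=None, dtype=None, keepdims=False):
    ...


def lib_b_sum(input, dim=None, keepdim=False, dtype=None):
    ...


def keyword_names(func):
    """Return the names of all parameters that have a default value."""
    params = inspect.signature(func).parameters.values()
    return {p.name for p in params if p.default is not inspect.Parameter.empty}


def common_keywords(*funcs):
    """Intersect the keyword-argument names of several callables."""
    return set.intersection(*(keyword_names(f) for f in funcs))


if __name__ == "__main__":
    print(sorted(common_keywords(lib_a_sum, lib_b_sum)))
```

In this toy case only `dtype` survives, because the axis arguments are spelled differently. Keyword names would likely need the same kind of aliasing as function names.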

Questions

  1. While the above uses NumPy naming conventions, some of the above libraries have chosen to deviate from them (e.g., absolute vs abs). Are there APIs which should be aliased differently?
  2. How should missing data be handled or encoded in element-wise functions (ufuncs) and reductions?
  3. Can we standardize a core subset of the above APIs in terms of method names and a limited set of keyword arguments?
  4. To allow for API extensibility, can we specify a common API for arbitrary element-wise and/or axis-wise operations (e.g., apply, reduce)?
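
To make question 1 concrete, here is a small sketch of how aliasing could change the result of an intersection. The `ALIASES` table and the two library name sets are assumptions made up for illustration. A real table would have to be curated by hand.

```python
# Hypothetical alias table: library-specific name -> NumPy-style name.
ALIASES = {
    "abs": "absolute",
    "cat": "concatenate",
}


def normalize(names, aliases=ALIASES):
    """Map each name to its canonical form, leaving unknown names as-is."""
    return {aliases.get(name, name) for name in names}


# Toy name sets for two libraries with different naming conventions.
lib_a = {"absolute", "concatenate", "sum"}
lib_b = {"abs", "cat", "sum"}

raw_common = lib_a & lib_b
normalized_common = normalize(lib_a) & normalize(lib_b)

if __name__ == "__main__":
    print(sorted(raw_common))
    print(sorted(normalized_common))
```

Without normalization only `sum` is shared. With it, all three functions are recognized as common.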

Feedback is welcome. :)

@amueller

Regarding question 1: so this listing excludes these functions, right? Should we manually alias them to get a more complete list?

Shouldn't we also look at the complement, i.e., all the functions that are not in every library, and list which libraries they are in?
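
A rough sketch of what such a complement listing could look like, assuming per-library name sets are already available. The `apis` data and the `complement_table` helper are hypothetical.

```python
from collections import defaultdict

# Toy per-library API name sets (illustrative only).
apis = {
    "lib_a": {"sum", "matmul", "pad"},
    "lib_b": {"sum", "matmul", "pad", "histogram"},
    "lib_c": {"sum", "matmul"},
}


def complement_table(api_sets):
    """Map each API that is NOT common to all libraries to the libraries that have it."""
    common = set.intersection(*api_sets.values())
    table = defaultdict(list)
    for lib in sorted(api_sets):
        for name in api_sets[lib] - common:
            table[name].append(lib)
    return dict(table)


if __name__ == "__main__":
    for name, libs in sorted(complement_table(apis).items()):
        print(name, libs)
```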

@kgryte
Contributor Author

kgryte commented Jun 15, 2020

@amueller Yes, we can compile the list of alternate aliases and the complement. Will add to the next steps! Thanks!

@kgryte
Contributor Author

kgryte commented Jun 18, 2020

@amueller Quick update:

  1. Here is the complement.
  2. Here is the intersection, which includes the various naming conventions used across libraries.

@kgryte
Contributor Author

kgryte commented Jun 18, 2020

Update: based on the record data from @saulshanabrook (discussed in #5), I've generated the API intersection, but ranked according to relative usage across various downstream libraries. You can find the data here.

@kgryte
Contributor Author

kgryte commented Jul 9, 2020

Update: I've added a Jupyter notebook containing some preliminary analysis of the common API surface across various array libraries. See here.

@rgommers
Member

This is all done, and https://github.com/data-apis/array-api-comparison holds the up-to-date data and tooling. So I'll close this.
