Releases: enthought/distarray
DistArray 0.6 release
Mailing List: distarray@googlegroups.com
Documentation: http://distarray.readthedocs.org
License: Three-clause BSD
Python versions: 2.7, 3.4, and 3.5
OS support: *nix and Mac OS X
What is DistArray?
DistArray aims to bring the ease-of-use of NumPy to data-parallel
high-performance computing. It provides distributed multi-dimensional NumPy
arrays, distributed ufuncs, and distributed IO capabilities. It can
efficiently interoperate with external distributed libraries like Trilinos.
DistArray works with NumPy and builds on top of it in a flexible and natural
way.
0.6 Release
Noteworthy improvements in this release include:
- a new website (http://docs.enthought.com/distarray/) with links to
  DistArray talks and presentations,
- redistribution for block-distributed (and non-distributed) DistArrays,
- experimental "quickstart" installation scripts,
- an easier API for ``Context.apply`` (see the sketch after this list),
- expanded and improved example notebooks, including a new parallel Gaussian
  elimination example by Prashant Mital,
- compatibility with NumPy 1.9,
- expanded Travis CI testing (OS X and Python 3.5),
- logos by Erick Michaud, and
- several bug fixes and code improvements.
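To make the apply workflow concrete, here is a minimal sketch of shipping a
function to every worker with ``Context.apply``. The ``distarray.globalapi``
module path and the exact ``apply`` signature are assumptions based on the
0.6 documentation, not a verified excerpt from this release:

.. code-block:: python

    # Minimal sketch, assuming the distarray.globalapi layout and that
    # Context.apply(func) broadcasts func to every engine and gathers the
    # per-engine return values. Requires running IPython.parallel engines.
    from distarray.globalapi import Context

    context = Context()

    def hostname():
        # Runs on each worker process; anything picklable can be returned.
        import socket
        return socket.gethostname()

    print(context.apply(hostname))  # one result per engine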
Existing features
DistArray:
- supports NumPy-like slicing, reductions, and ufuncs on distributed
  multidimensional arrays;
- has a client-engine process design -- data resides on the worker processes,
  commands are initiated from the master;
- allows full control over what is executed on the worker processes and
  integrates transparently with the master process;
- allows direct communication between workers, bypassing the master process
  for scalability;
- integrates with IPython.parallel for interactive creation and exploration of
  distributed data;
- supports distributed ufuncs (currently without broadcasting);
- builds on and leverages MPI via MPI4Py in a transparent and user-friendly
  way;
- has basic support for unstructured arrays;
- supports user-controllable array distributions across workers (block,
  cyclic, block-cyclic, and unstructured) on a per-axis basis (see the sketch
  after this list);
- has a straightforward API to control how an array is distributed;
- has basic plotting support for visualization of array distributions;
- separates the array's distribution from the array's data -- useful for
  slicing, reductions, redistribution, broadcasting, and other operations;
- implements distributed random arrays;
- supports ``.npy``-like flat-file IO and HDF5 parallel IO (via ``h5py``),
  and leverages MPI-based IO parallelism in an easy-to-use and transparent
  way; and
- supports the distributed array protocol [protocol]_, which allows
  independently developed parallel libraries to share distributed arrays
  without copying, analogous to the PEP-3118 new buffer protocol.
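To make the per-axis distribution control concrete, here is a small sketch of
creating block/cyclic-distributed arrays and using a distributed ufunc and a
slice. The ``Distribution`` constructor, the ufunc namespace, and
``toarray()`` follow the project documentation as best we recall and should
be read as assumptions:

.. code-block:: python

    # Sketch only: Distribution's signature, the ufunc namespace, and
    # toarray() are assumptions, not verified against this release.
    import distarray.globalapi as da
    from distarray.globalapi import Context, Distribution

    context = Context()

    # Block-distribute axis 0 ('b'); cyclic-distribute axis 1 ('c').
    dist = Distribution(context, (8, 8), dist=('b', 'c'))
    a = context.ones(dist)
    b = context.ones(dist)

    c = da.add(a, b)      # distributed ufunc; data stays on the workers
    row = c[0, :]         # NumPy-like slice of a distributed array
    print(row.toarray())  # gather one row back to the client as an ndarray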
Planned features
Planned features include:
- array re-distribution capabilities for more distribution types;
- lazy evaluation and deferred computation for latency hiding;
- examples of interoperation with Trilinos [Trilinos]_;
- distributed broadcasting support;
- integration with other packages [petsc]_ that subscribe to the distributed
  array protocol [protocol]_;
- distributed fancy indexing;
- out-of-core computations;
- support for distributed sorting and other non-trivial distributed
  algorithms; and
- end-user control over communication and temporary array creation, and other
  performance aspects of distributed computations.
History and funding
Brian Granger started DistArray as a NASA-funded SBIR project in 2008.
Enthought picked it up as part of a DOE Phase II SBIR [SBIR]_ to provide a
generally useful distributed array package. It builds on NumPy, MPI, MPI4Py,
IPython, IPython.parallel, and interfaces with the Trilinos suite of
distributed HPC solvers (via PyTrilinos [Trilinos]_).
This material is based upon work supported by the Department of Energy under
Award Number DE-SC0007699.
This report was prepared as an account of work sponsored by an agency of the
United States Government. Neither the United States Government nor any agency
thereof, nor any of their employees, makes any warranty, express or implied,
or assumes any legal liability or responsibility for the accuracy,
completeness, or usefulness of any information, apparatus, product, or process
disclosed, or represents that its use would not infringe privately owned
rights. Reference herein to any specific commercial product, process, or
service by trade name, trademark, manufacturer, or otherwise does not
necessarily constitute or imply its endorsement, recommendation, or favoring
by the United States Government or any agency thereof. The views and opinions
of authors expressed herein do not necessarily state or reflect those of the
United States Government or any agency thereof.
.. [protocol] http://distributed-array-protocol.readthedocs.org/en/rel-0.10.0/
.. [Trilinos] http://trilinos.org/
.. [petsc] http://www.mcs.anl.gov/petsc/
.. [SBIR] http://www.sbir.gov/sbirsearch/detail/410257
DistArray 0.5 release
Mailing List: distarray@googlegroups.com
Documentation: http://distarray.readthedocs.org
License: Three-clause BSD
Python versions: 2.7, 3.3, and 3.4
OS support: *nix and Mac OS X
What is DistArray?
DistArray aims to bring the ease-of-use of NumPy to data-parallel
high-performance computing. It provides distributed multi-dimensional NumPy
arrays, distributed ufuncs, and distributed IO capabilities. It can
efficiently interoperate with external distributed libraries like Trilinos.
DistArray works with NumPy and builds on top of it in a flexible and natural
way.
0.5 Release
Noteworthy improvements in this release include:
- closer alignment with NumPy's API,
- support for Python 3.4 (in addition to existing support for Python 2.7 and
  3.3),
- a performance-oriented MPI-only mode for deployment on clusters and
  supercomputers,
- a way to register user-defined functions to be callable locally on worker
  processes,
- more consistent naming of sub-packages,
- testing with MPICH2 (already tested against OpenMPI),
- improved and expanded examples,
- an installed version testable via ``distarray.test()`` (see the snippet
  after this list), and
- performance and scaling improvements.
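As a quick smoke test of an installed copy (a one-line snippet, assuming the
parallel engines DistArray needs are already running):

.. code-block:: python

    import distarray
    distarray.test()  # runs the bundled test suite against the install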
With this release, DistArray is ready for real-world testing and deployment. The
project is still evolving rapidly and we appreciate the continued input from
the larger scientific-Python community.
Existing features
DistArray:
- supports NumPy-like slicing, reductions, and ufuncs on distributed
  multidimensional arrays;
- has a client-engine process design -- data resides on the worker processes,
  commands are initiated from the master;
- allows full control over what is executed on the worker processes and
  integrates transparently with the master process;
- allows direct communication between workers, bypassing the master process
  for scalability;
- integrates with IPython.parallel for interactive creation and exploration of
  distributed data;
- supports distributed ufuncs (currently without broadcasting);
- builds on and leverages MPI via MPI4Py in a transparent and user-friendly
  way;
- has basic support for unstructured arrays;
- supports user-controllable array distributions across workers (block,
  cyclic, block-cyclic, and unstructured) on a per-axis basis;
- has a straightforward API to control how an array is distributed;
- has basic plotting support for visualization of array distributions;
- separates the array's distribution from the array's data -- useful for
  slicing, reductions, redistribution, broadcasting, and other operations;
- implements distributed random arrays;
- supports ``.npy``-like flat-file IO and HDF5 parallel IO (via ``h5py``),
  and leverages MPI-based IO parallelism in an easy-to-use and transparent
  way (see the IO sketch after this list); and
- supports the distributed array protocol [protocol]_, which allows
  independently developed parallel libraries to share distributed arrays
  without copying, analogous to the PEP-3118 new buffer protocol.
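The parallel HDF5 path rides on ``h5py``'s MPI driver. The following
standalone mpi4py/h5py sketch shows the underlying mechanism that DistArray
wraps -- it is not DistArray's own IO API, and it requires an MPI-enabled
build of HDF5 and h5py:

.. code-block:: python

    # Each rank writes its own slab of one shared HDF5 file, collectively.
    # Run under MPI, e.g.: mpiexec -n 4 python write_blocks.py
    from mpi4py import MPI
    import h5py
    import numpy as np

    comm = MPI.COMM_WORLD
    rows_per_rank = 4
    data = np.full((rows_per_rank, 8), comm.rank, dtype='f8')

    # All ranks open the same file with the MPI-IO ('mpio') driver.
    with h5py.File('blocks.h5', 'w', driver='mpio', comm=comm) as f:
        dset = f.create_dataset('a', (comm.size * rows_per_rank, 8),
                                dtype='f8')
        start = comm.rank * rows_per_rank
        dset[start:start + rows_per_rank, :] = data  # rank-local slab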
Planned features and roadmap
Near-term features and improvements include:
- array re-distribution capabilities;
- lazy evaluation and deferred computation for latency hiding;
- interoperation with Trilinos [Trilinos]_; and
- distributed broadcasting support.
The longer-term roadmap includes:
- Integration with other packages [petsc]_ that subscribe to the distributed
  array protocol [protocol]_;
- Distributed fancy indexing;
- Out-of-core computations;
- Support for distributed sorting and other non-trivial distributed
  algorithms; and
- End-user control over communication and temporary array creation, and other
  performance aspects of distributed computations.
History and funding
Brian Granger started DistArray as a NASA-funded SBIR project in 2008.
Enthought picked it up as part of a DOE Phase II SBIR [SBIR]_ to provide a
generally useful distributed array package. It builds on NumPy, MPI, MPI4Py,
IPython, IPython.parallel, and interfaces with the Trilinos suite of
distributed HPC solvers (via PyTrilinos [Trilinos]_).
This material is based upon work supported by the Department of Energy under
Award Number DE-SC0007699.
This report was prepared as an account of work sponsored by an agency of the
United States Government. Neither the United States Government nor any agency
thereof, nor any of their employees, makes any warranty, express or implied,
or assumes any legal liability or responsibility for the accuracy,
completeness, or usefulness of any information, apparatus, product, or process
disclosed, or represents that its use would not infringe privately owned
rights. Reference herein to any specific commercial product, process, or
service by trade name, trademark, manufacturer, or otherwise does not
necessarily constitute or imply its endorsement, recommendation, or favoring
by the United States Government or any agency thereof. The views and opinions
of authors expressed herein do not necessarily state or reflect those of the
United States Government or any agency thereof.
.. [protocol] http://distributed-array-protocol.readthedocs.org/en/rel-0.10.0/
.. [Trilinos] http://trilinos.org/
.. [petsc] http://www.mcs.anl.gov/petsc/
.. [SBIR] http://www.sbir.gov/sbirsearch/detail/410257
DistArray 0.4: development release
Documentation: http://distarray.readthedocs.org
License: Three-clause BSD
Python versions: 2.7 and 3.3
OS support: *nix and Mac OS X
What is DistArray?
DistArray aims to bring the strengths of NumPy to data-parallel
high-performance computing. It provides distributed multi-dimensional
NumPy-like arrays and distributed ufuncs, distributed IO capabilities, and can
integrate with external distributed libraries like Trilinos. DistArray works
with NumPy and builds on top of it in a flexible and natural way.
0.4 Release
This is the third development release.
Noteworthy improvements in 0.4 include:
- basic slicing support;
- significant performance enhancements;
- reduction methods now support boolean arrays;
- an IPython notebook that demos basic functionality; and
- many bug fixes, API improvements, and refactorings.
DistArray is nearly ready for real-world use. The project is evolving rapidly
and input from the larger scientific-Python community is very valuable and
helps drive development.
Existing features
DistArray:
- has a client-engine (or master-worker) process design -- data resides on the
  worker processes, and commands are initiated from the master;
- allows full control over what is executed on the worker processes and
  integrates transparently with the master process;
- allows direct communication between workers, bypassing the master process
  for scalability;
- integrates with IPython.parallel for interactive creation and exploration of
  distributed data;
- supports distributed ufuncs (currently without broadcasting);
- builds on and leverages MPI via MPI4Py in a transparent and user-friendly
  way;
- supports NumPy-like multidimensional arrays;
- has basic support for unstructured arrays;
- supports user-controllable array distributions across workers (block,
  cyclic, block-cyclic, and unstructured) on a per-axis basis;
- has a straightforward API to control how an array is distributed;
- has basic plotting support for visualization of array distributions;
- separates the array's distribution from the array's data -- useful for
  slicing, reductions, redistribution, broadcasting, and other operations;
- implements distributed random arrays;
- supports ``.npy``-like flat-file IO and HDF5 parallel IO (via ``h5py``),
  and leverages MPI-based IO parallelism in an easy-to-use and transparent
  way; and
- supports the distributed array protocol [protocol]_, which allows
  independently developed parallel libraries to share distributed arrays
  without copying, analogous to the PEP-3118 new buffer protocol (see the
  sketch after this list).
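For a sense of what the protocol asks of a library, here is a sketch of a
minimal exporter for a 1-D block-distributed array. The key names follow the
rel-0.10.0 protocol document as best we recall and should be treated as
assumptions rather than a verified implementation:

.. code-block:: python

    # Hypothetical exporter sketch; key names ('buffer', 'dim_data', etc.)
    # are assumptions based on the rel-0.10.0 protocol document.
    import numpy as np

    class LocalBlock(object):
        """One process's block of a 1-D block-distributed array."""

        def __init__(self, data, start, global_size, grid_rank, grid_size):
            self.data = np.ascontiguousarray(data)
            self.start = start
            self.global_size = global_size
            self.grid_rank = grid_rank
            self.grid_size = grid_size

        def __distarray__(self):
            # A consumer wraps 'buffer' without copying (PEP 3118) and uses
            # 'dim_data' to locate this block in the global array.
            return {
                '__version__': '0.10.0',
                'buffer': self.data,
                'dim_data': ({
                    'dist_type': 'b',          # block distribution
                    'size': self.global_size,  # global length of this axis
                    'start': self.start,
                    'stop': self.start + len(self.data),
                    'proc_grid_size': self.grid_size,
                    'proc_grid_rank': self.grid_rank,
                },),
            }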
Planned features and roadmap
Near-term features and improvements include:
- MPI-only communication for performance and deployment on clusters and
  supercomputers;
- array re-distribution capabilities;
- interoperation with Trilinos [Trilinos]_;
- expanded tutorials, examples, and other introductory material; and
- distributed broadcasting support.
The longer-term roadmap includes:
- Lazy evaluation and deferred computation for latency hiding;
- Integration with other packages [petsc]_ that subscribe to the distributed
  array protocol [protocol]_;
- Distributed fancy indexing;
- Out-of-core computations;
- Support for distributed sorting and other non-trivial distributed
  algorithms; and
- End-user control over communication and temporary array creation, and other
  performance aspects of distributed computations.
History and funding
Brian Granger started DistArray as a NASA-funded SBIR project in 2008.
Enthought picked it up as part of a DOE Phase II SBIR [SBIR]_ to provide a
generally useful distributed array package. It builds on NumPy, MPI, MPI4Py,
IPython, IPython.parallel, and interfaces with the Trilinos suite of
distributed HPC solvers (via PyTrilinos [Trilinos]_).
This material is based upon work supported by the Department of Energy under
Award Number DE-SC0007699.
This report was prepared as an account of work sponsored by an agency of the
United States Government. Neither the United States Government nor any agency
thereof, nor any of their employees, makes any warranty, express or implied,
or assumes any legal liability or responsibility for the accuracy,
completeness, or usefulness of any information, apparatus, product, or process
disclosed, or represents that its use would not infringe privately owned
rights. Reference herein to any specific commercial product, process, or
service by trade name, trademark, manufacturer, or otherwise does not
necessarily constitute or imply its endorsement, recommendation, or favoring
by the United States Government or any agency thereof. The views and opinions
of authors expressed herein do not necessarily state or reflect those of the
United States Government or any agency thereof.
.. [protocol] http://distributed-array-protocol.readthedocs.org/en/rel-0.10.0/
.. [Trilinos] http://trilinos.org/
.. [petsc] http://www.mcs.anl.gov/petsc/
.. [SBIR] http://www.sbir.gov/sbirsearch/detail/410257
DistArray 0.3: development release
Documentation: http://distarray.readthedocs.org
License: Three-clause BSD
Python versions: 2.7 and 3.3
OS support: *nix and Mac OS X
What is DistArray?
DistArray aims to bring the strengths of NumPy to data-parallel
high-performance computing. It provides distributed multi-dimensional
NumPy-like arrays and distributed ufuncs, distributed IO capabilities, and can
integrate with external distributed libraries, like Trilinos. DistArray works
with NumPy and builds on top of it in a flexible and natural way.
0.3 Release
This is the second development release.
Noteworthy improvements in 0.3 include:
- support for distributions over a subset of processes;
- distributed reductions with a simple NumPy-like API, e.g.
  ``da.sum(axis=3)`` (see the sketch after this list);
- an ``apply()`` function for easier computation with process-local data;
- performance improvements and reduced communication overhead;
- cleanup, renamings, and refactorings;
- test suite improvements for parallel testing; and
- the start of a more frequent release schedule.
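A minimal sketch of such a reduction follows. The import path uses the later
releases' ``distarray.globalapi`` naming (the 0.3-era module layout may
differ), so read it as illustrative:

.. code-block:: python

    # Assumes the later distarray.globalapi layout and that Context.ones
    # accepts a plain shape tuple; both are assumptions for this release.
    from distarray.globalapi import Context

    context = Context()
    a = context.ones((2, 3, 4, 5))  # a 4-D distributed array of ones

    s = a.sum(axis=3)               # reduce along the last axis in parallel
    print(s.shape)                  # -> (2, 3, 4)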
DistArray is not ready for real-world use. We want to get input from the
larger scientific-Python community to help drive its development. The API is
changing rapidly and we are adding many new features on a fast timescale.
DistArray is currently implemented in pure Python for maximal flexibility.
Performance improvements are ongoing.
Existing features
DistArray:
- has a client-engine (or master-worker) process design -- data resides on the
  worker processes, commands are initiated from the master;
- allows full control over what is executed on the worker processes and
  integrates transparently with the master process;
- allows direct communication between workers, bypassing the master process
  for scalability;
- integrates with IPython.parallel for interactive creation and exploration of
  distributed data;
- supports distributed ufuncs (currently without broadcasting);
- builds on and leverages MPI via MPI4Py in a transparent and user-friendly
  way;
- supports NumPy-like structured multidimensional arrays;
- has basic support for unstructured arrays;
- supports user-controllable array distributions across workers (block,
  cyclic, block-cyclic, and unstructured) on a per-axis basis;
- has a straightforward API to control how an array is distributed;
- has basic plotting support for visualization of array distributions;
- separates the array's distribution from the array's data -- useful for
  slicing, reductions, redistribution, broadcasting, and other operations;
- implements distributed random arrays;
- supports ``.npy``-like flat-file IO and HDF5 parallel IO (via ``h5py``),
  and leverages MPI-based IO parallelism in an easy-to-use and transparent
  way; and
- supports the distributed array protocol [protocol]_, which allows
  independently developed parallel libraries to share distributed arrays
  without copying, analogous to the PEP-3118 new buffer protocol.
Planned features and roadmap
- Distributed slicing
- Re-distribution methods
- Integration with Trilinos [Trilinos]_ and other packages [petsc]_ that
  subscribe to the distributed array protocol [protocol]_
- Distributed broadcasting
- Distributed fancy indexing
- MPI-only communication for non-interactive deployment on clusters and
  supercomputers
- Lazy evaluation and deferred computation for latency hiding
- Out-of-core computations
- Extensive examples, tutorials, documentation
- Support for distributed sorting and other non-trivial distributed algorithms
- End-user control over communication and temporary array creation, and other
performance aspects of distributed computations
History
Brian Granger started DistArray as a NASA-funded SBIR project in 2008.
Enthought picked it up as part of a DOE Phase II SBIR [SBIR]_ to provide a
generally useful distributed array package. It builds on IPython,
IPython.parallel, NumPy, MPI, and interfaces with the Trilinos suite of
distributed HPC solvers (via PyTrilinos [Trilinos]_).
.. [protocol] http://distributed-array-protocol.readthedocs.org/en/rel-0.10.0/
.. [Trilinos] http://trilinos.org/
.. [petsc] http://www.mcs.anl.gov/petsc/
.. [SBIR] http://www.sbir.gov/sbirsearch/detail/410257
DistArray 0.2: development release
Documentation: http://distarray.readthedocs.org
License: Three-clause BSD
Python versions: 2.7 and 3.3
OS support: *nix and Mac OS X
DistArray aims to bring the strengths of NumPy to data-parallel high-performance computing. It provides distributed multi-dimensional NumPy-like arrays and distributed ufuncs, distributed IO capabilities, and can integrate with external distributed libraries, like Trilinos. DistArray works with NumPy and builds on top of it in a flexible and natural way.
Brian Granger started DistArray as a NASA-funded SBIR project in 2008. Enthought picked it up as part of a DOE Phase II SBIR [0] to provide a generally useful distributed array package. It builds on IPython, IPython.parallel, NumPy, MPI, and interfaces with the Trilinos suite of distributed HPC solvers (via PyTrilinos) [1].
DistArray:
- has a client-engine (or master-worker) process design -- data resides on the worker processes, commands are initiated from master;
- allows full control over what is executed on the worker processes and integrates transparently with the master process;
- allows direct communication between workers bypassing the master process for scalability;
- integrates with IPython.parallel for interactive creation and exploration of distributed data;
- supports distributed ufuncs (currently without broadcasting);
- builds on and leverages MPI via MPI4Py in a transparent and user-friendly way;
- supports NumPy-like structured multidimensional arrays;
- has basic support for unstructured arrays;
- supports user-controllable array distributions across workers (block, cyclic, block-cyclic, and unstructured) on a per-axis basis;
- has a straightforward API to control how an array is distributed;
- has basic plotting support for visualization of array distributions;
- separates the array’s distribution from the array’s data -- useful for slicing, reductions, redistribution, broadcasting, all of which will be implemented in coming releases;
- implements distributed random arrays;
- supports .npy-like flat-file IO and HDF5 parallel IO (via h5py), and leverages MPI-based IO parallelism in an easy-to-use and transparent way; and
- supports the distributed array protocol [2], which allows independently developed parallel libraries to share distributed arrays without copying, analogous to the PEP-3118 new buffer protocol.
This is the first public development release. DistArray is not ready for real-world use, but we want to get input from the larger scientific-Python community to help drive its development. The API is changing rapidly and we are adding many new features on a fast timescale. For that reason, DistArray is currently implemented in pure Python for maximal flexibility. Performance improvements are coming.
The 0.2 release's goals are to provide the components necessary to support upcoming features that are non-trivial to implement in a distributed environment.
Planned features for upcoming releases:
- Distributed reductions
- Distributed slicing
- Distributed broadcasting
- Distributed fancy indexing
- Re-distribution methods
- Integration with Trilinos [1] and other packages [3] that subscribe to the distributed array protocol [2]
- Lazy evaluation and deferred computation for latency hiding
- Out-of-core computations
- Extensive examples, tutorials, documentation
- Support for distributed sorting and other non-trivial distributed algorithms
- MPI-only communication for non-interactive deployment on clusters and supercomputers
- End-user control over communication and temporary array creation, and other performance aspects of distributed computations
[0] http://www.sbir.gov/sbirsearch/detail/410257
[1] http://trilinos.org/
[2] http://distributed-array-protocol.readthedocs.org/en/rel-0.10.0/
[3] http://www.mcs.anl.gov/petsc/