diff --git a/CHANGES.SIMD.rst b/CHANGES.SIMD.rst new file mode 100644 index 00000000000..0542b3c7727 --- /dev/null +++ b/CHANGES.SIMD.rst @@ -0,0 +1,89 @@ +Changelog (Pillow-SIMD) +======================= + +3.4.1.post1 +----------- + +- Critical memory error for some combinations of source/destination + sizes is fixed. + +3.4.1.post0 +----------- + +- A lot of optimizations in resampling including 16-bit + intermediate color representation and heavy unrolling. + +3.3.2.post0 +----------- + +- Maintenance release + +3.3.0.post2 +----------- + +- Fixed error in RGBa -> RGBA convertion + +3.3.0.post1 +----------- + +Alpha compositing +~~~~~~~~~~~~~~~~~ + +- SSE4 and AVX2 fixed-point full loading implementation. + Up to 4.6x faster. + +3.3.0.post0 +----------- + +Resampling +~~~~~~~~~~ + +- SSE4 and AVX2 fixed-point full loading horizontal pass. +- SSE4 and AVX2 fixed-point full loading vertical pass. + +Convertion +~~~~~~~~~~ + +- RGBA -> RGBa SSE4 and AVX2 fixed-point full loading implementations. + Up to 2.6x faster. +- RGBa -> RGBA AVX2 implementation using gather instructions. + Up to 5x faster. + + +3.2.0.post3 +----------- + +Resampling +~~~~~~~~~~ + +- SSE4 and AVX2 float full loading horizontal pass. +- SSE4 float full loading vertical pass. + + +3.2.0.post2 +----------- + +Resampling +~~~~~~~~~~ + +- SSE4 and AVX2 float full loading horizontal pass. +- SSE4 float per-pixel loading vertical pass. + + +2.9.0.post1 +----------- + +Resampling +~~~~~~~~~~ + +- SSE4 and AVX2 float per-pixel loading horizontal pass. +- SSE4 float per-pixel loading vertical pass. +- SSE4: Up to 2x for downscaling. Up to 3.5x for upscaling. +- AVX2: Up to 2.7x for downscaling. Up to 3.5x for upscaling. + + +Box blur +~~~~~~~~ + +- Simple SSE4 fixed-point implementations with per-pixel loading. +- Up to 2.1x faster. diff --git a/README.md b/README.md index e7c0ebc5aa5..e37d51f5399 100644 --- a/README.md +++ b/README.md @@ -1,117 +1,177 @@ -

- Pillow logo -

- -# Pillow - -## Python Imaging Library (Fork) - -Pillow is the friendly PIL fork by [Alex Clark and -Contributors](https://github.com/python-pillow/Pillow/graphs/contributors). -PIL is the Python Imaging Library by Fredrik Lundh and Contributors. -As of 2019, Pillow development is -[supported by Tidelift](https://tidelift.com/subscription/pkg/pypi-pillow?utm_source=pypi-pillow&utm_medium=readme&utm_campaign=enterprise). - - - - - - - - - - - - - - - - - - -
docs - Documentation Status -
tests - GitHub Actions build status (Lint) - GitHub Actions build status (Test Linux and macOS) - GitHub Actions build status (Test Windows) - GitHub Actions build status (Test MinGW) - GitHub Actions build status (Test Cygwin) - GitHub Actions build status (Test Docker) - AppVeyor CI build status (Windows) - GitHub Actions wheels build status (Wheels) - Travis CI wheels build status (aarch64) - Code coverage - Tidelift Align -
package - Zenodo - Tidelift - Newest PyPI version - Number of PyPI downloads - OpenSSF Best Practices -
social - Join the chat at https://gitter.im/python-pillow/Pillow - Follow on https://twitter.com/PythonPillow -
- -## Overview - -The Python Imaging Library adds image processing capabilities to your Python interpreter. - -This library provides extensive file format support, an efficient internal representation, and fairly powerful image processing capabilities. - -The core image library is designed for fast access to data stored in a few basic pixel formats. It should provide a solid foundation for a general image processing tool. - -## More Information - -- [Documentation](https://pillow.readthedocs.io/) - - [Installation](https://pillow.readthedocs.io/en/latest/installation.html) - - [Handbook](https://pillow.readthedocs.io/en/latest/handbook/index.html) -- [Contribute](https://github.com/python-pillow/Pillow/blob/main/.github/CONTRIBUTING.md) - - [Issues](https://github.com/python-pillow/Pillow/issues) - - [Pull requests](https://github.com/python-pillow/Pillow/pulls) -- [Release notes](https://pillow.readthedocs.io/en/stable/releasenotes/index.html) -- [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst) - - [Pre-fork](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst#pre-fork) - -## Report a Vulnerability - -To report a security vulnerability, please follow the procedure described in the [Tidelift security policy](https://tidelift.com/docs/security). +# Pillow-SIMD + +Pillow-SIMD is "following" Pillow fork (which is PIL fork itself). +"Following" means than Pillow-SIMD versions are 100% compatible +drop-in replacement for Pillow with the same version number. +For example, `Pillow-SIMD 3.2.0.post3` is drop-in replacement for +`Pillow 3.2.0` and `Pillow-SIMD 3.3.3.post0` for `Pillow 3.3.3`. + +For more information about original Pillow, please +[read the documentation][original-docs], +[check the changelog][original-changelog] and +[find out how to contribute][original-contribute]. + + +## Why SIMD + +There are many ways to improve the performance of image processing. +You can use better algorithms for the same task, you can make better +implementation for current algorithms, or you can use more processing unit +resources. It is perfect when you can just use more efficient algorithm like +when gaussian blur based on convolutions [was replaced][gaussian-blur-changes] +by sequential box filters. But a number of such improvements are very limited. +It is also very tempting to use more processor unit resources +(via parallelization) when they are available. But it is handier just +to make things faster on the same resources. And that is where SIMD works better. + +SIMD stands for "single instruction, multiple data". This is a way to perform +same operations against the huge amount of homogeneous data. +Modern CPU have different SIMD instructions sets like +MMX, SSE-SSE4, AVX, AVX2, AVX512, NEON. + +Currently, Pillow-SIMD can be [compiled](#installation) with SSE4 (default) +and AVX2 support. + + +## Status + +[![Uploadcare][uploadcare.logo]][uploadcare.com] + +Pillow-SIMD can be used in production. Pillow-SIMD has been operating on +[Uploadcare][uploadcare.com] servers for more than 1 year. +Uploadcare is SAAS for image storing and processing in the cloud +and the main sponsor of Pillow-SIMD project. + +Currently, following operations are accelerated: + +- Resize (convolution-based resampling): SSE4, AVX2 +- Gaussian and box blur: SSE4 +- Alpha composition: SSE4, AVX2 +- RGBA → RGBa (alpha premultiplication): SSE4, AVX2 +- RGBa → RGBA (division by alpha): AVX2 + +See [CHANGES](CHANGES.SIMD.rst). + + +## Benchmarks + +The numbers in the table represent processed megapixels of source RGB 2560x1600 +image per second. For example, if resize of 2560x1600 image is done +in 0.5 seconds, the result will be 8.2 Mpx/s. + +- Skia 53 +- ImageMagick 6.9.3-8 Q8 x86_64 +- Pillow 3.4.1 +- Pillow-SIMD 3.4.1.post1 + +Operation | Filter | IM | Pillow| SIMD SSE4| SIMD AVX2| Skia 53 +------------------------|---------|------|-------|----------|----------|-------- +**Resize to 16x16** | Bilinear| 41.37| 317.28| 1282.85| 1601.85| 809.49 + | Bicubic | 20.58| 174.85| 712.95| 900.65| 453.10 + | Lanczos | 14.17| 117.58| 438.60| 544.89| 292.57 +**Resize to 320x180** | Bilinear| 29.46| 195.21| 863.40| 1057.81| 592.76 + | Bicubic | 15.75| 118.79| 503.75| 504.76| 327.68 + | Lanczos | 10.80| 79.59| 312.05| 384.92| 196.92 +**Resize to 1920x1200** | Bilinear| 17.80| 68.39| 215.15| 268.29| 192.30 + | Bicubic | 9.99| 49.23| 170.41| 210.62| 112.84 + | Lanczos | 6.95| 37.71| 130.00| 162.57| 104.76 +**Resize to 7712x4352** | Bilinear| 2.54| 8.38| 22.81| 29.17| 20.58 + | Bicubic | 1.60| 6.57| 18.23| 23.94| 16.52 + | Lanczos | 1.09| 5.20| 14.90| 20.40| 12.05 +**Blur** | 1px | 6.60| 16.94| 35.16| | + | 10px | 2.28| 16.94| 35.47| | + | 100px | 0.34| 16.93| 35.53| | + + +### Some conclusion + +Pillow is always faster than ImageMagick. And Pillow-SIMD is faster +than Pillow in 4—5 times. In general, Pillow-SIMD with AVX2 always +**16-40 times faster** than ImageMagick and overperforms Skia, +high-speed graphics library used in Chromium, up to 2 times. + +### Methodology + +All tests were performed on Ubuntu 14.04 64-bit running on +Intel Core i5 4258U with AVX2 CPU on the single thread. + +ImageMagick performance was measured with command-line tool `convert` with +`-verbose` and `-bench` arguments. I use command line because +I need to test the latest version and this is the easiest way to do that. + +All operations produce exactly the same results. +Resizing filters compliance: + +- PIL.Image.BILINEAR == Triangle +- PIL.Image.BICUBIC == Catrom +- PIL.Image.LANCZOS == Lanczos + +In ImageMagick, the radius of gaussian blur is called sigma and the second +parameter is called radius. In fact, there should not be additional parameters +for *gaussian blur*, because if the radius is too small, this is *not* +gaussian blur anymore. And if the radius is big this does not give any +advantages but makes operation slower. For the test, I set the radius +to sigma × 2.5. + +Following script was used for testing: +https://gist.github.com/homm/f9b8d8a84a57a7e51f9c2a5828e40e63 + + +## Why Pillow itself is so fast + +There are no cheats. High-quality resize and blur methods are used for all +benchmarks. Results are almost pixel-perfect. The difference is only effective +algorithms. Resampling in Pillow was rewritten in version 2.7 with +minimal usage of floating point numbers, precomputed coefficients and +cache-awareness transposition. This result was improved in 3.3 & 3.4 with +integer-only arithmetics and other optimizations. + + +## Why Pillow-SIMD is even faster + +Because of SIMD, of course. But this is not all. Heavy loops unrolling, +specific instructions, which not available for scalar. + + +## Why do not contribute SIMD to the original Pillow + +Well, that's not simple. First of all, Pillow supports a large number +of architectures, not only x86. But even for x86 platforms, Pillow is often +distributed via precompiled binaries. To integrate SIMD in precompiled binaries +we need to do runtime checks of CPU capabilities. +To compile the code with runtime checks we need to pass `-mavx2` option +to the compiler. But with that option compiller will inject AVX instructions +enev for SSE functions, because every SSE instruction has AVX equivalent. +So there is no easy way to compile such library, especially with setuptools. + + +## Installation + +In general, you need to do `pip install pillow-simd` as always and if you +are using SSE4-capable CPU everything should run smoothly. +Do not forget to remove original Pillow package first. + +If you want the AVX2-enabled version, you need to pass the additional flag to C +compiler. The easiest way to do that is define `CC` variable while compilation. + +```bash +$ pip uninstall pillow +$ CC="cc -mavx2" pip install -U --force-reinstall pillow-simd +``` + + +## Contributing to Pillow-SIMD + +Pillow-SIMD and Pillow are two separate projects. +Please submit bugs and improvements not related to SIMD to +[original Pillow][original-issues]. All bugs and fixes in Pillow +will appear in next Pillow-SIMD version automatically. + + + [original-docs]: http://pillow.readthedocs.io/ + [original-issues]: https://github.com/python-pillow/Pillow/issues/new + [original-changelog]: https://github.com/python-pillow/Pillow/blob/master/CHANGES.rst + [original-contribute]: https://github.com/python-pillow/Pillow/blob/master/.github/CONTRIBUTING.md + [gaussian-blur-changes]: http://pillow.readthedocs.io/en/3.2.x/releasenotes/2.7.0.html#gaussian-blur-and-unsharp-mask + [uploadcare.com]: https://uploadcare.com/?utm_source=github&utm_medium=description&utm_campaign=pillow-simd + [uploadcare.logo]: https://ucarecdn.com/dc4b8363-e89f-402f-8ea8-ce606664069c/-/preview/