Fast image allocation #2655

Merged
merged 4 commits into from
Aug 16, 2017

@homm homm commented Aug 6, 2017

Pillow's ImagingNew function always creates images filled with zeroes (black). This is fine for some operations, but for others it is unnecessary and hurts performance, especially for operations that do not require intensive computation.

Zeroing is required if:

  • the operation depends on the initial pixel values and expects zeroes
  • the operation can be applied to only part of the image
  • the operation can complete only partially, for example when loading a corrupted image

In other cases zeroing can be avoided.
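
The distinction above can be illustrated with a small pure-Python model. This is only a sketch: Pillow's allocator is written in C, and every name here (`new_zeroed`, `new_dirty`, `GARBAGE`, `invert`, `paste_region`) is hypothetical and not part of Pillow. The byte `0xAA` stands in for whatever an uninitialized buffer might contain.

```python
# Illustrative model only; none of these names exist in Pillow.
GARBAGE = 0xAA  # stand-in for the unspecified contents of uninitialized memory


def new_zeroed(width, height):
    """Model of a zero-filled allocation: every pixel starts at 0."""
    return [bytearray(width) for _ in range(height)]


def new_dirty(width, height):
    """Model of a "dirty" allocation: the initial contents are unspecified."""
    return [bytearray([GARBAGE] * width) for _ in range(height)]


def invert(src, dst):
    """Writes every destination pixel, so dst's initial contents never matter."""
    for y, row in enumerate(src):
        for x, value in enumerate(row):
            dst[y][x] = 255 - value


def paste_region(src, dst, left, top):
    """Writes only part of dst; untouched pixels keep their initial contents."""
    for y, row in enumerate(src):
        dst[top + y][left:left + len(row)] = row


source = [bytearray([10, 20, 30]), bytearray([40, 50, 60])]

# Full overwrite: zeroed and dirty destinations end up identical.
full_zeroed, full_dirty = new_zeroed(3, 2), new_dirty(3, 2)
invert(source, full_zeroed)
invert(source, full_dirty)
same_after_full_write = full_zeroed == full_dirty

# Partial write: the dirty destination keeps garbage outside the pasted area.
part_zeroed, part_dirty = new_zeroed(4, 3), new_dirty(4, 3)
paste_region(source, part_zeroed, 0, 0)
paste_region(source, part_dirty, 0, 0)
same_after_partial_write = part_zeroed == part_dirty

print(same_after_full_write, same_after_partial_write)
```

In this model, an operation that writes every output pixel gives the same result with either allocation, while a partial write exposes the initial contents, which is why such operations still need a zeroed image.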

This PR adds a new ImagingNewDirty function, which makes no guarantees about the contents of the created image. The PR also uses ImagingNewDirty in many operations, such as resize and crop.

homm commented Aug 6, 2017

Here are the benchmarks. I tested a preview of Pillow-SIMD 4.3 on a desktop i5-4430 running non-virtualized Ubuntu 16.04.

There are three columns: concurrency 1, 2 and 4.

Commands to run:

$ ../pillow-perf/testsuite/run.py scale crop allocate convert -n 21

$ (../pillow-perf/testsuite/run.py scale crop allocate convert -n 21
   & ../pillow-perf/testsuite/run.py scale crop allocate convert -n 21)

$ (../pillow-perf/testsuite/run.py scale crop allocate convert -n 21
   & ../pillow-perf/testsuite/run.py scale crop allocate convert -n 21
   & ../pillow-perf/testsuite/run.py scale crop allocate convert -n 21
   & ../pillow-perf/testsuite/run.py scale crop allocate convert -n 21)

Rows come in pairs: the first row of each pair is the old code (before), the second is the new code (after). In every test the new code is faster, up to 2 times. Also, for operations that are not limited by memory bandwidth (resizing), the slowdown under concurrency is most noticeable with the old code.

For example, relative to concurrency 1:
before: to 2048x1280 lzs at concurrency 4 slows down by a factor of 1.45
after: to 2048x1280 lzs at concurrency 4 slows down by a factor of 1.13
before: to 5478x3424 bil at concurrency 4 slows down by a factor of 1.56
after: to 5478x3424 bil at concurrency 4 slows down by a factor of 1.22
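
These factors can be reproduced from the benchmark table in this comment. The sketch below assumes the factor is the concurrency 1 throughput divided by the concurrency 4 throughput; the names `RESULTS` and `slowdown` are illustrative only.

```python
# Throughput in Mpx/s, copied from the benchmark table in this comment:
# (concurrency 1, concurrency 4) for the old ("before") and new ("after") code.
RESULTS = {
    "to 2048x1280 lzs": {"before": (212.49, 146.63), "after": (245.11, 216.28)},
    "to 5478x3424 bil": {"before": (70.36, 45.03), "after": (76.03, 62.37)},
}


def slowdown(conc1, conc4):
    """How many times slower one process runs when four run at once."""
    return conc1 / conc4


factors = {
    test: {code: slowdown(*pair) for code, pair in codes.items()}
    for test, codes in RESULTS.items()
}

for test, codes in factors.items():
    for code, factor in codes.items():
        print(f"{code:>6}: {test} slows down by a factor of {factor:.2f}")
```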

                                     conc. 1         conc. 2         conc. 4
Scale 2560×1600 RGB image
    to 2048x1280 bil  before    340.68 Mpx/s    270.17 Mpx/s    155.88 Mpx/s
                      after     420.19 Mpx/s    387.11 Mpx/s    254.74 Mpx/s
    to 2048x1280 bic  before    273.29 Mpx/s    238.47 Mpx/s    155.75 Mpx/s
                      after     325.72 Mpx/s    320.75 Mpx/s    245.83 Mpx/s
    to 2048x1280 lzs  before    212.49 Mpx/s    202.35 Mpx/s    146.63 Mpx/s
                      after     245.11 Mpx/s    241.67 Mpx/s    216.28 Mpx/s
    to 5478x3424 bil  before     70.36 Mpx/s     62.94 Mpx/s     45.03 Mpx/s
                      after      76.03 Mpx/s     73.74 Mpx/s     62.37 Mpx/s
    to 5478x3424 bic  before     60.20 Mpx/s     57.15 Mpx/s     43.46 Mpx/s
                      after      64.48 Mpx/s     63.07 Mpx/s     55.41 Mpx/s
    to 5478x3424 lzs  before     48.56 Mpx/s     46.68 Mpx/s     40.22 Mpx/s
                      after      51.02 Mpx/s     50.05 Mpx/s     45.36 Mpx/s
Crop 2560×1600 RGB image
    2304x1440         before   1335.08 Mpx/s    735.25 Mpx/s    427.34 Mpx/s
                      after    2405.13 Mpx/s   1399.92 Mpx/s    863.79 Mpx/s
    2816x1760         before    821.65 Mpx/s    439.39 Mpx/s    213.60 Mpx/s
                      after    1178.72 Mpx/s    676.13 Mpx/s    334.77 Mpx/s
Allocate 2560×1600 RGB image
    mode L            before  11317.44 Mpx/s   4095.32 Mpx/s   1550.39 Mpx/s
                      after   22025.47 Mpx/s   6941.36 Mpx/s   2882.53 Mpx/s
    mode LA           before   1713.02 Mpx/s    864.66 Mpx/s    413.73 Mpx/s
                      after    5541.89 Mpx/s   2630.51 Mpx/s   1268.92 Mpx/s
    mode RGB          before   1722.47 Mpx/s    893.53 Mpx/s    408.22 Mpx/s
                      after    5558.03 Mpx/s   3061.27 Mpx/s   1326.43 Mpx/s
    mode RGBA         before   1715.93 Mpx/s    892.97 Mpx/s    408.99 Mpx/s
                      after    5572.45 Mpx/s   3049.86 Mpx/s   1332.39 Mpx/s
Convert 2560×1600 RGB image
    RGB to L          before    658.94 Mpx/s    638.49 Mpx/s    420.15 Mpx/s
                      after     692.74 Mpx/s    680.85 Mpx/s    629.48 Mpx/s
    RGBA to LA        before    464.45 Mpx/s    412.19 Mpx/s    250.98 Mpx/s
                      after     568.89 Mpx/s    562.02 Mpx/s    404.35 Mpx/s
    RGBa to RGBA      before    869.12 Mpx/s    479.30 Mpx/s    253.37 Mpx/s
                      after    1451.37 Mpx/s    814.17 Mpx/s    422.22 Mpx/s
    RGBA to RGBa      before    916.94 Mpx/s    486.46 Mpx/s    252.86 Mpx/s
                      after    1483.58 Mpx/s    807.89 Mpx/s    413.37 Mpx/s

@homm homm removed the Do Not Merge label Aug 6, 2017
@homm homm added this to the 4.3.0 milestone Aug 9, 2017
@homm homm changed the base branch from storage-cleanup to master August 16, 2017 11:04
homm commented Aug 16, 2017

Changed base to master after merging #2654

@wiredfool wiredfool merged commit 9c4535b into master Aug 16, 2017
@hugovk hugovk deleted the fast-allocation branch August 16, 2017 14:21
@homm homm mentioned this pull request Aug 26, 2017