perf: improved performance in various methods in Image and ImageList #879

Merged (15 commits) on Jul 12, 2024

Contributor

@Marsmaennchen221 Marsmaennchen221 commented Jun 29, 2024

Summary of Changes

  1. Improved memory usage and runtime in:
  • Image
    • convert_to_grayscale
    • adjust_brightness
    • add_noise
    • adjust_contrast
    • adjust_color_balance
    • find_edges
  • ImageList
    • from_images
    • add_image
    • add_images
    • remove_image_by_index
    • remove_duplicate_images
    • convert_to_grayscale
    • resize
    • crop
    • adjust_brightness
    • add_noise
    • adjust_contrast
    • adjust_color_balance
    • find_edges
  2. Changed the blur algorithm in Image and ImageList from Gaussian blur to box blur
  3. Fixed a bug in blur and sharpen that prevented them from working with Tensors of size greater than 2**31
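
For readers unfamiliar with the switch from Gaussian blur to box blur mentioned above: a box blur replaces each pixel with the unweighted mean of its neighborhood, whereas a Gaussian blur weights neighbors by distance. The following is a minimal pure-Python sketch for a single-channel image. The function name `box_blur` and the clamp-to-edge border handling are assumptions made for illustration; this is not the library's implementation, which works on tensors.

```python
def box_blur(image, radius):
    """Blur a single-channel image (list of rows) with a (2*radius+1)^2 box kernel.

    Hypothetical helper for illustration only. Borders are handled by clamping
    coordinates to the image edge, which is an assumption of this sketch.
    """
    height = len(image)
    width = len(image[0])
    size = 2 * radius + 1
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Clamp neighbor coordinates so edge pixels are repeated.
                    ny = min(max(y + dy, 0), height - 1)
                    nx = min(max(x + dx, 0), width - 1)
                    total += image[ny][nx]
            # Every neighbor has the same weight: 1 / (kernel area).
            row.append(total / (size * size))
        blurred.append(row)
    return blurred


image = [
    [0, 0, 0],
    [0, 9, 0],
    [0, 0, 0],
]
print(box_blur(image, radius=1)[1][1])
```

Because every weight is identical, a box kernel needs no per-element weight computation, which is one plausible reason it is cheaper than a Gaussian kernel of the same size.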

Details on the performance upgrades:

The details below describe the performance upgrades in ImageList. The upgrades and changes in Image mirror those made in ImageList.

Early stopping

convert_to_grayscale returns self if the ImageList has only one channel (in this case, the ImageList is already in grayscale)
remove_duplicate_images returns self if the unique image tensor has the same size as the original (in this case, there are no duplicates)
adjust_brightness returns an ImageList of completely black images if factor is 0
adjust_color_balance returns the result of ImageList.convert_to_grayscale if factor is 0
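
The early returns listed above can be sketched as follows. `TinyImageList` is a hypothetical stand-in that stores images as nested Python lists; it only illustrates the control flow and is not the library's `ImageList`. The unweighted channel mean used for grayscale conversion is likewise an assumption of the sketch.

```python
class TinyImageList:
    """Hypothetical stand-in for an image list, used only to show early returns.

    Images are stored as nested lists: images[i][c][p] is pixel p of channel c.
    """

    def __init__(self, images):
        self.images = images

    @property
    def channels(self):
        return len(self.images[0]) if self.images else 0

    def convert_to_grayscale(self):
        # Early return: a single-channel list is already grayscale.
        if self.channels == 1:
            return self
        gray = []
        for image in self.images:
            pixel_count = len(image[0])
            # Unweighted channel mean (assumption; real code may weight channels).
            gray.append(
                [[sum(ch[p] for ch in image) / len(image) for p in range(pixel_count)]]
            )
        return TinyImageList(gray)

    def adjust_brightness(self, factor):
        # Early return: a factor of 0 always yields black images,
        # so no per-pixel multiplication is needed.
        if factor == 0:
            return TinyImageList(
                [[[0] * len(ch) for ch in image] for image in self.images]
            )
        return TinyImageList(
            [[[min(255, p * factor) for p in ch] for ch in image] for image in self.images]
        )


gray = TinyImageList([[[1, 2, 3]]])
print(gray.convert_to_grayscale() is gray)
```

Returning `self` is safe in this sketch because no method mutates the stored images; every other path builds a new object.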

General changes

If a float Tensor is needed during a computation, it is now float16 instead of float32
Reordered the Tensor allocations to reduce cases where tensors do not fit completely in VRAM
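
To give a rough sense of what the move from float32 to float16 means for memory, the sketch below compares buffer sizes using only the standard library. The batch dimensions are hypothetical, and `buffer_bytes` and `survives_half_precision` are names introduced here for illustration. The second part checks that 8-bit pixel values round-trip through half precision unchanged; intermediate results of arithmetic may still be rounded.

```python
import struct

# Bytes per element for IEEE 754 half ("e") and single ("f") precision.
HALF_BYTES = struct.calcsize("e")
SINGLE_BYTES = struct.calcsize("f")


def buffer_bytes(image_count, channels, height, width, bytes_per_element):
    """Size in bytes of a dense image batch with the given element width."""
    return image_count * channels * height * width * bytes_per_element


# Hypothetical batch: 100 RGB images of 250x250 pixels.
half = buffer_bytes(100, 3, 250, 250, HALF_BYTES)
single = buffer_bytes(100, 3, 250, 250, SINGLE_BYTES)
print(half, single)


def survives_half_precision(value):
    """Check whether a value round-trips through float16 unchanged."""
    return struct.unpack("e", struct.pack("e", float(value)))[0] == value


# All 8-bit pixel values are exactly representable in float16.
print(all(survives_half_precision(v) for v in range(256)))
```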

Benchmark

Runtime was benchmarked only for the transformation methods. A runtime result is listed only if it changed by a factor of at least 0.25 in either direction, and it is rounded to one decimal place, because runtime depends on multiple factors and fluctuates heavily, while most changes amount to only a few milliseconds.
All differences are given as a factor relative to the original results: a factor below 1 is worse, and a factor above 1 is better. For readability, factors equal to 1 are omitted.
Because of the bug fix for blur and sharpen mentioned above, the performance of these methods decreased in most cases.
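
One way to read the reporting rules above is sketched below. The function names, the direction of the ratio (original divided by updated, so that values above 1 mean an improvement), and the interpretation of the 0.25 threshold as a distance from 1 are all assumptions of this sketch; the timings are made up and are not taken from the benchmark.

```python
def improvement_factor(original, updated):
    """Factor relative to the original measurement; above 1 means better.

    Assumes lower measurements (time, bytes) are better, so the factor is
    original / updated. This direction is an assumption of the sketch.
    """
    return original / updated


def reportable_runtime_factor(original_ms, updated_ms, threshold=0.25):
    """Return the rounded factor, or None when the change is below the threshold."""
    factor = improvement_factor(original_ms, updated_ms)
    if abs(factor - 1) < threshold:
        return None
    return round(factor, 1)


# Hypothetical timings in milliseconds.
print(reportable_runtime_factor(30.0, 10.0))  # clear speed-up is reported
print(reportable_runtime_factor(10.0, 10.5))  # small fluctuation is dropped
```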

Benchmark with RGB images of size 250×250
method | result size difference | runtime difference | max memory allocation during runtime difference
from_images 8 7,95
remove_image_by_index 1,5
adjust_brightness 1,6 1,5
add_noise 4 42,5 2,25
adjust_contrast 1,4 1,69
adjust_color_balance 4 6,3 3
blur 0,7 0,9
sharpen 0,6 0,91
find_edges 1,3

Benchmark with RGBA images (RGB images with transparent layer) of size 256×256
method | result size difference | runtime difference | max memory allocation during runtime difference
from_images 8 6,29
remove_image_by_index 1,5
adjust_brightness 1,7 1,36
add_noise 4 24,5 2,25
adjust_contrast 1,53
adjust_color_balance 4 1,7 2,6
blur 1,6 0,9
sharpen 0,89

Benchmark with RGB images of multiple different sizes
method | result size difference | runtime difference | max memory allocation during runtime difference
from_images 7,7 6,3
add_images 8 2,38
remove_image_by_index 1,28
resize 1,3
crop 1,3 1,09
adjust_brightness 1,3 1,09
add_noise 4 44,3 1,47
adjust_contrast 2 2,17
adjust_color_balance 4 6,7 1,65
blur 2,87
sharpen 0,7 0,9
find_edges 1,9 0,91

…List`

refactor: changed `Image.blur` to use box blur algorithm instead of gaussian blur as box blur is more performant
…age_image_list

# Conflicts:
#	src/safeds/data/labeled/containers/_image_dataset.py
#	tests/safeds/data/labeled/containers/test_image_dataset.py
Contributor

github-actions bot commented Jun 29, 2024

🦙 MegaLinter status: ✅ SUCCESS

| Descriptor | Linter | Files | Fixed | Errors | Elapsed time |
|---|---|---|---|---|---|
| ✅ PYTHON | black | 7 | 0 | 0 | 1.95s |
| ✅ PYTHON | mypy | 7 | | 0 | 3.16s |
| ✅ PYTHON | ruff | 7 | 0 | 0 | 0.4s |
| ✅ REPOSITORY | git_diff | yes | | no | 0.68s |

See detailed report in MegaLinter reports
Set VALIDATE_ALL_CODEBASE: true in mega-linter.yml to validate all sources, not only the diff

MegaLinter is graciously provided by OX Security


codecov bot commented Jun 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.71%. Comparing base (b99a760) to head (4d23fe0).
Report is 38 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #879      +/-   ##
==========================================
+ Coverage   97.67%   97.71%   +0.04%     
==========================================
  Files         120      120              
  Lines        6234     6478     +244     
==========================================
+ Hits         6089     6330     +241     
- Misses        145      148       +3     


@Marsmaennchen221 Marsmaennchen221 changed the title perf: improved performance over various methods in Image and ImageList perf: improved performance in various methods in Image and ImageList Jun 29, 2024
@lars-reimann
Member

Do you have benchmarks to quantify the performance improvement?

Marsmaennchen221 and others added 6 commits July 8, 2024 16:52
…eList.adjust_contrast`

perf: improved performance of `ImageList.crop` if used on a `MultiSizeImageList`
perf: improved performance of `ImageList.convert_to_grayscale` if `ImageList` contains only one channel
…dlib into perf_image_image_list

# Conflicts:
#	src/safeds/data/image/containers/_multi_size_image_list.py
@Marsmaennchen221
Contributor Author

> Do you have benchmarks to quantify the performance improvement?

@lars-reimann I included the benchmarks in the description of this PR.

Member

@lars-reimann lars-reimann left a comment

Nice performance improvements, great work.

@lars-reimann lars-reimann merged commit 134e7d8 into main Jul 12, 2024
10 checks passed
@lars-reimann lars-reimann deleted the perf_image_image_list branch July 12, 2024 08:51
lars-reimann pushed a commit that referenced this pull request Jul 19, 2024
## [0.27.0](v0.26.0...v0.27.0) (2024-07-19)

### Features

*  join ([#870](#870)) ([5764441](5764441)), closes [#745](#745)
* activation function for forward layer ([#891](#891)) ([5b5bb3f](5b5bb3f)), closes [#889](#889)
* add `ImageDataset.split` ([#846](#846)) ([3878751](3878751)), closes [#831](#831)
* add FunctionalTableTransformer ([#901](#901)) ([37905be](37905be)), closes [#858](#858)
* add InvalidFitDataError ([#824](#824)) ([487854c](487854c)), closes [#655](#655)
* add KNearestNeighborsImputer ([#864](#864)) ([fcdfecf](fcdfecf)), closes [#743](#743)
* add moving average plot ([#836](#836)) ([abcf68a](abcf68a))
* add RobustScaler ([#874](#874)) ([62320a3](62320a3)), closes [#650](#650) [#873](#873)
* add SequentialTableTransformer ([#893](#893)) ([e93299f](e93299f)), closes [#802](#802)
* add temporal operations ([#832](#832)) ([06eab77](06eab77))
* added 'histogram_2d' in TablePlotter  ([#903](#903)) ([4e65ba9](4e65ba9)), closes [#869](#869) [#798](#798)
* added from_str_to_temporal and continues prediction ([#767](#767)) ([35f468a](35f468a)), closes [#806](#806) [#765](#765) [#740](#740) [#773](#773)
* added GRU layer ([#845](#845)) ([d33cb5d](d33cb5d))
* Adds Dropout Layer ([#868](#868)) ([a76f0a1](a76f0a1)), closes [#848](#848)
* dark mode for plots ([#911](#911)) ([5447551](5447551)), closes [#798](#798)
* easily create a baseline model ([#811](#811)) ([8e1b995](8e1b995)), closes [#710](#710)
* get first cell with value other than `None` ([#904](#904)) ([5a0cdb3](5a0cdb3)), closes [#799](#799)
* hyperparameter optimization for fnn models ([#897](#897)) ([c1f66e5](c1f66e5)), closes [#861](#861)
* implement violin plots ([#900](#900)) ([9f5992a](9f5992a)), closes [#867](#867)
* plot decision tree ([#876](#876)) ([d3f81dc](d3f81dc)), closes [#856](#856)
* prediction no longer takes a time series dataset only table ([#838](#838)) ([762e5c2](762e5c2)), closes [#837](#837)
* raise if `remove_columns` is called with unknown column by default ([#852](#852)) ([8f78163](8f78163)), closes [#807](#807)
* regularization strength for logistic classifier ([#866](#866)) ([9f74e92](9f74e92)), closes [#750](#750)
* reorders parameters of RangeScaler and makes them keyword-only ([#847](#847)) ([2b82db7](2b82db7)), closes [#809](#809)
* replace seaborn with matplotlib for box_plot ([#863](#863)) ([4ef078e](4ef078e)), closes [#805](#805) [#849](#849)
* replaced seaborn with matplotlib for correlation_heatmap ([#850](#850)) ([d4680d4](d4680d4)), closes [#800](#800) [#849](#849)

### Bug Fixes

* **deps:** bump urllib3 from 2.2.1 to 2.2.2 ([#842](#842)) ([b81bcd6](b81bcd6)), closes [#3122](https://github.com/Safe-DS/Library/issues/3122) [#3363](https://github.com/Safe-DS/Library/issues/3363) [#3122](https://github.com/Safe-DS/Library/issues/3122) [#3363](https://github.com/Safe-DS/Library/issues/3363) [#3406](https://github.com/Safe-DS/Library/issues/3406) [#3398](https://github.com/Safe-DS/Library/issues/3398) [#3399](https://github.com/Safe-DS/Library/issues/3399) [#3396](https://github.com/Safe-DS/Library/issues/3396) [#3394](https://github.com/Safe-DS/Library/issues/3394) [#3391](https://github.com/Safe-DS/Library/issues/3391) [#3316](https://github.com/Safe-DS/Library/issues/3316) [#3387](https://github.com/Safe-DS/Library/issues/3387) [#3386](https://github.com/Safe-DS/Library/issues/3386)
* labels of correlation heatmap ([#894](#894)) ([a88a609](a88a609)), closes [#871](#871)
* make multi-processing in baseline models more consistent ([#909](#909)) ([fa24560](fa24560)), closes [#907](#907)

### Performance Improvements

* improved performance in various methods in `Image` and `ImageList` ([#879](#879)) ([134e7d8](134e7d8))
@lars-reimann
Member

🎉 This PR is included in version 0.27.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@lars-reimann lars-reimann added the released Included in a release label Jul 19, 2024
3 participants