
Allow for multithreaded channel extraction #100

Closed · EmilDohne opened this issue Sep 8, 2024 · 3 comments · Fixed by #101

@EmilDohne (Owner) commented:

For historical reasons, channel extraction was done on a single thread, since the rest of the reading/writing pipeline was already fully parallelized.

Some recent changes, however, changed that so we now iterate only over the channels of a specific layer (in most cases around 4). This opens up some additional performance to be gained by utilizing thread counts above 4 more effectively.

The initial motivation for this ticket was the following article by the blosc2 team.

This will not only speed up extraction during read/write, but also when users want to access a channel's image data.
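
To make the idea concrete, here is a minimal sketch of chunk-level extraction, assuming each channel can be split into independently decodable chunks; `CompressedChannel` and `extract_chunk` are hypothetical placeholders, not the library's actual API:

```cpp
// Minimal sketch only: split one channel into fixed-size chunks and extract
// them in parallel. CompressedChannel and extract_chunk are hypothetical
// stand-ins, not the library's real interface.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <execution>
#include <numeric>
#include <vector>

struct CompressedChannel
{
    std::vector<std::uint8_t> data;  // stand-in for the channel's compressed stream

    std::size_t uncompressed_size() const { return data.size(); }

    // Placeholder: decode the bytes for one chunk into dst. A real
    // implementation would call into the codec (e.g. blosc2) here.
    void extract_chunk(std::size_t offset, std::uint8_t* dst, std::size_t size) const
    {
        std::memcpy(dst, data.data() + offset, size);
    }
};

std::vector<std::uint8_t> extract_channel_parallel(const CompressedChannel& channel,
                                                   std::size_t chunk_size)
{
    std::vector<std::uint8_t> out(channel.uncompressed_size());

    // One index per chunk; the last chunk may be smaller than chunk_size.
    const std::size_t num_chunks = (out.size() + chunk_size - 1) / chunk_size;
    std::vector<std::size_t> indices(num_chunks);
    std::iota(indices.begin(), indices.end(), std::size_t{0});

    // Chunks are independent, so the parallel STL can spread them over all
    // available hardware threads regardless of how many channels a layer has.
    std::for_each(std::execution::par, indices.begin(), indices.end(),
        [&](std::size_t i)
        {
            const std::size_t offset = i * chunk_size;
            const std::size_t size = std::min(chunk_size, out.size() - offset);
            channel.extract_chunk(offset, out.data() + offset, size);
        });

    return out;
}
```

With a split like this, the usable parallelism scales with the amount of data in a channel rather than with the number of channels in a layer.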

@EmilDohne EmilDohne added enhancement New feature or request c++ labels Sep 8, 2024
@EmilDohne EmilDohne self-assigned this Sep 8, 2024
@EmilDohne (Owner, Author) commented:

The results of these changes are actually very promising, speeding up extraction by about:

~4x for our 8-bit large data use case (5s -> 1.3s)
~2x for our 16-bit large data use case (3.5s -> 1.95s)
~2x for our 32-bit large data use case (7s -> 3.8s)

As expected, for our test cases with smaller data that does not fill up a chunk fully, we are roughly on par speed-wise with the previous implementation.

These changes also affect the read/write speeds by about 5-10% (in both directions).

Here are the updated averages:

Automotive Data 8-bit:

  • Read: 1.04s -> 1.14s
  • Write: 2.0s -> 1.92s

Automotive Data 8-bit Zip:

  • Read: 1.02s -> 1.09s
  • Write: 2.28s -> 2.13s

Glacious Hyundai 8-bit:

  • Read: 0.54s -> 0.59s
  • Write: 0.97s -> 1.01s

Glacious Hyundai 8-bit Zip:

  • Read: 0.75s -> 0.61s
  • Write: 1.37s -> 1.38s

Deep Nested Layers 8-bit:

  • Read: 0.40s -> 0.39s
  • Write: 0.71s -> 0.67s

Automotive Data 16-bit:

  • Read: 3.79s -> 3.96s
  • Write: 6.23s -> 6.99s

Automotive Data 32-bit:

  • Read: 13.54s -> 13.55s
  • Write: 14.48s -> 13.50s

As we can see, the changes across the board are minimal, except for 32-bit write speeds. However, it appears that 16-bit read/write speeds got slower, so I will have to investigate why that might be.

@EmilDohne (Owner, Author) commented:

Made some more changes which slowed channel extraction down, but brought read/write speeds back in line and actually improved upon them!

~2x for our 8-bit large data use case (5s -> 2.3s)
~2x for our 16-bit large data use case (3.5s -> 1.95s)
~2x for our 32-bit large data use case (7s -> 3.65s)

I'm unsure why 8-bit data is slower in channel extraction compared to 16-bit, but it might just be that blosc2 can compress that data better and more efficiently.
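
For reference on the blosc2 side, its contexts expose their own internal thread count; the following is only a standalone sketch of that knob (buffer contents and sizes are made up, and this is not necessarily how the library wires up blosc2):

```cpp
// Standalone sketch of blosc2's per-context thread count (not the library's
// actual integration); the data and buffer sizes are made up for illustration.
#include <blosc2.h>
#include <cstdint>
#include <vector>

int main()
{
    blosc2_init();

    std::vector<std::uint16_t> pixels(1 << 20, 0);  // dummy 16-bit channel data
    const int32_t src_size =
        static_cast<int32_t>(pixels.size() * sizeof(std::uint16_t));
    std::vector<std::uint8_t> compressed(src_size + BLOSC2_MAX_OVERHEAD);

    // Compression context: blosc2 splits the buffer into blocks internally and
    // compresses them on its own thread pool.
    blosc2_cparams cparams = BLOSC2_CPARAMS_DEFAULTS;
    cparams.typesize = sizeof(std::uint16_t);
    cparams.clevel = 5;
    cparams.nthreads = 8;  // codec-side threads, separate from channel-level threading
    blosc2_context* cctx = blosc2_create_cctx(cparams);

    const int csize = blosc2_compress_ctx(cctx, pixels.data(), src_size,
                                          compressed.data(),
                                          static_cast<int32_t>(compressed.size()));

    // Decompression context with its own, independently tunable thread count.
    blosc2_dparams dparams = BLOSC2_DPARAMS_DEFAULTS;
    dparams.nthreads = 8;
    blosc2_context* dctx = blosc2_create_dctx(dparams);

    std::vector<std::uint16_t> roundtrip(pixels.size());
    blosc2_decompress_ctx(dctx, compressed.data(), csize,
                          roundtrip.data(), src_size);  // error handling omitted

    blosc2_free_ctx(cctx);
    blosc2_free_ctx(dctx);
    blosc2_destroy();
    return 0;
}
```

The `nthreads` field exists on both `blosc2_cparams` and `blosc2_dparams`, so the codec-side threading for compression and decompression can be tuned independently of any channel- or chunk-level parallelism.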

General read/write speeds

Ignore the right column of these benchmark graphs, as it's just a duplicate; they show the new write speeds, which improve by:

8-bit: +10% write speeds
16-bit: +10% write speeds
32-bit: +10% write speeds

(Benchmark graphs attached: 8-bit_graphs, 16-bit_graphs, 32-bit_graphs)

@EmilDohne (Owner, Author) commented:

Since our blocks no longer parallelize well on their own, I parallelized over the channels themselves, giving us:

~5x for our 8-bit large data use case (5s -> 1s)
~2.2x for our 16-bit large data use case (3.5s -> 1.6s)
~2.2x for our 32-bit large data use case (7s -> 3.2s)

This doesn't affect regular read/write speeds, as the ImageLayer extraction is only used for that.
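
A rough sketch of what parallelizing over the channels themselves can look like, assuming the per-channel work is independent; the types, `extract_single_channel`, and the use of `std::async` are illustrative placeholders and not necessarily what #101 implements:

```cpp
// Sketch of channel-level parallelism: one task per channel, assuming each
// channel's extraction is self-contained. Names are placeholders rather than
// PhotoshopAPI's real interface.
#include <cstdint>
#include <future>
#include <unordered_map>
#include <utility>
#include <vector>

using ChannelID = int;

// Placeholder for the per-channel work (decompression, de-interleaving,
// copying into the user-facing buffer, ...). The body is a dummy stub.
std::vector<std::uint8_t> extract_single_channel(ChannelID id)
{
    return std::vector<std::uint8_t>(1024, static_cast<std::uint8_t>(id));
}

std::unordered_map<ChannelID, std::vector<std::uint8_t>>
extract_layer_channels(const std::vector<ChannelID>& channel_ids)
{
    // Fan out one asynchronous task per channel (typically ~4 per layer).
    std::vector<std::pair<ChannelID, std::future<std::vector<std::uint8_t>>>> tasks;
    tasks.reserve(channel_ids.size());
    for (ChannelID id : channel_ids)
    {
        tasks.emplace_back(id, std::async(std::launch::async,
                                          [id] { return extract_single_channel(id); }));
    }

    // Collect the results as each task finishes.
    std::unordered_map<ChannelID, std::vector<std::uint8_t>> result;
    for (auto& [id, fut] : tasks)
    {
        result.emplace(id, fut.get());
    }
    return result;
}
```

`std::async` is just the simplest way to express the fan-out here; a shared thread pool would avoid oversubscribing cores when this runs alongside other parallel work.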
