Allow for multithreaded channel extraction #100
The results of these changes are actually very promising, speeding up extraction by roughly 4x: for our large 8-bit data use case, from 5s to 1.3s. As expected, for our test cases with smaller data which does not fill a chunk fully, we are about on par speed-wise with our previous implementation. These changes also affect read/write speeds by about 5-10% (in both directions). Here are the updated averages:
Automotive Data 8-bit:
Automotive Data 8-bit Zip:
Glacious Hyundai 8-bit:
Glacious Hyundai 8-bit Zip:
Deep Nested Layers 8-bit:
Automotive Data 16-bit:
Automotive Data 32-bit:
As we can see, across the board the changes are minimal except for 32-bit write speeds. However, it appears that 16-bit read/write speeds got slower, so I will have to investigate why that might be.
Did some more changes which made channel extraction slower but brought read/write speeds back in line and actually improved upon them: roughly 2x for our large 8-bit data use case, from 5s to 2.3s. I'm unsure why 8-bit data is slower in channel extraction compared to 16-bit, but it might just be that blosc2 can compress that data better and more efficiently. General read/write speeds (ignore the right column for these benchmarks as it's just a duplicate); these show the new write speeds:
8-bit: +10% write speeds
Since our blocks no longer parallelize well on their own, I parallelized over the channels themselves, giving us roughly 5x for our large 8-bit data use case, from 5s to 1s. This doesn't affect regular read/write speeds, as this code path is only used for ImageLayer extraction.
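As a rough illustration of what parallelizing over the channels themselves can look like, here is a minimal C++17 sketch. All names (`Channel`, `extractChannels`) are hypothetical stand-ins, not the actual library code; the toy copy step stands in for the real per-channel decompression (e.g. a blosc2 call).

```cpp
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a channel holding compressed data.
struct Channel {
    std::vector<uint8_t> compressed;
    std::vector<uint8_t> decompressed;
};

// Decompress every channel on its own thread. With ~4 channels per layer
// this caps out at 4-way parallelism, which is why block-level (or combined)
// parallelism still matters on machines with more cores.
void extractChannels(std::vector<Channel>& channels) {
    std::vector<std::thread> workers;
    workers.reserve(channels.size());
    for (auto& ch : channels) {
        workers.emplace_back([&ch] {
            // Toy "decompression": copy the bytes. A real implementation
            // would run the codec per channel here instead.
            ch.decompressed = ch.compressed;
        });
    }
    for (auto& w : workers) {
        w.join();
    }
}
```

Since the threads write to disjoint channels, no locking is needed; the only synchronization point is the final join.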
For historical reasons, channel extraction was done on a single thread, while the rest of the reading/writing pipeline was fully parallelized.
Some recent changes, however, changed that to iterate only over the channels of a specific layer (in most cases around 4). This leaves additional performance on the table, which could be reclaimed by making better use of thread counts above 4.
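One way to use thread counts above the channel count is to split the total thread budget between channel-level parallelism and the threads handed to the codec per channel. A minimal sketch, assuming a hypothetical `splitThreads` helper (not part of the actual library):

```cpp
#include <algorithm>
#include <cstddef>

// How a total thread budget could be divided between the channel level and
// the per-channel codec (e.g. blosc2 internal threads), so that a 16-core
// machine with 4 channels can still occupy all cores (4 channels x 4 threads).
struct ThreadSplit {
    std::size_t channelThreads; // channels decompressed concurrently
    std::size_t codecThreads;   // threads given to the codec per channel
};

ThreadSplit splitThreads(std::size_t totalThreads, std::size_t numChannels) {
    std::size_t channelThreads =
        std::min(totalThreads, std::max<std::size_t>(numChannels, 1));
    std::size_t codecThreads =
        std::max<std::size_t>(totalThreads / channelThreads, 1);
    return {channelThreads, codecThreads};
}
```

With 16 threads and 4 channels this yields 4 concurrent channels with 4 codec threads each, while with fewer threads than channels it degrades gracefully to single-threaded codec calls.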
The initial motivation for this ticket was the following article by the blosc2 team.
This will not only speed up extraction during read/write but also when users want to access the image data of a channel.