add Zeiss CZI format #396

iewchen · 2022-10-11T20:19:19Z

This driver adds support for the CZI format generated by Zeiss microscopes.

The CZI format stores a whole slide as many smaller tiles, called subblocks in CZI terminology. Tiles can exceed 2000x2000 pixels. Each tile has an associated directory entry that describes its location, level 0 size, real tile size, the channel it was taken in, etc. A CZI file can be pyramidal or non-pyramidal. OpenSlide can read non-pyramidal CZI, albeit much more slowly than the same slide in pyramidal format.

A CZI file can embed other files, such as another CZI file or a JPEG. CZI calls them attachments. This driver reads three of them: the SlidePreview attachment as the macro image among OpenSlide's associated images, the Label attachment as the label, and the Thumbnail attachment as the thumbnail.

CZI stores image tiles either as JPEG XR or uncompressed. An uncompressed CZI is simply a stream of pixel bytes, and is more than ten times larger than its JPEG XR encoded counterpart. The SlidePreview attachment is, for some reason, stored uncompressed. CZI may also use JPEG, LZW or ZSTD; however, none of the files I have seen use any of them, so those decoders are not included. Pixel formats can be:

- BGR24 (8 bits per RGB color): used by bright field
- BGR48 (16 bits per RGB color): SlidePreview is BGR48 uncompressed
- GRAY16: 16-bit gray image, used by fluorescence and TIE
- GRAY8: Zeiss may have an option to generate 8-bit gray images, but I haven't tested it.

This driver converts BGR48 and GRAY16 to 8 bits per color (or channel) by keeping the most significant 8 bits. Zeiss may use only 12 or 14 bits in GRAY16; this driver reads the effective pixel bit depth from the XML metadata and converts pixels accordingly.
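
As an illustration only (the driver itself is written in C, and this is not its code), the conversion described above can be sketched in Python. The function name `to_8bit` and the `effective_bits` parameter are hypothetical; the sketch assumes the effective bit depth has already been read from the metadata.

```python
def to_8bit(sample: int, effective_bits: int = 16) -> int:
    """Keep the most significant 8 bits of an unsigned sample.

    `effective_bits` is the number of significant bits in `sample`
    (for example 12, 14 or 16). Both names are hypothetical.
    """
    if not 8 <= effective_bits <= 16:
        raise ValueError("effective_bits must be between 8 and 16")
    if not 0 <= sample < (1 << effective_bits):
        raise ValueError("sample does not fit in effective_bits")
    # Dropping the low (effective_bits - 8) bits leaves the top 8 bits.
    return sample >> (effective_bits - 8)
```

With a 12-bit sample the shift is 4, so full scale (0x0FFF) maps to 0xFF instead of 0x0F, which is what a fixed 8-bit shift would give.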

At most three gray channels can be combined into a pseudo-ARGB image; the alpha channel is unused. This driver follows the fluorescence microscopy convention when combining gray channels, i.e. the first gray channel maps to blue, the second to green, and the third to red.

After detecting samples on a slide, the Zeiss scanner captures each sample as a separate scene. Because each image tile has a start x and y, OpenSlide can display these multi-scene whole slides even without knowing which scene a tile belongs to. Nevertheless, this driver records the scene ID when reading the subblock directory entry.

The JPEG XR decoder is from jxrlib, which is packaged in CentOS, Debian and Ubuntu. Because CentOS 7, Debian 10 and 11, and Ubuntu < 22 all lack a pkg-config file, an autogen.sh script is included to generate libjxr.pc for configure. jxrlib may be unavailable on some platforms. The configure script generates the Makefile based on the presence of jxrlib, and only builds the JPEG XR decoder and Zeiss driver when jxrlib is found.
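
The channel mapping can be sketched as follows. This is an illustrative Python fragment, not the driver's C code; `pack_argb` is a hypothetical name, the inputs are assumed to be already reduced to 8 bits, and filling the unused alpha byte with 0xFF is an assumption of this sketch.

```python
def pack_argb(channels, alpha=0xFF):
    """Pack up to three 8-bit gray samples into one 32-bit ARGB value.

    Channel order follows the description above: first -> blue,
    second -> green, third -> red. Missing channels are treated as 0.
    The alpha value is an assumption; the description only says it is unused.
    """
    if not 1 <= len(channels) <= 3:
        raise ValueError("expected one to three gray channels")
    if any(not 0 <= c <= 0xFF for c in channels):
        raise ValueError("samples must be 8-bit values")
    blue, green, red = (list(channels) + [0, 0])[:3]
    return (alpha << 24) | (red << 16) | (green << 8) | blue
```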

github-actions · 2022-10-11T20:20:06Z

DCO signed off ✔️

All commits have been signed off. You have certified to the terms of the Developer Certificate of Origin, version 1.1. In particular, you certify that this contribution has not been developed using information obtained under a non-disclosure agreement or other license terms that forbid you from contributing it under the GNU Lesser General Public License, version 2.1.

bgilbert

Thanks for the PR! This is a large one, and will take multiple rounds of detailed review before it's ready to merge. You may want to consider removing the JXR support from this PR to keep things simple, and then adding JXR in a followup PR once the initial code lands. It's okay to stick with combining them into one PR if you'd like, but note that larger PRs are harder to land.

Also, OpenSlide doesn't currently have any drivers that pack fluorescence data into ARGB. We may eventually define new API for fluorescence support (#42) but for now I think I'd prefer not to add special-case behavior for a single vendor driver. If you'd like, you can move that code to a followup PR as well, though I can't promise it'll merge.

I've done an initial pass to point out some structural and formatting things I noticed, but haven't yet tried to read the code in detail. As a next step, please go through your code and systematically address those comments. In particular, its error handling and memory management will need to be adapted to consistently use OpenSlide's conventions. Please post a comment in the PR when it's ready for another round of review.

Can you provide some sample slides (at least one each of uncompressed and JXR) that we can redistribute as part of our test suite? I'd strongly prefer not to merge code this complex without test cases.

Makefile.am

configure.ac

src/openslide-decode-jxr.h

src/openslide-vendor-zeiss.c

autogen.sh

src/openslide-decode-jxr.c

bgilbert · 2022-10-12T03:19:37Z

For the record, there's additional context about format documentation in this thread.

iewchen · 2022-10-12T19:43:37Z

Thank you for the review!

You may want to consider removing the JXR support from this PR to keep things simple, and then adding JXR in a followup PR once the initial code lands. It's okay to stick with combining them into one PR if you'd like, but note that larger PRs are harder to land.

Can we keep JXR? CZI is not practical to use without it. As you can see in the submitted slide examples, a 66MB slide with JXR grows beyond 1GB if saved uncompressed.

Also, OpenSlide doesn't currently have any drivers that pack fluorescence data into ARGB. We may eventually define new API for fluorescence support (#42) but for now I think I'd prefer not to add special-case behavior for a single vendor driver. If you'd like, you can move that code to a followup PR as well, though I can't promise it'll merge.

I updated the PR to remove grayscale image processing. A side effect is that it no longer reads the Plate1-Blue-A-xx.czi files in openslide-testdata.

I've done an initial pass to point out some structural and formatting things I noticed, but haven't yet tried to read the code in detail. As a next step, please go through your code and systematically address those comments. In particular, its error handling and memory management will need to be adapted to consistently use OpenSlide's conventions. Please post a comment in the PR when it's ready for another round of review.

Can you provide some sample slides (at least one each of uncompressed and JXR) that we can redistribute as part of our test suite? I'd strongly prefer not to merge code this complex without test cases.

I uploaded a few slides; their names start with 10x_two_scenes. The uploader's name may be different because the form requires a Google account.

bgilbert · 2022-10-12T22:18:47Z

Can we keep JXR? CZI is not practical to use without it. As you can see in the submitted slide examples, a 66MB slide with JXR grows beyond 1GB if saved uncompressed.

Yes, that's the plan. I'm just saying that you could split the PR into two, to make the reviews easier. I'm okay either way though.

I uploaded a few slides; their names start with 10x_two_scenes. The uploader's name may be different because the form requires a Google account.

Yes, but in the last question on the first page of the form, you didn't give us permission to redistribute the samples. Without that permission, we can't add the samples to openslide-testdata and can't use them in test cases.

iewchen · 2022-10-12T22:58:28Z

Yes, but in the last question on the first page of the form, you didn't give us permission to redistribute the samples. Without that permission, we can't add the samples to openslide-testdata and can't use them in test cases.

I re-submitted the same set of slides, this time with permission to redistribute.

bgilbert · 2022-10-13T01:53:55Z

Great, thanks. I'll try to get those uploaded within the next few days, and then you'll be able to use them to create test cases.

bgilbert · 2022-11-12T22:03:09Z

I've now uploaded the samples here as Zeiss-5-*. Please add test cases when you get a chance.

How are the changes going? Remember that the code will need to be converted to GError error handling before I can do another round of review.

iewchen · 2022-11-17T00:12:15Z

How are the changes going? Remember that the code will need to be converted to GError error handling before I can do another round of review.

I have added 'GError' error handling.

The newly added test cases fail the check because the build machine does not have libjxr. Is there a way to conditionally skip the Zeiss driver test?

bgilbert · 2022-12-03T19:58:42Z

I have added 'GError' error handling.

I'm still seeing a number of places where functions are returning false without setting an error, where g_warning() and similar functions are used for error reporting, and where functions are being called with NULL GError ** arguments.

The newly added test cases fail the check because the build machine does not have libjxr. Is there a way to conditionally skip the Zeiss driver test?

#407 is the right way to handle this. Thanks for submitting it.

bgilbert · 2022-12-03T20:12:03Z

The newly added test cases fail the check because the build machine does not have libjxr. Is there a way to conditionally skip the Zeiss driver test?

Oh, sorry, I didn't read this properly. Yes, for optional libraries, you'll need to set FEATURE_FLAGS similar to this and requires similar to this.

NicoKiaru · 2024-01-24T13:53:10Z

Just FYI, I gathered a list of publicly available CZI files here, in case you want to test for potential issues when reading the CZI format.

AdamBajger · 2024-02-01T14:41:42Z

Is this still in active development, or abandoned?

iewchen · 2024-02-01T15:56:34Z

It's been a while since I last looked at it. The last pull request was to add JPEG XR to GitHub CI, and it somehow got stuck. Two labs are using this patch, including the fluorescence support, and it works as expected. So if anyone needs to work with a Zeiss scanner, they can build it independently.

The input to quickhash-1 is the first 544 bytes of the CZI file. This is the file header, which contains the primary file GUID, file GUID, file part number, attachment directory position, metadata position, subblock directory position, etc. The file header is therefore very likely to be unique to each file. Signed-off-by: Wei Chen <chenw1@uthscsa.edu>
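
A rough Python sketch of hashing that header follows. It is not the driver's code and is not claimed to reproduce the exact quickhash-1 value OpenSlide reports; the use of SHA-256 and the names `CZI_HEADER_SIZE` and `header_digest` are assumptions of this sketch.

```python
import hashlib

CZI_HEADER_SIZE = 544  # bytes, as stated in the commit message above


def header_digest(path):
    """Return a hex SHA-256 digest of the first 544 bytes of a file.

    SHA-256 is an assumption here; any extra data the library may mix
    into its own hash is deliberately ignored.
    """
    with open(path, "rb") as f:
        header = f.read(CZI_HEADER_SIZE)
    if len(header) < CZI_HEADER_SIZE:
        raise ValueError("file is shorter than the CZI file header")
    return hashlib.sha256(header).hexdigest()
```

Because only the header is read, the cost is independent of slide size, which is the point of a quick hash.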

Always g_strndup() fixed-length strings from on-disk structures to avoid potential read overruns when including those strings in error messages. In particular, do this for att.name even though we currently don't reference it from a format string.

Exclude magic checks from this rule. check_magic() already does safe in-place string comparison and it's called from many places.

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>

Add XML attributes and text nodes from the AttachmentInfos, DisplaySetting, Information, and Scaling elements of ImageDocument.Metadata. The Experiment and HardwareSetting elements are too verbose, and pertain more to the environment and its configuration than to the slide itself. The CustomAttributes element doesn't seem all that useful, and contains multi-line matrix values that aren't a good fit for OpenSlide properties.

Omit "ImageDocument.Metadata" from property names, since it would make the names longer without disambiguating anything. All XML metadata elements are under ImageDocument.Metadata.

Detect lists of identically-named elements using a heuristic and one special case; rename these elements using a unique identifier selected from their attributes by another heuristic. If we can't find a unique identifier, or if we find any identically-named elements after doing the renames, don't add properties for the offending elements. We can modify the heuristics if problems arise, and this avoids properties from multiple elements clobbering each other, or properties being added under unsupportable names.

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>

g_ascii_strto[u]ll is hard to use correctly: callers have to clear errno beforehand, check it afterward, and also pass and check endptr. Add wrappers to handle all that.

While _openslide_parse_double() can return NAN on parse failure, integer parsers don't have a nice sentinel value, so return a boolean and write the parsed value through an out-argument.

Currently all callers that want signed also want base 10, but one caller wants unsigned base 16, and alternate bases make more sense for unsigned anyway. Support alternate bases for unsigned but not for signed.

Hamamatsu VMS depended on the ability to ignore trailing garbage, so we now need some extra code to explicitly reject the garbage.

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
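
The contract described above (a success flag plus a separately returned value, with trailing garbage rejected) can be illustrated with a simplified Python analogue. This is not the C wrapper added by the commit: `parse_uint64` is a hypothetical name, and unlike g_ascii_strtoull this sketch also rejects leading whitespace, signs and a "0x" prefix.

```python
import string

UINT64_MAX = 2**64 - 1


def parse_uint64(text, base=10):
    """Strictly parse an unsigned integer; return (ok, value).

    On failure the value is 0 and must be ignored, mirroring a C function
    that returns a boolean and writes the result through an out-argument.
    """
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    digits = (string.digits + string.ascii_lowercase)[:base]
    # Reject empty input and any character that is not a digit in this base,
    # which also rejects trailing garbage such as "42abc".
    if not text or any(c not in digits for c in text.lower()):
        return False, 0
    value = int(text, base)
    if value > UINT64_MAX:  # analogous to an out-of-range result in C
        return False, 0
    return True, value
```

Returning a flag avoids reserving a sentinel integer for failure, which is the design reason the commit message gives.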

Read standard properties and metadata values from vendor properties. This is documented to be the preferred approach, since it ensures the underlying raw data is available from properties.

Don't assume that the first objective in the Objectives list is the relevant one; dereference Information.Image.ObjectiveSettings.ObjectiveRef instead.

Drop the XML XPath helpers we added. We don't use them anymore, and they're probably only useful for parsing values directly out of XML metadata, which we want to discourage.

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>

bgilbert · 2024-05-12T20:16:05Z

Prep commits cherry-picked into #597.

bgilbert · 2024-05-13T04:50:43Z

I think this is ready to go in. @iewchen, can you give it a final check?

For openslide/openslide#396. Co-authored-by: Wei Chen <chenw1@uthscsa.edu> Signed-off-by: Wei Chen <chenw1@uthscsa.edu> Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>

iewchen · 2024-05-13T21:28:28Z

I think this is ready to go in. @iewchen, can you give it a final check?

Great! I can confirm it works. And squash merge sounds good.

bgilbert · 2024-05-13T23:21:25Z

Thanks for the driver! And thanks for your patience through the long review/revision process. Now that the basic structure is in place, additional functionality should hopefully be easier to land. I'd suggest Zstandard next, since JXR has external dependencies which will need some additional work. I'm still interested in SIMD as well, though that may also be a more involved process.

iewchen force-pushed the zeiss-czi branch from d2e9ebd to 1b27650 October 11, 2022 20:26

bgilbert mentioned this pull request Oct 12, 2022

Support Zeiss CZI #144

Open

bgilbert reviewed Oct 12, 2022

iewchen added 14 commits February 20, 2024 10:26

remove support for grayscale image

bd5fc37

Signed-off-by: Wei Chen <chenw1@uthscsa.edu>

fix spelling and clarify JXR decoding buffer structure name

c54da7d

Signed-off-by: Wei Chen <chenw1@uthscsa.edu>

improve error check, stop using Call macro from jxrlib

c469359

Signed-off-by: Wei Chen <chenw1@uthscsa.edu>

change brace style

9f096cb

Signed-off-by: Wei Chen <chenw1@uthscsa.edu>

improve error handling

53864c8

Signed-off-by: Wei Chen <chenw1@uthscsa.edu>

remove forward declaration of destroy() and paint_region()

2cfdf33

Signed-off-by: Wei Chen <chenw1@uthscsa.edu>

correct spelling, cario to cairo

c946e29

Signed-off-by: Wei Chen <chenw1@uthscsa.edu>

Set range properties

24ce5b5

With range properties, user can skip empty area on a multi-scenes slide. Signed-off-by: Wei Chen <chenw1@uthscsa.edu>

Skip pyramid levels not available on all scenes

a41e30b

Deepzoom may miss some scenes at max zoom out if scenes on a slide have different pyramid levels. Signed-off-by: Wei Chen <chenw1@uthscsa.edu>
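
One way to read "levels available on all scenes" is as a set intersection. The Python sketch below is illustrative only and is not the driver's C implementation; the function name and the representation of levels as per-scene lists of downsample factors are hypothetical.

```python
def common_levels(scene_levels):
    """Return the downsample factors that every scene provides.

    `scene_levels` maps a scene ID to the downsample factors of its
    pyramid levels (a hypothetical representation for this sketch).
    """
    level_sets = [set(levels) for levels in scene_levels.values()]
    if not level_sets:
        return []
    # A level is kept only if no scene would be missing at that level.
    return sorted(set.intersection(*level_sets))
```

For example, if scene 0 has factors 1, 2, 4 and 8 while scene 1 only has 1, 2 and 4, the factor 8 level is dropped so that zooming out never hides scene 1.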

check ptr array is allocated before g_ptr_array_free()

b4605ce

Signed-off-by: Wei Chen <chenw1@uthscsa.edu>

add zeiss test cases

4dbc8c3

- test openslide_open(). - test openslide_read_region() on both scenes and on highest and lowest resolution. - test quickhash-1. Signed-off-by: Wei Chen <chenw1@uthscsa.edu>

change to meson build system

96de5c3

Signed-off-by: Wei Chen <chenw1@uthscsa.edu>

bgilbert added 9 commits May 12, 2024 13:15

zeiss: drop unused subblock x2/y2 values

8336201

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>

zeiss: drop redundant subblock w/h values

a432613

They're only used on the bottom level, where they're equal to tw/th. Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>

zeiss: rename subblock tw/th to w/h

711f988

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>

zeiss: move adjust_coordinate_origin() within source file

7876c86

Prep for next commit. Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>

zeiss: adjust_coordinate_origin() cleanups

c3c48dc

Call it directly from create_czi(), rather than mutating the subblocks after create_czi() returns them. Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>

bgilbert added a commit to bgilbert/openslide.github.io that referenced this pull request May 12, 2024

Document Zeiss CZI

3268b0e

For openslide/openslide#396. Co-authored-by: Wei Chen <chenw1@uthscsa.edu> Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>

This was referenced May 12, 2024

Document Zeiss CZI openslide/openslide.github.io#86

Merged

Infrastructure improvements for Zeiss CZI #597

Merged

bgilbert added 10 commits May 12, 2024 22:54

Merge branch 'main' into zeiss-czi

982c3fb

zeiss: make GUIDs uint8_t[]

2827e24

Make it clearer that they're byte arrays, not strings. Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>

zeiss: drop unused struct field

d7ef12c

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>

zeiss: require subblocks to have at least X and Y dimensions

5ea7724

Other dimensions might plausibly be missing without loss of functionality, but subblocks need at least a width and a height. Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>

zeiss: start cache during final commit phase

6a70332

Don't do it as a side-effect of create_levels(). Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>

zeiss: avoid unnecessary ftell

7e0db8d

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>

zeiss: error message cleanups

2cce4bb

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>

zeiss: minor cleanups

871d357

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>

README: link to Zeiss format documentation

1d1ac6c

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>

tests: update frozen archive

a1eb371

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>

bgilbert merged commit 45dd214 into openslide:main May 13, 2024
16 checks passed