Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Zeiss CZI format #396

Merged
merged 131 commits into from
May 13, 2024
Merged

add Zeiss CZI format #396

merged 131 commits into from
May 13, 2024

Conversation

iewchen
Copy link
Contributor

@iewchen iewchen commented Oct 11, 2022

This driver add support for CZI format generated by Zeiss microscope.

CZI format stores whole slide in many smaller tiles, or subblocks in CZI's term. The size of these tiles can exceed 2000x2000 pixels. Each tile has a associated directory entry, which describes its location, level 0 size, real tile size, the channel it was taken etc. A CZI file can be pyramid or non-pyramid. Openslide can read non-pyramid CZI, albeit much slower than the same slide in pyramid format.

A CZI file can embed other files, such as CZI file or JPG. CZI call them attachments. This driver reads three of them: SlidePreview attachment as macro image in openslide associated images, Label attachment as label, and Thumbnail attachment as thumbnail.

CZI stores image tile in JPEG XR or uncompressed. One can save CZI as uncompressed, which is simply stream of pixel bytes. The size is more than ten times larger than its JPEG XR encoded counterpart. The SlidePreview attachment somehow is stored in uncompressed format. CZI may use JPEG, LZW or ZSTD, however, none of files I saw uses any of them, therefor these decoders are not included. Images pixel can be:

  • BGR24(8bits per RGB color): used by bright field

  • BGR48(16 bits per RGB color): SlidePreview is BGR48 uncompressed

  • GRAY16: 16 bits gray image, used by fluorescence and TIE

  • GRAY8: Zeiss may have an option to generate 8 bits gray image but I haven't tested it.

This driver convert BGR48 and GRAY16 into 8 bits per color (or channel) by keeping the most significant 8 bits. Zeiss may use 12 or 14 bits in GRAY16, this driver reads the effective pixel bits from XML metadata and convert pixels accordingly.

At most three Gray channels can be combined into a pseudo ARGB image, the alpha channel is unused. This driver follows fluorescence microscopy convention when combine gray channels, i.e. the first gray channel to blue color, second gray channel to green, third to red.

After detect samples on a slide, Zeiss captures each sample as separated scene. Because each image tile has a start x and y, openslide can show these multi-scenes whole slide even without knowing which scene a tile belongs. Nevertheless, this driver records the scene id when read the subblock directory entry.

The JPEG XR decoder is from jxrlib. It is included in CentOS, Debian and Ubuntu. Because CentOS7, Debian 10 and 11, Ubuntu < 22 are all missing pkg-config file, an autogen.sh script is included to generate libjxr.pc for configure. jxrlib may be unavailable on some platforms. The configure script generates Makefile based on presence of jxrlib. It only builds JPEG XR decoder and zeiss driver when jxrlib is found.

@github-actions
Copy link

github-actions bot commented Oct 11, 2022

DCO signed off ✔️

All commits have been signed off. You have certified to the terms of the Developer Certificate of Origin, version 1.1. In particular, you certify that this contribution has not been developed using information obtained under a non-disclosure agreement or other license terms that forbid you from contributing it under the GNU Lesser General Public License, version 2.1.

Copy link
Member

@bgilbert bgilbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! This is a large one, and will take multiple rounds of detailed review before it's ready to merge. You may want to consider removing the JXR support from this PR to keep things simple, and then adding JXR in a followup PR once the initial code lands. It's okay to stick with combining them into one PR if you'd like, but note that larger PRs are harder to land.

Also, OpenSlide doesn't currently have any drivers that pack fluorescence data into ARGB. We may eventually define new API for fluorescence support (#42) but for now I think I'd prefer not to add special-case behavior for a single vendor driver. If you'd like, you can move that code to a followup PR as well, though I can't promise it'll merge.

I've done an initial pass to point out some structural and formatting things I noticed, but haven't yet tried to read the code in detail. As a next step, please go through your code and systematically address those comments. In particular, its error handling and memory management will need to be adapted to consistently use OpenSlide's conventions. Please post a comment in the PR when it's ready for another round of review.

Can you provide some sample slides (at least one each of uncompressed and JXR) that we can redistribute as part of our test suite? I'd strongly prefer not to merge code this complex without test cases.

Makefile.am Outdated Show resolved Hide resolved
Makefile.am Outdated Show resolved Hide resolved
configure.ac Outdated Show resolved Hide resolved
src/openslide-decode-jxr.h Outdated Show resolved Hide resolved
src/openslide-decode-jxr.h Outdated Show resolved Hide resolved
src/openslide-vendor-zeiss.c Outdated Show resolved Hide resolved
src/openslide-vendor-zeiss.c Outdated Show resolved Hide resolved
autogen.sh Outdated Show resolved Hide resolved
src/openslide-decode-jxr.c Outdated Show resolved Hide resolved
src/openslide-decode-jxr.c Outdated Show resolved Hide resolved
@bgilbert
Copy link
Member

For the record, there's additional context about format documentation in this thread.

@iewchen
Copy link
Contributor Author

iewchen commented Oct 12, 2022

Thank you for the review!

You may want to consider removing the JXR support from this PR to keep things simple, and then adding JXR in a followup PR once the initial code lands. It's okay to stick with combining them into one PR if you'd like, but note that larger PRs are harder to land.

Can we keep JXR? Because CZI is not practical to use without JXR. As you can see in the submitted slide examples, a 66MB slide with JXR grows beyond 1GB if save as uncompressed.

Also, OpenSlide doesn't currently have any drivers that pack fluorescence data into ARGB. We may eventually define new API for fluorescence support (#42) but for now I think I'd prefer not to add special-case behavior for a single vendor driver. If you'd like, you can move that code to a followup PR as well, though I can't promise it'll merge.

I updated the PR to remove processing grayscale image. A side effect is it no longer reads the Plate1-Blue-A-xx.czi in openslide-testdata.

I've done an initial pass to point out some structural and formatting things I noticed, but haven't yet tried to read the code in detail. As a next step, please go through your code and systematically address those comments. In particular, its error handling and memory management will need to be adapted to consistently use OpenSlide's conventions. Please post a comment in the PR when it's ready for another round of review.

Can you provide some sample slides (at least one each of uncompressed and JXR) that we can redistribute as part of our test suite? I'd strongly prefer not to merge code this complex without test cases.

I uploaded few slides, their name starts with 10x_two_scenes. The uploader's name may be different because it requires google account.

@bgilbert
Copy link
Member

Can we keep JXR? Because CZI is not practical to use without JXR. As you can see in the submitted slide examples, a 66MB slide with JXR grows beyond 1GB if save as uncompressed.

Yes, that's the plan. I'm just saying that you could split the PR into two, to make the reviews easier. I'm okay either way though.

I uploaded few slides, their name starts with 10x_two_scenes. The uploader's name may be different because it requires google account.

Yes, but in the last question on the first page of the form, you didn't give us permission to redistribute the samples. Without that permission, we can't add the samples to openslide-testdata and can't use them in test cases.

@iewchen
Copy link
Contributor Author

iewchen commented Oct 12, 2022

Yes, but in the last question on the first page of the form, you didn't give us permission to redistribute the samples. Without that permission, we can't add the samples to openslide-testdata and can't use them in test cases.

I re-submitted the same set of slides, this time with permission to redistribute.

@bgilbert
Copy link
Member

Great, thanks. I'll try to get those uploaded within the next few days, and then you'll be able to use them to create test cases.

@bgilbert
Copy link
Member

I've now uploaded the samples here as Zeiss-5-*. Please add test cases when you get a chance.

How are the changes going? Remember that the code will need to be converted to GError error handling before I can do another round of review.

@iewchen
Copy link
Contributor Author

iewchen commented Nov 17, 2022

How are the changes going? Remember that the code will need to be converted to GError error handling before I can do another round of review.

I have added 'GError' error handling.

The newly added test cases failed to pass check because the build machine does not have libjxr. Is there a way to conditionally bypass zeiss driver test?

@bgilbert
Copy link
Member

bgilbert commented Dec 3, 2022

I have added 'GError' error handling.

I'm still seeing a number of places where functions are returning false without setting an error, where g_warning() and similar functions are used for error reporting, and where functions are being called with NULL GError ** arguments.

The newly added test cases failed to pass check because the build machine does not have libjxr. Is there a way to conditionally bypass zeiss driver test?

#407 is the right way to handle this. Thanks for submitting it.

@bgilbert
Copy link
Member

bgilbert commented Dec 3, 2022

The newly added test cases failed to pass check because the build machine does not have libjxr. Is there a way to conditionally bypass zeiss driver test?

Oh, sorry, I didn't read this properly. Yes, for optional libraries, you'll need to set FEATURE_FLAGS similar to this and requires similar to this.

@NicoKiaru
Copy link

NicoKiaru commented Jan 24, 2024

Just FYI, I gathered a list of publicly available CZI files here if you want to test potential issues while reading czi format.

@AdamBajger
Copy link

Is this still in active development, or abandoned?

@iewchen
Copy link
Contributor Author

iewchen commented Feb 1, 2024

It's been a while since I last looked at it. The last pull request was to add JPEG XR to GitHub CI, and it somehow got stuck. Two labs are using this patch, including support for fluorescence, and it works as expected. So if anyone needs to work with Zeiss scanner, they can build it independently.

This driver add support for CZI format generated by Zeiss microscope.

CZI format stores whole slide in many smaller tiles, or subblocks in
CZI's term. The size of these tiles can exceed 2000x2000 pixels. Each
tile has a associated directory entry, which describes its location,
level 0 size, real tile size, the channel it was taken etc. A CZI file
can be pyramid or non-pyramid. Openslide can read non-pyramid CZI,
albeit much slower than the same slide in pyramid format.

A CZI file can embed other files, such as CZI file or JPG. CZI call them
attachments. This driver reads three of them: SlidePreview attachment
as macro image in openslide associated images, Label attachment as
label, and Thumbnail attachment as thumbnail.

CZI stores image tile in JPEG XR or uncompressed. One can save CZI as
uncompressed, which is simply stream of pixel bytes. The size is more
than ten times larger than its JPEG XR encoded counterpart. The
SlidePreview attachment somehow is stored in uncompressed format. CZI
may use JPEG, LZW or ZSTD, however, none of files I saw uses any of
them, therefor these decoders are not included. Images pixel can be:

- BGR24(8bits per RGB color): used by bright field

- BGR48(16 bits per RGB color): SlidePreview is BGR48 uncompressed

- GRAY16: 16 bits gray image, used by fluorescence and TIE

- GRAY8: Zeiss may have an option to generate 8 bits gray image but I
  haven't tested it.

This driver convert BGR48 and GRAY16 into 8 bits per color (or channel)
by keeping the most significant 8 bits. Zeiss may use 12 or 14 bits in
GRAY16, this driver reads the effective pixel bits from XML metadata and
convert pixels accordingly.

At most three Gray channels can be combined into a pseudo ARGB image,
the alpha channel is unused. This driver follows fluorescence microscopy
convention when combine gray channels, i.e. the first gray channel to
blue color, second gray channel to green, third to red.

After detect samples on a slide, Zeiss captures each sample as separated
scene. Because each image tile has a start x and y, openslide can show
these multi-scenes whole slide even without knowing which scene a tile
belongs. Nevertheless, this driver records the scene id when read the
subblock directory entry.

The JPEG XR decoder is from jxrlib. It is included in CentOS, Debian and
Ubuntu. Because CentOS7, Debian 10 and 11, Ubuntu < 22 are all missing
pkg-config file, an autogen.sh script is included to generate libjxr.pc
for configure. jxrlib may be unavailable on some platforms. The
configure script generates Makefile based on presence of jxrlib. It
only builds JPEG XR decoder and zeiss driver when jxrlib is found.

Signed-off-by: Wei Chen <chenw1@uthscsa.edu>
Signed-off-by: Wei Chen <chenw1@uthscsa.edu>
Signed-off-by: Wei Chen <chenw1@uthscsa.edu>
Signed-off-by: Wei Chen <chenw1@uthscsa.edu>
Signed-off-by: Wei Chen <chenw1@uthscsa.edu>
Signed-off-by: Wei Chen <chenw1@uthscsa.edu>
Signed-off-by: Wei Chen <chenw1@uthscsa.edu>
Signed-off-by: Wei Chen <chenw1@uthscsa.edu>
With range properties, user can skip empty area on a multi-scenes slide.

Signed-off-by: Wei Chen <chenw1@uthscsa.edu>
Deepzoom may miss some scenes at max zoom out if scenes on a slide have
different pyramid levels.

Signed-off-by: Wei Chen <chenw1@uthscsa.edu>
Signed-off-by: Wei Chen <chenw1@uthscsa.edu>
Input of quickhash-1 is the first 544 bytes of CZI file. It is the file
header, which has primary file GUID, file GUID, file part number,
attachment directory position, metadata position, subblock directory
position etc.al. The file header is quite unique.

Signed-off-by: Wei Chen <chenw1@uthscsa.edu>
- test openslide_open().

- test openslide_read_region() on both scenes and on highest and lowest
  resolution.

- test quickhash-1.

Signed-off-by: Wei Chen <chenw1@uthscsa.edu>
Signed-off-by: Wei Chen <chenw1@uthscsa.edu>
Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
They're only used on the bottom level, where they're equal to tw/th.

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Prep for next commit.

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Call it directly from create_czi(), rather than mutating the subblocks
after create_czi() returns them.

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Always g_strndup() fixed-length strings from on-disk structures to avoid
potential read overruns when including those strings in error messages.
In particular, do this for att.name even though we currently don't
reference it from a format string.

Exclude magic checks from this rule.  check_magic() already does safe
in-place string comparison and it's called from many places.

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Add XML attributes and text nodes from the AttachmentInfos,
DisplaySetting, Information, and Scaling elements of
ImageDocument.Metadata.  The Experiment and HardwareSetting elements are
too verbose, and pertain more to the environment and its configuration
than to the slide itself.  The CustomAttributes element doesn't seem all
that useful, and contains multi-line matrix values that aren't a good fit
for OpenSlide properties.

Omit "ImageDocument.Metadata" from property names, since it would make the
names longer without disambiguating anything.  All XML metadata elements
are under ImageDocument.Metadata.

Detect lists of identically-named elements using an heuristic and one
special case; rename these elements using a unique identifier selected
from their attributes by another heuristic.  If we can't find a unique
identifier, or if we find any identically-named elements after doing the
renames, don't add properties for the offending elements.  We can modify
the heuristics if problems arise, and this avoids properties from multiple
elements clobbering each other, or properties being added under
unsupportable names.

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
g_ascii_strto[u]ll is hard to use correctly: callers have to clear errno
beforehand, check it afterward, and also pass and check endptr.  Add
wrappers to handle all that.  While _openslide_parse_double() can return
NAN on parse failure, integer parsers don't have a nice sentinel value, so
return a boolean and write the parsed value through an out-argument.

Currently all callers that want signed also want base 10, but one caller
wants unsigned base 16, and alternate bases make more sense for unsigned
anyway.  Support alternate bases for unsigned but not for signed.

Hamamatsu VMS depended on the ability to ignore trailing garbage, so we
now need some extra code to explicitly reject the garbage.

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Read standard properties and metadata values from vendor properties.  This
is documented to be the preferred approach, since it ensures the
underlying raw data is available from properties.

Don't assume that the first objective in the Objectives list is the
relevant one; dereference Information.Image.ObjectiveSettings.ObjectiveRef
instead.

Drop the XML XPath helpers we added.  We don't use them anymore, and
they're probably only useful for parsing values directly out of XML
metadata, which we want to discourage.

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
bgilbert added a commit to bgilbert/openslide.github.io that referenced this pull request May 12, 2024
For openslide/openslide#396.

Co-authored-by: Wei Chen <chenw1@uthscsa.edu>
Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
@bgilbert
Copy link
Member

Prep commits cherry-picked into #597.

Make it clearer that they're byte arrays, not strings.

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Other dimensions might plausibly be missing without loss of functionality,
but subblocks need at least a width and a height.

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Don't do it as a side-effect of create_levels().

Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
@bgilbert
Copy link
Member

I think this is ready to go in. @iewchen, can you give it a final check?

bgilbert added a commit to bgilbert/openslide.github.io that referenced this pull request May 13, 2024
For openslide/openslide#396.

Co-authored-by: Wei Chen <chenw1@uthscsa.edu>
Signed-off-by: Wei Chen <chenw1@uthscsa.edu>
Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
@iewchen
Copy link
Contributor Author

iewchen commented May 13, 2024

I think this is ready to go in. @iewchen, can you give it a final check?

Great! I can confirm it works. And squash merge sounds good.

@bgilbert bgilbert merged commit 45dd214 into openslide:main May 13, 2024
16 checks passed
@bgilbert
Copy link
Member

Thanks for the driver! And thanks for your patience through the long review/revision process. Now that the basic structure is in place, additional functionality should hopefully be easier to land. I'd suggest Zstandard next, since JXR has external dependencies which will need some additional work. I'm still interested in SIMD as well, though that may also be a more involved process.

bgilbert added a commit to bgilbert/openslide.github.io that referenced this pull request May 14, 2024
For openslide/openslide#396.

Co-authored-by: Wei Chen <chenw1@uthscsa.edu>
Signed-off-by: Wei Chen <chenw1@uthscsa.edu>
Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
bgilbert added a commit to bgilbert/openslide.github.io that referenced this pull request May 14, 2024
For openslide/openslide#396.

Co-authored-by: Wei Chen <chenw1@uthscsa.edu>
Signed-off-by: Wei Chen <chenw1@uthscsa.edu>
Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

6 participants