implement fletcher32 #412

martindurant · 2022-12-20T14:51:23Z

Fixes #410 cc @rabernat

TODO:

Unit tests and/or doctests in docstrings
Tests pass locally
Docstrings and API docs for any new/modified user-facing classes and functions
Changes documented in docs/release.rst
Docs build locally
GitHub Actions CI passes
Test coverage to 100% (Codecov passes)

martindurant · 2022-12-20T16:52:11Z

The errors showing up are for non-numpy inputs to JSON and msgpack - nothing to do with this PR.

I'll fill in the docs and such shortly.

numcodecs/fletcher32.pyx

numcodecs/tests/test_fletcher32.py

rabernat · 2022-12-20T17:06:40Z

This looks amazing Martin! 🚀 Thanks so much for doing it.

One question: have you verified that this implementation is interoperable with the hdf5 and netcdf4 implementation? Like, if netcdf4 writes a chunk with fletcher32, does this codec successfully decode it?

martindurant · 2022-12-20T17:59:15Z

One question: have you verified that this implementation is interoperable with the hdf5 and netcdf4 implementation? Like, if netcdf4 writes a chunk with fletcher32, does this codec successfully decode it?

No, not yet. The tests just come from examples on wikipedia. Do you think we should bundle a small hdf file, or perhaps just extract a bytes buffer into a test?
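For reference, the Wikipedia-style check can be sketched in a few lines of pure Python. This is only an illustration, not the code in this PR: the function name `fletcher32_wiki` is made up here, and the sketch assumes the variant that reads 16-bit little-endian words, pads odd-length input with a zero byte, and reduces modulo 65535. The three expected values are the test vectors listed in the Wikipedia article on Fletcher's checksum.

```python
def fletcher32_wiki(data: bytes) -> int:
    """Fletcher-32 over 16-bit little-endian words, modulo 65535 (hypothetical helper)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    sum1 = sum2 = 0
    for i in range(0, len(data), 2):
        word = data[i] | (data[i + 1] << 8)  # little-endian 16-bit word
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1


# Test vectors from the Wikipedia article
assert fletcher32_wiki(b"abcde") == 0xF04FC729
assert fletcher32_wiki(b"abcdef") == 0x56502D2A
assert fletcher32_wiki(b"abcdefgh") == 0xEBE19591
```

Vectors like these only show that the routine is self-consistent. They say nothing about whether HDF5 produces the same bytes, which is the open question here.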

rabernat · 2022-12-20T18:11:52Z

Do you think we should bundle a small hdf file, or perhaps just extract a bytes buffer into a test?

I would try to get a short snippet of actual bytes. I'd probably use the approach from fsspec/kerchunk#274 of generating a tiny netcdf file using xarray with fletcher32 on, extracting a chunk manually using kerchunk, printing the bytes to the terminal, and then copy-pasting that to this PR.

Here is an example of such a string of bytes.

b'x\xda\xb3\xf1\x8b\xd3gdb\x00\x02\xf1\xa3\x00'

This was generated from the data array([[60, 78, 94, 47]], dtype=int16) with the following encoding.

encoding = {
    'zlib': True,
    'compression': 'zlib',
    'shuffle': True,
    'complevel': 8,
    'fletcher32': True,
    'contiguous': False,
    'chunksizes': (1, 4)
}

Question: in what order is the fletcher checksum applied? Before or after zlib?
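One way to probe the ordering is to unpack the pasted bytes by hand. The sketch below is an illustration built on several assumptions, not anything taken from netCDF4 or from this PR: it assumes the pasted string is missing the 4-byte zlib trailer (so it inflates the raw deflate stream after the 2-byte header), that shuffle was applied with a 2-byte element size over the whole buffer, and that the checksum reads 16-bit words big-endian. The helpers `unshuffle` and `fletcher32_be` are hypothetical names.

```python
import struct
import zlib

chunk = b'x\xda\xb3\xf1\x8b\xd3gdb\x00\x02\xf1\xa3\x00'


def unshuffle(buf: bytes, itemsize: int) -> bytes:
    """Undo a byte shuffle: byte j of every element was stored contiguously."""
    n = len(buf) // itemsize
    out = bytearray(len(buf))
    for j in range(itemsize):
        out[j::itemsize] = buf[j * n:(j + 1) * n]
    return bytes(out)


def fletcher32_be(data: bytes) -> int:
    """Fletcher-32 over 16-bit big-endian words, modulo 65535 (assumed variant).

    Even-length input only; a trailing odd byte is ignored in this sketch.
    """
    sum1 = sum2 = 0
    for i in range(0, len(data) - 1, 2):
        word = (data[i] << 8) | data[i + 1]
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1


# Skip the 2-byte zlib header and inflate the raw deflate stream,
# since the adler32 trailer seems to be absent from the pasted bytes.
inflated = zlib.decompressobj(wbits=-zlib.MAX_WBITS).decompress(chunk[2:])
unshuffled = unshuffle(inflated, itemsize=2)

# Assumption: the last 4 bytes are a little-endian checksum over the rest.
payload, trailer = unshuffled[:-4], unshuffled[-4:]
values = struct.unpack("<4h", payload)
(stored,) = struct.unpack("<I", trailer)

print(values)                            # (60, 78, 94, 47)
print(stored == fletcher32_be(payload))  # True
```

Under those assumptions the inflated buffer is 12 bytes rather than 8: the four int16 values followed by a 4-byte checksum that matches. That suggests the checksum was appended first, and that shuffle and zlib were then applied over data plus checksum.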

martindurant · 2022-12-20T20:19:48Z

Well it was a good idea to check! Their implementation is not the same, so I thought it best to just embed it directly. This ought to be faster too, if that's important.

rabernat

Actually RuntimeError would be more consistent with what other codecs do when decompression fails.
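The suggested behaviour can be pictured with a toy codec. This is a minimal sketch, not the class added in this PR: `ChecksumSketch` is a hypothetical name, and `zlib.crc32` is used purely as a stand-in checksum so the block stays short. The only point is that `decode` raises `RuntimeError` when the stored and recomputed checksums disagree.

```python
import struct
import zlib


class ChecksumSketch:
    """Hypothetical codec: appends a 4-byte checksum on encode, verifies it on decode."""

    def encode(self, buf: bytes) -> bytes:
        # zlib.crc32 is only a stand-in for the real Fletcher-32 routine
        return bytes(buf) + struct.pack("<I", zlib.crc32(buf))

    def decode(self, buf: bytes) -> bytes:
        payload = buf[:-4]
        (stored,) = struct.unpack("<I", buf[-4:])
        found = zlib.crc32(payload)
        if found != stored:
            raise RuntimeError(
                f"checksum mismatch: stored {stored:#010x}, computed {found:#010x}"
            )
        return payload


codec = ChecksumSketch()
encoded = codec.encode(b"\x3c\x00\x4e\x00")
assert codec.decode(encoded) == b"\x3c\x00\x4e\x00"

# Flip the first payload byte to simulate a corrupted chunk
corrupted = b"\xff" + encoded[1:]
try:
    codec.decode(corrupted)
except RuntimeError as exc:
    print("decode failed:", exc)
```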

numcodecs/fletcher32.pyx

numcodecs/tests/test_fletcher32.py

numcodecs/_fletcher.c

numcodecs/fletcher32.pyx

martindurant · 2022-12-21T01:45:41Z

OK, @jakirkham, it didn't turn out to be too bad. The algorithm is obviously the same as the original.

I didn't understand why the class should be moved to a different pure-python module, though. lz4, blosc, zstd and vlen all have Codecs in their respective pyx files. (I must say, they are surprisingly complex!)

numcodecs/fletcher32.pyx

jakirkham · 2022-12-21T11:30:11Z

OK, @jakirkham, it didn't turn out to be too bad. The algorithm is obviously the same as the original.

Thanks Martin! 🙏

I didn't understand why the class should be moved to a different pure-python module, though. lz4, blosc, zstd and vlen all have Codecs in their respective pyx files. (I must say, they are surprisingly complex!)

Yeah we have some technical debt to work through for sure.

Since these are now in the same file, agree this matches the existing pattern.

Though at some point we might want to split these apart to simplify things. We need not do that here.

rabernat · 2022-12-21T14:06:36Z

Do we have an existing issue to track the test failures we are seeing in this PR? As @martindurant says, they are not related to the new codec. But they will need to be fixed asap.

rabernat

This is a very useful contribution to numcodecs. Thanks a lot Martin! Provided we understand why the tests are failing and have a plan to fix that elsewhere, I'm happy to see this merged.

rabernat · 2023-01-15T15:57:18Z

This has lingered for a while, but I think it looks good. We should get this in.

I have no idea what's going on with the tests. Maybe they are fixed by #417? Martin, do you want to try to rebase?

codecov · 2023-01-15T17:27:20Z

Codecov Report

Merging #412 (12eb7a3) into main (4f2a2e3) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main      #412   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           54        55    +1     
  Lines         2095      2121   +26     
=========================================
+ Hits          2095      2121   +26

Impacted Files	Coverage Δ
numcodecs/__init__.py	`100.00% <100.00%> (ø)`
numcodecs/tests/test_fletcher32.py	`100.00% <100.00%> (ø)`

joshmoore · 2023-05-15T11:37:17Z

@martindurant @rabernat @jakirkham: did a release of this get discussed at any point while I was off gallivanting?

martindurant · 2023-05-15T13:58:20Z

No discussion I am aware of

jakirkham · 2023-05-26T05:46:14Z

Raised an issue (#437) to discuss

implement fletcher32

1dc39fb

github-actions bot added the needs release notes label Dec 20, 2022

MSanKeys963 requested a review from jakirkham December 20, 2022 15:20

martindurant mentioned this pull request Dec 20, 2022

Parametrized tests for netcdf encoding options fsspec/kerchunk#274

Open

rabernat reviewed Dec 20, 2022

numcodecs/fletcher32.pyx (outdated, resolved)

numcodecs/tests/test_fletcher32.py (resolved)

martindurant and others added 2 commits December 20, 2022 13:00

Update numcodecs/fletcher32.pyx

4a7fd63

Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>

Add docstring and erorr test

db2275e

Use HDF C impl

4366b5b

Remove unused, add docstrings

8e01f63

rabernat reviewed Dec 20, 2022

numcodecs/fletcher32.pyx (outdated, resolved)

numcodecs/tests/test_fletcher32.py (outdated, resolved)

numcodecs/tests/test_fletcher32.py (outdated, resolved)

to runtime and int test

cb0aa2f

jakirkham reviewed Dec 20, 2022

numcodecs/_fletcher.c (outdated, resolved)

jakirkham reviewed Dec 20, 2022

numcodecs/fletcher32.pyx (resolved)

jakirkham reviewed Dec 20, 2022

numcodecs/fletcher32.pyx (outdated, resolved)

jakirkham reviewed Dec 20, 2022

numcodecs/fletcher32.pyx (outdated, resolved)

to cython

93cef03

rabernat reviewed Dec 21, 2022

numcodecs/fletcher32.pyx (outdated, resolved)

Update numcodecs/fletcher32.pyx

dbbf2bc

Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>

rabernat mentioned this pull request Dec 21, 2022

checksums for chunks zarr-developers/zarr-python#392

Open

rabernat approved these changes Dec 21, 2022

Add docs

4825a1d

github-actions bot removed the needs release notes label Dec 21, 2022

Merge branch 'main' into fletch

12eb7a3

rabernat merged commit 67ede4c into zarr-developers:main Jan 15, 2023

martindurant deleted the fletch branch January 15, 2023 18:14

rabernat mentioned this pull request Jan 25, 2023

Review of the ZEP2 spec - Sharding storage transformer zarr-developers/zarr-specs#152

Closed

mkitti mentioned this pull request Jul 12, 2023

Add Jenkin's lookup3 as a 32-bit checksum for HDF5 #445

Closed

martindurant mentioned this pull request Aug 16, 2023

Support HDF5 compression filter plugins fsspec/kerchunk#351

Closed
