Add support for BitRound quantization by number-of-significant-bits? #2225
Comments
Is there a universally used conversion from number-of-digits to number-of-bits? |
Not in this case, i.e., not with optimized algorithms. The conversion factor of |
so use of NSB attribute vs NSD attribute will be algorithm specific? |
Yes, BitGroom and GranularBR use NSD. BitRound uses NSB. |
If there is no way that an algorithm will use both nsb and nsd, then |
Yeah, that makes sense. It might be more extensible in the long run, though it requires two attributes not one if I understand you correctly, e.g.,
|
Whatever changes are going to be made, let's make them now, because @WardF is preparing a release... |
I'll be creating the |
I think it's totally fine to have an nsd parameter, and specify in the
documentation that means binary digits for some quantize modes, and decimal
digits for other quantize modes. Bits are digits too! In fact, "bit" is
just short for "binary digit". So we don't need a different parameter.
I am starting to agree that perhaps we need two attributes, as discussed,
one with the name of the algorithm, one with the number of digits.
…On Wed, Feb 16, 2022 at 2:55 PM Ward Fisher ***@***.***> wrote:
I'll be creating the v4.9.0-wellspring branch soon and will start sorting
PR's into that branch for the 4.9.0 release. So there is, realistically, a
little bit of time before the release. I certainly won't quietly mint a
4.9.0 release w/out these features :).
|
Hey @DennisHeimbigner |
Thanks. Yes there appears to be some missing code in zvar.c. You can go ahead and fix it. |
FYI, PR #2232 to add the BitRound quantization is ready for review. If this is merged, netCDF will have two state-of-the-art quantization codecs (BitRound and Granular BitRound) and one last-generation codec (BitGroom). This PR adds BitRound; it does not remove BitGroom. The decision to keep or remove BitGroom is not mine to make, since others have already, graciously, put effort into supporting BitGroom. I prepared some slides to explain the quantization codecs that have been proposed and their key differences, including pros/cons. You can preview the presentation here. Slides 4-9 are the key ones related to the BitGroom question. There's more to discuss than what's on the slides, and perhaps we can do that at our rescheduled meeting. |
I do not think BitGroom should be removed. It should be faster than the other two methods, and that is a significant feature. Many users might prefer a faster but less compressive solution. Great work getting this in, @czender and @WardF. On behalf of NOAA and the UFS: thanks. This will be put to immediate use once the next version of netcdf-c is released, and will make it into the operational code in the next big UFS version cycle, where it will be helping compress data that are used by many members of the netCDF community! |
netCDF-C needs (IMHO) one more lossy codec: BitRound. BitRound takes the Number of Significant Bits (NSB), not digits (NSD), as input, and performs quantization with IEEE rounding on the remaining bits, aka the "keep bits". BitRound is already used as the final step in GranularBR, thus the heart of the BitRound algorithm is already in netCDF-dev. However, I think we need to make BitRound separately invocable so users can directly specify NSB (not NSD). BitRound will use the same NSB for all values of a given variable, unlike GBR, in which each value of a given variable can have a different (internally decided) NSB. I added a separately invocable BitRound to NCO and to CCR last month, in case anyone wants to try it.
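To make the idea concrete, here is a minimal sketch of bit rounding for a single 32-bit float. It is not the netCDF-C, NCO, or CCR implementation: the function name `bitround_f32` is hypothetical, it assumes NSB counts explicitly stored mantissa bits (1 to 23 for a float), and it assumes "IEEE rounding" means round-to-nearest with ties to even. The thread does not say how the real codec breaks ties.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only; not part of any netCDF API.
 * Keeps the top `nsb` explicit mantissa bits of an IEEE-754 binary32 value
 * and zeroes the rest, rounding to nearest with ties to even (assumption). */
static float bitround_f32(float value, int nsb)
{
    assert(nsb >= 1 && nsb <= 23);

    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);      /* type-pun without UB */

    /* Leave NaN and infinities untouched (exponent field all ones). */
    if ((bits & 0x7F800000u) == 0x7F800000u)
        return value;

    int drop = 23 - nsb;                     /* trailing bits to discard */
    if (drop == 0)
        return value;

    uint32_t keep_mask = ~((1u << drop) - 1u);
    uint32_t half = 1u << (drop - 1);        /* half of the last kept bit */
    uint32_t lsb = (bits >> drop) & 1u;      /* last kept bit, for ties */

    /* Adding (half - 1 + lsb) rounds to nearest and sends exact ties to
     * the neighbour whose last kept bit is 0. A carry out of the mantissa
     * correctly bumps the exponent. */
    bits += (half - 1u) + lsb;
    bits &= keep_mask;

    memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, `bitround_f32(1.3f, 2)` keeps two mantissa bits and returns `1.25f`. Because every value of a variable gets the same `nsb`, every quantized value ends with the same number of zero bits.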
The difference in compression ratio (CR) between losslessly compressing (e.g., with Zstd) a uniformly quantized/rounded variable (e.g., with BitRound or BitGroom) and losslessly compressing a variable quantized/rounded value-by-value (e.g., with Granular BitRound) can be significant (~5%, though this needs testing). This is because lossless compressors can more easily recognize/compress the set of trailing zero bits in the mantissa when the number of those zero bits does not change. Setting NSB may appeal more to computer science types than to domain researchers, who might be more comfortable setting NSD than NSB. BitRound is also used as the final (and easiest) step in the method of Klöwer et al. (2021) https://doi.org/10.1038/s43588-021-00156-2
This issue is a place to discuss any related feedback. I have started to draft a PR and would appreciate if @DennisHeimbigner might opine on the general idea, and also one specific question: if BitRound were to go into netCDF-C, should the PR re-use the structure member var->nsd to also hold the NSB, or should the PR add a new member var->nsb, or should it rename the structure member, or what?
FWIW, I would recommend everyone use either GranularBR or BR depending on their tastes. BitGroom just doesn't cut the mustard against these two, though it has been a useful starting point with a nice peer-reviewed journal reference. I'm also willing to craft this PR to replace BG with BR thereby reducing the # of quantization codecs from 3 to 2. Feedback welcome!
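One way to picture the "re-use var->nsd" option raised in the question above is a single precision field whose unit depends on the algorithm. The sketch below is only an illustration of that design: the type and enumerator names (`quant_mode_t`, `var_quant_t`, `QUANT_*`) and the helper `quant_unit` are hypothetical and are not copied from the netCDF-C headers. The mapping of algorithms to units follows the thread: BitGroom and GranularBR take decimal digits, BitRound takes bits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the netCDF-C definitions. */
typedef enum {
    QUANT_NONE = 0,
    QUANT_BITGROOM,
    QUANT_GRANULARBR,
    QUANT_BITROUND
} quant_mode_t;

/* One shared precision field: its unit is implied by `mode`. */
typedef struct {
    quant_mode_t mode;
    int nsd;   /* decimal digits for BitGroom/GranularBR, bits for BitRound */
} var_quant_t;

/* Report how the shared field should be interpreted for a given variable. */
static const char *quant_unit(const var_quant_t *q)
{
    assert(q != NULL);
    switch (q->mode) {
    case QUANT_BITGROOM:
    case QUANT_GRANULARBR:
        return "decimal digits";
    case QUANT_BITROUND:
        return "bits";
    default:
        return "none";
    }
}
```

The trade-off is that a reader of the struct must consult `mode` before interpreting `nsd`, whereas a separate `nsb` member would make the unit explicit at the cost of an extra field.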