huffman code error for single key dict #172
This is my fix; I also find it useful to handle the empty case gracefully:
Thank you for using bitarray and discovering this special edge case! I'm not sure if this is actually a bug. Obviously the huffman code On the other hand, one could argue that one symbol can be encoded as a single bit (like you did), but would that still be a "Huffman code"? Or is this a special case outside the scope of the function Note that in your fix
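For illustration, here is a minimal pure-Python sketch of Huffman code construction that follows the single-bit interpretation for one symbol. It is not bitarray's implementation: codewords are plain '0'/'1' strings rather than bitarray objects, the name `huffman_code_sketch` is hypothetical, and raising `ValueError` for an empty frequency map is just one of the options discussed in this thread.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_sketch(freq):
    """Return {symbol: codeword} with codewords as '0'/'1' strings.

    Hypothetical helper for illustration, not bitarray.util.huffman_code.
    """
    if not freq:
        # Nothing to encode, so no meaningful code exists.
        raise ValueError("cannot create Huffman code with no symbols")
    if len(freq) == 1:
        # Degenerate case: give the only symbol a 1-bit codeword so that
        # each occurrence still produces output.
        (symbol,) = freq
        return {symbol: '0'}

    tiebreak = count()  # keeps heap entries comparable when weights tie
    # Each heap entry: (weight, tiebreak, {symbol: partial codeword})
    heap = [(w, next(tiebreak), {s: ''}) for s, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codewords.
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in c0.items()}
        merged.update({s: '1' + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]


print(huffman_code_sketch(Counter('xxx')))  # {'x': '0'}
```

The single-symbol branch is the only special case; with two or more symbols the usual merge loop already yields codewords of at least one bit.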
…many of the same single character), see also #172
@ilanschnell Thanks for your detailed response and commit! (And for bitarray, of course.)
As regards the edge cases: I am not aware of the practice of encoding the length with Huffman. In any case, this is not a requirement of Huffman coding, and moreover streaming decoders do exist, so sometimes the length is not defined. Specifically, I implemented one of the stream decoders from here: https://www.researchgate.net/publication/3159499_On_the_implementation_of_minimum_redundancy_prefix_codes
Although encoding the length for a single symbol makes sense from a compression perspective, this takes us into the domain of modified Huffman coding, which makes use of run-length encoding; that sounds like what you described. Also, as you've mentioned, you still need to save the cleartext symbol in the table.
Anyway, I was stress-testing some edge cases where Huffman encoding/decoding is part of the process, and I wanted the flow to go through for these degenerate cases as well. So this was more of a pragmatic remark than a purist or philosophical one. While failing on empty strings seems like a reasonable design decision, I would at least expect the single-symbol case to work.
As regards
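The streaming argument can be illustrated with a small pure-Python sketch. It uses a simple buffer-and-lookup decoder, not one of the table-driven decoders from the paper linked above. Because the code is prefix-free, a symbol can be emitted as soon as its codeword has been read, so the total length never has to be known up front; with a 1-bit codeword for a single symbol, every occurrence stays recoverable. `encode_sketch` and `stream_decode` are hypothetical names, and codewords are '0'/'1' strings.

```python
from typing import Dict, Iterable, Iterator


def encode_sketch(code: Dict[str, str], text: Iterable[str]) -> str:
    # Concatenate the codeword of every symbol.
    return ''.join(code[s] for s in text)


def stream_decode(code: Dict[str, str], bits: Iterable[str]) -> Iterator[str]:
    """Yield symbols as soon as a complete codeword has arrived.

    Relies on the code being prefix-free: the first buffered bit string
    that matches a codeword must be that codeword.
    """
    lookup = {word: sym for sym, word in code.items()}
    if '' in lookup:
        # A zero-length codeword would "match" before any bit is read.
        raise ValueError("empty codeword cannot be decoded")
    buf = ''
    for bit in bits:
        buf += bit
        if buf in lookup:
            yield lookup[buf]
            buf = ''
    if buf:
        raise ValueError("bit stream ended in the middle of a codeword")


code = {'x': '0'}
print(list(stream_decode(code, iter(encode_sketch(code, 'xxx')))))  # ['x', 'x', 'x']
```

Since `stream_decode` is a generator, it works on any iterable of bits, including one that is still being produced.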
Thanks for your detailed response. I agree that having an empty bitarray as a result is surprising, and most likely not what one would expect or want. I'm thinking about returning As regards
I have thought about the cases BTW: I've been working on canonical Huffman coding for the last few days; see #173. This will be part of the upcoming bitarray 2.5.0 release.
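For readers unfamiliar with canonical Huffman coding: the codewords are determined entirely by the code lengths, assigned in order of (length, symbol), so a decoder only needs the lengths rather than the full code table. Below is a minimal sketch of the usual assignment rule (the one described for DEFLATE in RFC 1951). It is not the implementation from #173; the function name `canonical_codes` and the use of '0'/'1' strings are assumptions for illustration.

```python
from typing import Dict


def canonical_codes(lengths: Dict[str, int]) -> Dict[str, str]:
    """Assign canonical codewords ('0'/'1' strings) from code lengths.

    Symbols are ordered by (length, symbol). Each codeword is the previous
    one plus one, left-shifted whenever the length increases.
    Assumes the lengths come from a valid prefix code (Kraft inequality).
    """
    code = {}
    value = 0
    prev_len = 0
    for sym in sorted(lengths, key=lambda s: (lengths[s], s)):
        length = lengths[sym]
        if length < 1:
            raise ValueError("code lengths must be >= 1")
        value <<= length - prev_len  # extend to the new length
        if value >= 1 << length:
            raise ValueError("lengths violate the Kraft inequality")
        code[sym] = format(value, '0%db' % length)
        value += 1
        prev_len = length
    return code


print(canonical_codes({'a': 2, 'b': 1, 'c': 3, 'd': 3}))
# {'b': '0', 'a': '10', 'c': '110', 'd': '111'}
```

A single symbol with length 1 simply gets the codeword '0', which lines up with the single-bit interpretation discussed above.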
This is an edge case where huffman_code() fails if the dict has only a single key:
huffman_code(Counter('xxx'))
This is not useful and will of course cause encode() to fail:
bitarray().encode(huffman_code(Counter('xxx')), 'xxx')
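The problem with an empty codeword can be seen with plain strings standing in for bitarrays. This is a sketch, not bitarray's `encode()`, and the helper name `encode_str` is hypothetical. If the single symbol maps to an empty codeword, 'x' and 'xxx' both encode to nothing, so the number of symbols is lost unless it is stored separately. A one-bit codeword keeps one bit per symbol.

```python
def encode_str(code, text):
    """Concatenate codewords; a stand-in for encoding into a bitarray."""
    return ''.join(code[symbol] for symbol in text)


empty_code = {'x': ''}  # degenerate result: zero-length codeword
one_bit = {'x': '0'}    # single-bit alternative discussed in the thread

# With the empty codeword, every message encodes to the same empty output.
assert encode_str(empty_code, 'x') == encode_str(empty_code, 'xxx') == ''
# With one bit per symbol, the message length survives encoding.
print(encode_str(one_bit, 'xxx'))  # 000
```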