readme
mcroomp committed Nov 22, 2023
1 parent 98601d8 commit 1055458
Showing 2 changed files with 15 additions and 2 deletions.
13 changes: 13 additions & 0 deletions README.md
@@ -9,6 +9,19 @@ sure that the DEFLATE content is recreated exactly as it was written. This is no
DEFLATE has a large degree of freedom in choosing both how the distance/length pairs are chosen
and how the Huffman trees are created.

The general approach is as follows:
1. Decompress the stream into plaintext and a list of blocks whose tokens are either literals (bytes) or (distance, length) pairs.
2. Scan the content to estimate the parameters used for compression. The better the estimate, the fewer corrections are needed when we try to recreate the compression.
3. Rerun compression with the zlib algorithm, using the parameters gathered above. A difference encoder records each instance where the token predicted by our implementation of DEFLATE differs from what we found in the file.
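
The difference-encoding idea in step 3 can be illustrated with a minimal sketch. It is not the library's actual encoder: the `Token`, `Correction`, `diff_tokens`, and `apply_corrections` names are hypothetical, the comparison is purely position by position, and both token lists are assumed to have the same length. The real implementation feeds its corrections to an arithmetic encoder instead of storing them in a list.

```rust
/// A DEFLATE token: either a literal byte or a back-reference.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Token {
    Literal(u8),
    Reference { length: u16, distance: u16 },
}

/// Hypothetical correction record: where the prediction missed and
/// which token was actually found in the stream.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Correction {
    index: usize,
    actual: Token,
}

/// Record every position where the predicted token differs from the
/// token found in the original stream (assumes equal-length slices).
fn diff_tokens(predicted: &[Token], actual: &[Token]) -> Vec<Correction> {
    predicted
        .iter()
        .zip(actual.iter())
        .enumerate()
        .filter(|(_, (p, a))| p != a)
        .map(|(index, (_, a))| Correction { index, actual: *a })
        .collect()
}

/// Rebuild the original token list from the prediction plus corrections.
fn apply_corrections(predicted: &[Token], corrections: &[Correction]) -> Vec<Token> {
    let mut out = predicted.to_vec();
    for c in corrections {
        out[c.index] = c.actual;
    }
    out
}
```

The better the prediction, the shorter the correction list, which is why the parameter estimation in step 2 matters.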

The following differences are corrected:
- Type of block (uncompressed, static Huffman, dynamic Huffman)
- Number of tokens in block (normally 16386)
- Dynamic Huffman encoding (estimated using the zlib algorithm, but there are multiple ways to construct more or less optimal length-limited Huffman codes)
- Literal vs (distance, length) pair (corrected by a single bit)
- Length or distance is incorrect (corrected by encoding the number of hops backwards until the correct one is reached)
- Unusual encoding of length 258 (the standard allows for two different encodings)
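
To see why length 258 needs its own correction, the sketch below enumerates every (length code, extra bits value) pair that decodes to a given match length. The tables are the base lengths and extra-bit counts for length codes 257 to 285 from the DEFLATE specification; the function name `encodings_for_length` is hypothetical and is not part of this library.

```rust
/// Base match length for DEFLATE length codes 257..=285.
const LENGTH_BASE: [u16; 29] = [
    3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31, 35, 43, 51, 59,
    67, 83, 99, 115, 131, 163, 195, 227, 258,
];

/// Number of extra bits that follow each length code.
const LENGTH_EXTRA_BITS: [u8; 29] = [
    0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3,
    4, 4, 4, 4, 5, 5, 5, 5, 0,
];

/// Return every (length code, extra bits value) pair that decodes to
/// `length`. Illustration only, not how the library stores this.
fn encodings_for_length(length: u16) -> Vec<(u16, u16)> {
    let mut result = Vec::new();
    for i in 0..LENGTH_BASE.len() {
        let base = LENGTH_BASE[i];
        // Largest value representable in this code's extra bits.
        let max_extra = (1u16 << LENGTH_EXTRA_BITS[i]) - 1;
        if length >= base && length - base <= max_extra {
            result.push((257 + i as u16, length - base));
        }
    }
    result
}
```

For length 258 this yields two pairs: code 284 with all five extra bits set, and the dedicated code 285 with no extra bits. Every other length from 3 to 257 has exactly one encoding, so only this case needs a correction flag.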

Note that the data formats of the recompression information are different from, and incompatible with, the original preflate implementation, as this library uses a different arithmetic encoder (shared from the Lepton JPEG compression library).

## How to Use This Library
4 changes: 2 additions & 2 deletions src/huffman_helper.rs
@@ -79,8 +79,8 @@ fn is_valid_huffman_code_lengths(code_lengths: &[u8]) -> bool {
     }
 
     // essential property of huffman codes is that all internal nodes
-    // have exactly two children. This means that the amount of free
-    // space doubles each time we go down the node tree.
+    // have exactly two children. This means that the number of internal
+    // nodes doubles each time we go down one level in the tree.
     let mut internal_nodes = 2;
     for i in 1..length_count.len() {
         internal_nodes -= length_count[i];
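
The property described in the comment is equivalent to the Kraft equality: a set of code lengths forms a complete prefix code exactly when the sum of 2^-length over all used symbols equals 1. The following standalone sketch checks this directly. It is an illustration under stated assumptions and not the function shown in the hunk above: the name `is_complete_prefix_code` is hypothetical, lengths are assumed to be at most 15 bits (the DEFLATE limit), and the special case of a code with a single used symbol is not handled.

```rust
/// Check whether `code_lengths` describes a complete prefix code,
/// i.e. the Kraft sum is exactly 1. A length of 0 means "symbol unused".
fn is_complete_prefix_code(code_lengths: &[u8]) -> bool {
    const MAX_BITS: u32 = 15;
    // Work in units of 2^-MAX_BITS so the sum stays an integer.
    let mut used: u64 = 0;
    let mut any_symbol = false;
    for &len in code_lengths {
        if len == 0 {
            continue;
        }
        if u32::from(len) > MAX_BITS {
            return false;
        }
        used += 1u64 << (MAX_BITS - u32::from(len));
        any_symbol = true;
    }
    // Less than 2^MAX_BITS: incomplete code. More: oversubscribed code.
    any_symbol && used == 1u64 << MAX_BITS
}
```

Walking the tree level by level, as the source function does, reaches the same verdict while also detecting oversubscription as soon as a level runs out of free slots.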