Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Smaller encoding of the huffman tree #27

Open
javax-swing opened this issue May 16, 2024 · 2 comments
Open

Smaller encoding of the huffman tree #27

javax-swing opened this issue May 16, 2024 · 2 comments

Comments

@javax-swing
Copy link

javax-swing commented May 16, 2024

Hey Prime! Your Huffman encoding video got me inspired.

From what I understand the tree is currently encoded via a table. Each node being 6 bytes:Value, LeftIndex, RightIndex you could reduce this quite dramatically by always building in the traditional left -> right depth first manner

Encoding

Each node is encoded with a leading bit which indicates if it's a leaf or a branch:

Leaf Bit 0
Branch Bit 1

Then each leaf has whatever data it needs encoded afterwards (in your case 16 bits)
Each branch will then encode it's left and right branch.

Decoding

Read the next bit
If it's 0 read the next 16 bits and construct the leaf
If it's 1 (recurse to build left, recurse to build right)

Some maths

For a huffman tree of N nodes this representation would need ceil((N + N*ValueBitLength) / 8)bytes to represent the tree in this format.

So the example from your unit tests would drop from 42 bytes down to 15.

Tree

   ()
  /  \
 A   ()
     / \
    B  ()
       / \
      C   D

Is encoded as 10A10B10C0D (mixing binary and character values)

You could even do variable length encoding with this approach by having a header which dictates how many bytes each node has

Example of it working

Example in scala (because I haven't learned go yet) https://scastie.scala-lang.org/B3MontqASvuSNT3wNN4qxQ assuming a single byte for the values.

Hope this is helpful or at least interesting.

Peace out!

@ThePrimeagen
Copy link
Owner

Okay this is super cool and I'm going to keep this in here

@javax-swing
Copy link
Author

Thanks man.

There is an alternative encoding if for some reason the tree can have missing branches. you just add a bit to indicate if there is a left and a right per branch. I don't think that's relevant for Huffman though

https://scastie.scala-lang.org/rOPTX4k4R4elzTIHrgb49Q

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants