Data Compression

Introduction

There are several ways to perform lossless data compression. In this project we will use Huffman coding. (See the links below.)

Here we can test the data compression of text and binary data.

Project #1

Create a table of character frequency counts from a test file. Count Punctuation? Characters? Spaces? Non-English Characters? Bytes? ...?

Using this character frequency table can you reconstruct the original data from the compressed data?

Instead of counting characters, count bytes?

How does your character frequency table compare with the one in the "Letter frequency" link below.

Project #2

Create two functions. One that does Huffman encoding and one that does Huffman decoding.

Use the frequencies table you created in Project #0.

How does your frequency table compare to the one in the "Letter frequency" link below?

Project #3

Create an interactive program that

Project #4

Instead of text file use an image file (binary/byte data).

Possible Text Files For Testing

Declaration of Independence
United States Constitution
Other...

Links

Data compression (Wikipedia)

Letter frequency (Wikipedia)

Huffman coding (Wikipedia)

Canonical Huffman code (Wikipedia)

Run-length encoding (Wikipedia)

Asymmetric numeral systems (Wikipedia)

These compression algorithms could halve our image file sizes (but we don't use them) (YouTube)

Explaining File Compression Formats (YouTube)

Huffman

dahuffman

FYI

For more information about Huffman coding click HERE

For a hint about counting characters click HERE