Huffman Compression Engine
This project is a complete C++ implementation of the Huffman Coding algorithm. It provides a full pipeline for compressing and decompressing text files by analyzing character frequency and generating an optimal prefix-based binary tree.
The system supports both encryption (compression) and decryption (reconstruction) using a persistent key file, ensuring data can be recovered across different program executions.
Core Features
- Frequency Analysis
- Dynamically calculates the occurrence of each character in the input file to determine optimal encoding weights.
- Huffman Tree Construction
- Builds a binary tree using a priority-based list, where characters with lower frequency are placed deeper, and frequent characters get shorter codes.
- Full Formatting Support
- Preserves spaces, special characters, and newline formatting, ensuring the decrypted output matches the original file exactly
- Persistent Key Storage
- Stores the generated Huffman codes in a key file, allowing reconstruction of the tree without needing the original input.
- Tree Reconstruction
- Rebuilds the Huffman tree from the saved key file using pointer-based traversal logic.
- Memory Management
- Implements recursive deletion of tree nodes to prevent memory leaks.
How It Works
1. Input Processing
The program reads the entire text file, including newline characters.
2. Frequency Calculation
Each character is counted and stored in a custom list structure.
3. Tree Construction
The two least frequent nodes are repeatedly combined to form a parent node until a single root remains.
4. Encoding
Each character is assigned a unique binary code based on its path in the tree:
Left → 0
Right → 1
5. Key Generation
The character-to-code mapping is saved in a file (Key_for_encryption.txt).
6. Encryption
The original text is converted into a string of 0s and 1s and stored in Encrypted_file.txt.
7. Reconstruction
The tree is rebuilt from the key file using the stored codes.
8. Decryption
The encoded bitstream is traversed through the reconstructed tree to recover the original text, saved in Decrypted_file.txt.
Technical Implementation
Core Data Structures
- Custom tree node (encryption)
- Inherited STL list as a priority structure
- Pointer-based binary tree
Key Techniques
- Recursion (tree traversal, deletion, encoding)
- File handling using ifstream and ofstream
- Operator overloading (for frequency increment)
- Custom sorting for priority handling
Project Workflow
- Read and encode original file
- Build Huffman tree
- Save key file
- Rebuild tree from key
- Save encrypted file
- Decrypt file using rebuilt tree
Current Limitations
- Encoded data is stored as strings ("0" and "1") instead of actual bits
- This does not reduce physical file size significantly
- Uses list instead of a more efficient priority_queue
Future Improvements
- Bit Packing Convert binary strings into actual bits for real compression and reduced file size.
- Performance Optimization Replace list-based priority handling with a priority_queue (min-heap).
- User Interface Add a command-line menu for file selection and operation control.
- Error Handling
  Improve robustness for missing files and invalid inputs.