曾彪彪的个人网站
首页
文章列表
>>
文章详情
Huffman Coding
作者:
曾彪彪
日期:
2025-05-15 08:13:02
阅读(136)
分类:
Algorithm
# Huffman Coding Huffman coding is a widely used algorithm for lossless data compression. It works by assigning variable-length binary codes to input characters, with shorter codes assigned to more frequent characters. This reduces the overall size of the encoded data. The Huffman coding process is divided into two main stages: ## Encoding: Generate Huffman Codes and Serialize the Tree 1. Compute Character Frequencies Analyze the input data to determine the frequency of each character (or byte). These frequencies are used to build the Huffman tree. 2. Build the Huffman Tree Create a binary tree where each leaf node represents a character and its frequency. Use a priority queue (min-heap) to iteratively combine the two least frequent nodes into a new node until a single tree remains. 3. Generate the Huffman Code Table Traverse the Huffman tree to assign binary codes to each character: Traverse left: append 0 Traverse right: append 1 Each character is now associated with a unique prefix-free binary code. 4. Serialize the Huffman Tree Convert the Huffman tree into a compact format that can be saved or transmitted alongside the encoded data. Common methods include: Preorder traversal with markers for leaf/internal nodes. Storing character values only at leaf nodes. 5. Encode the Input Data Replace each character in the input with its corresponding Huffman code from the table. This results in a compressed bitstream. 6. Store or Transmit the Encoded Data and Serialized Tree To allow proper decoding, both the encoded bitstream and the serialized Huffman tree must be stored or transmitted. ## Decoding: Reconstruct the Tree and Decode the Data 1. Deserialize the Huffman Tree Rebuild the Huffman tree from the serialized form. This is necessary to interpret the encoded bitstream correctly. 2. Decode the Bitstream Read the compressed data bit by bit and traverse the Huffman tree accordingly: 0 → move to left child 1 → move to right child When a leaf node is reached, the corresponding character is decoded. Repeat this process until the entire bitstream has been decoded into the original data. ---- ## Summary Step - Encoding Build tree → generate codes → serialize tree → encode data - Decoding Deserialize tree → decode bitstream using the tree --- ## Demo Code ```c++ #include <bits/stdc++.h> using namespace std; struct Node { char ch; int freq; shared_ptr<Node> left; shared_ptr<Node> right; Node(char ch, int freq, shared_ptr<Node> left, shared_ptr<Node> right) : ch(ch), freq(freq), left(left), right(right) {} }; void readInputAndAnalysisFreq(string &input, unordered_map<char, int> &freqMap) { cout << "input a text: "; getline(cin, input); for (const auto c : input) { freqMap[c]++; } } shared_ptr<Node> generateHuffmanTree(const unordered_map<char, int> &freqMap) { auto cmp = [](const shared_ptr<Node> &a, const shared_ptr<Node> &b) { return a->freq > b->freq; }; priority_queue<shared_ptr<Node>, vector<shared_ptr<Node>>, decltype(cmp)> pq(cmp); for (const auto [c, f] : freqMap) { pq.push(make_shared<Node>(c, f, nullptr, nullptr)); } while (pq.size() > 1) { auto n1 = pq.top(); pq.pop(); auto n2 = pq.top(); pq.pop(); pq.push(make_shared<Node>('\0', n1->freq + n2->freq, n1, n2)); } return pq.top(); } void generateHuffmanCode(const shared_ptr<Node> &root, const string &code, unordered_map<char, string> &huffmanCode) { if (!root) { return; } if (!root->left && !root->right) { huffmanCode[root->ch] = code; return; } generateHuffmanCode(root->left, code + "0", huffmanCode); generateHuffmanCode(root->right, code + "1", huffmanCode); } string generateHuffmanBuffer(const string &input, unordered_map<char, string> &huffmanCode) { ostringstream oss; for (auto c : input) { oss << huffmanCode[c]; } return oss.str(); } void generateHuffmanTreeBuffer(const shared_ptr<Node> &root, string &huffmanTreeBuffer) { if (!root) { return; } if (!root->left && !root->right) { huffmanTreeBuffer.push_back('1'); huffmanTreeBuffer.push_back(root->ch); return; } huffmanTreeBuffer.push_back('0'); generateHuffmanTreeBuffer(root->left, huffmanTreeBuffer); generateHuffmanTreeBuffer(root->right, huffmanTreeBuffer); } shared_ptr<Node> buildHuffmanTree(const string &huffmanTreeBuffer, int &index) { if (huffmanTreeBuffer.empty() || index >= huffmanTreeBuffer.size()) { return nullptr; } char c = huffmanTreeBuffer[index++]; if (c == '1') { if (index >= huffmanTreeBuffer.size()) { return nullptr; } char v = huffmanTreeBuffer[index++]; return make_shared<Node>(v, 0, nullptr, nullptr); } shared_ptr<Node> parent = make_shared<Node>('\0', 0, nullptr, nullptr); parent->left = buildHuffmanTree(huffmanTreeBuffer, index); parent->right = buildHuffmanTree(huffmanTreeBuffer, index); return parent; } string decodeText(shared_ptr<Node> root, const string &huffmanCode) { string result = ""; if (!root || huffmanCode.empty()) { return result; } auto current = root; for (auto c : huffmanCode) { if (!current) { continue; } if (c == '0') { current = current->left; } else if (c == '1') { current = current->right; } if (!current->left && !current->right) { result.push_back(current->ch); current = root; continue; } } return result; } int main() { string input; unordered_map<char, int> freqMap; readInputAndAnalysisFreq(input, freqMap); shared_ptr<Node> huffmanTreeRoot = generateHuffmanTree(freqMap); unordered_map<char, string> huffmanCode; generateHuffmanCode(huffmanTreeRoot, "", huffmanCode); for (const auto [c, s] : huffmanCode) { cout << c << ": " << s << endl; } string huffmanBuffer = generateHuffmanBuffer(input, huffmanCode); cout << huffmanBuffer << endl; string huffmanTreeBuffer; generateHuffmanTreeBuffer(huffmanTreeRoot, huffmanTreeBuffer); cout << huffmanTreeBuffer << endl; int index = 0; auto root = buildHuffmanTree(huffmanTreeBuffer, index); string decodedText = decodeText(root, huffmanBuffer); cout << decodedText << endl; return 0; } ``` Here is the output: ``` input a text: Huffman code is a widely used algorithm for lossless data compression. n: 111111 f: 11110 o: 1110 w: 111110 : 110 a: 1011 t: 10011 e: 1010 H: 100101 r: 0000 m: 0001 y: 001000 l: 0100 g: 001001 c: 00101 p: 010100 d: 0011 u: 01011 s: 011 i: 1000 h: 010101 .: 100100 1001010101111110111100001101111111111000101111000111010110100001111010111101111101000001110100100001000110010110111010001111010110100001001111000001000100110101010001110111101110000011001001110011011010010100110111100011101110011101111000101111000010101000000101001101110001110111111100100 00001r1m0001y1g1c1d001l001p1h1u1s0001i001.1H1t01e1a01 01o01f01w1n Huffman code is a widely used algorithm for lossless data compression. ```
评论(0)
评论(必填)
名称(必填)
联系方式(可选)
验证码(必填)
提交
评论(必填)
名称(必填)
联系方式(可选)
验证码(必填)