
File Compression Based on the Huffman (haffuman) Algorithm, Implemented in C (reposted)

2020-03-21 11:55:53

Tags: haffuman fcs dictionary Huffman int htn C char FCS


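This article first briefly explains the basic idea of the Huffman algorithm, then describes the steps for compressing and decompressing a file with it, and finally gives C source code that implements file compression and decompression.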

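The main idea of the Huffman algorithm is as follows:

① First scan the string to be processed and count how many times each character occurs;

② Turn each character, weighted by its occurrence count, into a binary tree of its own (at this point each tree has only one node);

③ Take the two trees with the smallest weights and merge them into a new binary tree whose root weight is the sum of the two children's weights; the character field of the new root is ignored;

④ Repeat step ③ until all trees have been merged into a single binary tree;

⑤ Traverse the final tree and label its edges from the top down: an edge to a left child is labeled 0 and an edge to a right child is labeled 1. Concatenating the 0s and 1s on the path from the root to a leaf gives the Huffman code of that leaf's character.

Because characters appear only at the leaves, no code is a prefix of another, so the bit stream can be decoded unambiguously. Rare characters are merged first and end up deeper in the tree with longer codes, while frequent characters get shorter ones.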

The figure below illustrates the basic idea of Huffman coding.

[Figure: the basic idea of Huffman coding]
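The five steps above can be sketched in C as follows. This is a minimal illustration, not the article's code: to keep it short it finds the two lightest trees with a linear scan instead of a priority queue, and the names NSYM, Node, build_tree and assign_codes are hypothetical. Ties are broken by the lowest node index, so the result is one valid Huffman code among several possible ones.

```c
#include <assert.h>
#include <string.h>

#define NSYM 256                /* one leaf per possible byte value */
#define MAXNODES (2 * NSYM - 1) /* k leaves need k - 1 merged (internal) nodes */

/* Nodes 0..NSYM-1 are the leaves (index == byte value); later nodes are merges. */
typedef struct {
    long weight;      /* occurrence count, or the sum of the children's counts */
    int left, right;  /* child indices; -1 for a leaf */
    int alive;        /* 1 while this node is the root of an unmerged tree */
} Node;

/* Removes and returns the live root with the smallest weight (-1 if none).
   Ties go to the lowest index, so the result is deterministic. */
static int take_min(Node *nodes, int count)
{
    int best = -1;
    for (int i = 0; i < count; ++i)
        if (nodes[i].alive && (best < 0 || nodes[i].weight < nodes[best].weight))
            best = i;
    if (best >= 0)
        nodes[best].alive = 0;
    return best;
}

/* Steps 2-4: returns the index of the final root, or -1 if all counts are 0. */
static int build_tree(const long freq[NSYM], Node nodes[MAXNODES])
{
    int count = NSYM;
    for (int i = 0; i < NSYM; ++i) {
        nodes[i].weight = freq[i];
        nodes[i].left = nodes[i].right = -1;
        nodes[i].alive = freq[i] > 0;   /* only characters that actually occur */
    }
    for (;;) {
        int a = take_min(nodes, count);
        int b = take_min(nodes, count);
        if (b < 0)
            return a;                   /* one tree left: it is the Huffman tree */
        assert(count < MAXNODES);
        nodes[count].weight = nodes[a].weight + nodes[b].weight;
        nodes[count].left = a;          /* edge labeled 0 */
        nodes[count].right = b;         /* edge labeled 1 */
        nodes[count].alive = 1;
        ++count;
    }
}

/* Step 5: path[0..depth) holds the 0/1 labels from the root down to node n. */
static void assign_codes(const Node *nodes, int n, char *path, int depth,
                         char codes[NSYM][NSYM])
{
    if (nodes[n].left < 0) {            /* leaf: n is the byte value itself */
        memcpy(codes[n], path, (size_t)depth);
        codes[n][depth] = '\0';
        return;
    }
    path[depth] = '0';
    assign_codes(nodes, nodes[n].left, path, depth + 1, codes);
    path[depth] = '1';
    assign_codes(nodes, nodes[n].right, path, depth + 1, codes);
}
```

For the text "aaaabbc" (a: 4, b: 2, c: 1), c and b are merged first, then that tree with a, giving a = 1, b = 01 and c = 00: 4*1 + 2*2 + 1*2 = 10 bits instead of 56. If only one distinct character occurs, the single leaf receives an empty code, so a real compressor has to treat that case specially.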

The file compression and decompression procedures based on the Huffman algorithm are described below:

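I. File compression:

① Count frequencies: read every byte of the file and count how many times each character occurs in an integer array int statistic[MAX_CHARS]. Since a byte can take 2^8 = 256 distinct values, MAX_CHARS = 256 is sufficient. When counting, do statistic[(unsigned char)byte]++ for each byte.

② Build the Huffman tree: build the Huffman tree from the statistic array with the Huffman algorithm. Since every step must take the trees with the smallest weights, a priority queue is used to hold the root of each tree.

③ Generate codes: traverse the Huffman tree depth-first to obtain the code of the character in each leaf, and store it in the string array char *dictionary[MAX_CHARS].

④ Store frequencies: create the file that will hold the compressed data; first write the number of distinct characters, then write each character together with its frequency.

⑤ Store compressed data: read each byte of the input file again and look up its code with dictionary[(unsigned char)byte] (note that codes differ in length). Use bitwise operations to set the bits of the code one at a time into a char-sized bit buffer; several codes may be needed to fill one buffer. Each time the buffer fills up, write it to the file as a single byte. When the whole file has been read, the last, partially filled buffer (padded with 0 bits) is written as well, and compression is complete.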
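Step ⑤ is the fiddliest part. The following minimal sketch shows the bit buffer on its own, assuming codes are stored as '0'/'1' strings as in dictionary[]. To stay self-contained it writes into a memory buffer instead of a FILE *, and BitWriter, bw_put_code and bw_flush are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory bit writer. Bits fill each byte from the most
   significant bit down (0x80, 0x40, ..., 0x01), the same order as mask[]. */
typedef struct {
    unsigned char *out;  /* destination; the caller must provide enough room */
    size_t nbytes;       /* complete bytes written to out so far */
    unsigned char buf;   /* the byte currently being filled */
    int pos;             /* bits already placed in buf (0..7) */
} BitWriter;

/* Appends one code such as "01" (a dictionary[] entry) bit by bit. */
static void bw_put_code(BitWriter *bw, const char *code)
{
    for (; *code; ++code) {
        assert(*code == '0' || *code == '1');
        if (*code == '1')
            bw->buf |= (unsigned char)(0x80u >> bw->pos);
        if (++bw->pos == 8) {           /* buffer full: emit it as one byte */
            bw->out[bw->nbytes++] = bw->buf;
            bw->buf = 0;
            bw->pos = 0;
        }
    }
}

/* Emits the last, partially filled byte; its unused low bits remain 0. */
static void bw_flush(BitWriter *bw)
{
    if (bw->pos != 0) {
        bw->out[bw->nbytes++] = bw->buf;
        bw->buf = 0;
        bw->pos = 0;
    }
}
```

With a = 1, b = 01 and c = 00, the text "aaaabbca" becomes the 11 bits 11110101 001, stored as the two bytes 0xF5 0x20; the last five bits of 0x20 are padding.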

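II. File decompression:

① Read frequencies: read the compressed file and store each character's occurrence count in the array statistic.

② Build the Huffman coding tree: build the tree from the statistic array. Because the same counts are used as during compression, the tree, and therefore every code, is the same.

③ Keep reading the compressed file and use bitwise operations to extract each bit of every byte. For each bit, walk the Huffman tree starting from the root: take the left branch on 0 and the right branch on 1. On reaching a leaf, output its character and start matching the next bits from the root again. Since the final byte may contain padding bits, decoding stops once as many characters as the sum of all frequencies have been output. When the whole compressed file has been read, decompression is complete.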
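Step ③ can be sketched as follows. It is an illustration under two assumptions: the tree is stored as an array of nodes with child indices (the article's code uses HTN pointers), and the root has two children (with a single distinct character there is nothing to walk). DNode and huff_decode are hypothetical names; the total parameter plays the role of the summed frequencies, so padding bits never produce extra characters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical array-based tree; -1 means "no child". */
typedef struct {
    int left, right;
    unsigned char ch;   /* meaningful only in leaves */
} DNode;

/* Decodes MSB-first bits: 0 = go left, 1 = go right. Each leaf emits its
   character and restarts at the root. Stops after `total` characters, so
   the zero padding in the last byte is ignored. Returns characters written. */
static size_t huff_decode(const DNode *tree, int root,
                          const unsigned char *in, size_t nbytes,
                          size_t total, unsigned char *out)
{
    size_t written = 0;
    int n = root;
    for (size_t i = 0; i < nbytes && written < total; ++i) {
        for (int bit = 7; bit >= 0 && written < total; --bit) {
            n = ((in[i] >> bit) & 1) ? tree[n].right : tree[n].left;
            assert(n >= 0);              /* requires a root with two children */
            if (tree[n].left < 0 && tree[n].right < 0) {
                out[written++] = tree[n].ch;
                n = root;                /* restart matching at the root */
            }
        }
    }
    return written;
}
```

With the tree for a = 1, c = 00, b = 01 and total = 8, the bytes 0xF5 0x20 decode back to "aaaabbca". Without the total limit, the five padding zeros at the end would decode as two extra c characters.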

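That covers file compression and decompression with the Huffman algorithm. Below is the C source code based on these ideas. It consists of five files: pq.h and pq.c implement a priority queue, compress.h and compress.c implement compression and decompression, and main.c is a test program.

For pq.h and pq.c, see "A C Implementation of a Priority Queue (priority_queue)":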

http://www.cnblogs.com/this-543273659/archive/2011/07/31/2122639.html
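pq.h and pq.c are only linked above, not reproduced. If that page is unavailable, the following minimal stand-in can be used to compile the listing. It is an assumption, not the author's code: only the names used in compress.c (KeyValue with its _key and _value fields, PriorityQueue, PRIORITY_MIN and the key_value_* / priority_queue_* functions) come from the article, their signatures are inferred from those calls, and the binary heap, the growth policy, PRIORITY_MAX and the meaning of the NULL second argument (an optional function that frees each value) are assumptions of this sketch. Put the typedefs, #defines and prototypes in pq.h and the function bodies in pq.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for pq.h / pq.c: names and signatures are inferred
   from the calls in compress.c and may differ from the author's files. */
#define PRIORITY_MAX 0   /* dequeue the largest key first */
#define PRIORITY_MIN 1   /* dequeue the smallest key first */

typedef struct {
    int _key;      /* priority: a character count in compress.c */
    void *_value;  /* payload: an HTN * in compress.c */
} KeyValue;

typedef struct {
    KeyValue **_nodes;   /* binary heap stored in a growable array */
    int _size, _capacity;
    int _priority;       /* PRIORITY_MIN or PRIORITY_MAX */
} PriorityQueue;

KeyValue *key_value_new(int key, void *value)
{
    KeyValue *kv = malloc(sizeof *kv);
    if (!kv) abort();
    kv->_key = key;
    kv->_value = value;
    return kv;
}

/* freevalue (may be NULL) releases the payload; compress.c passes NULL. */
void key_value_free(KeyValue *kv, void (*freevalue)(void *))
{
    if (kv && freevalue) freevalue(kv->_value);
    free(kv);
}

PriorityQueue *priority_queue_new(int priority)
{
    PriorityQueue *pq = calloc(1, sizeof *pq);
    if (!pq) abort();
    pq->_priority = priority;
    return pq;
}

/* Nonzero if a must leave the queue before b. */
static int pq_before(const PriorityQueue *pq, const KeyValue *a, const KeyValue *b)
{
    return pq->_priority == PRIORITY_MIN ? a->_key < b->_key : a->_key > b->_key;
}

static void pq_swap(KeyValue **nodes, int i, int j)
{
    KeyValue *t = nodes[i]; nodes[i] = nodes[j]; nodes[j] = t;
}

void priority_queue_enqueue(PriorityQueue *pq, KeyValue *kv)
{
    assert(kv != NULL);
    if (pq->_size == pq->_capacity) {           /* grow the array */
        int cap = pq->_capacity ? 2 * pq->_capacity : 16;
        KeyValue **nodes = realloc(pq->_nodes, (size_t)cap * sizeof *nodes);
        if (!nodes) abort();
        pq->_nodes = nodes;
        pq->_capacity = cap;
    }
    int i = pq->_size++;
    pq->_nodes[i] = kv;
    while (i > 0 && pq_before(pq, pq->_nodes[i], pq->_nodes[(i - 1) / 2])) {
        pq_swap(pq->_nodes, i, (i - 1) / 2);      /* sift up */
        i = (i - 1) / 2;
    }
}

/* Returns NULL when empty; fcs_create_haffuman_tree relies on this. */
KeyValue *priority_queue_dequeue(PriorityQueue *pq)
{
    if (pq->_size == 0) return NULL;
    KeyValue *top = pq->_nodes[0];
    pq->_nodes[0] = pq->_nodes[--pq->_size];
    for (int i = 0;;) {                           /* sift down */
        int l = 2 * i + 1, r = l + 1, best = i;
        if (l < pq->_size && pq_before(pq, pq->_nodes[l], pq->_nodes[best])) best = l;
        if (r < pq->_size && pq_before(pq, pq->_nodes[r], pq->_nodes[best])) best = r;
        if (best == i) break;
        pq_swap(pq->_nodes, i, best);
        i = best;
    }
    return top;
}

int priority_queue_size(const PriorityQueue *pq) { return pq->_size; }
int priority_queue_empty(const PriorityQueue *pq) { return pq->_size == 0; }

void priority_queue_free(PriorityQueue *pq, void (*freevalue)(void *))
{
    for (int i = 0; i < pq->_size; ++i)
        key_value_free(pq->_nodes[i], freevalue);
    free(pq->_nodes);
    free(pq);
}
```

Because fcs_create_haffuman_tree enqueues the same keys in the same order when compressing and when decompressing, a deterministic heap like this one yields the same tree both times, even when several counts are equal.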

The other three files are as follows:

/*
 * File: compress.h
 * Purpose: To compress files using the Huffman algorithm
 * Author: puresky
 * Date: 2011/05/01
 */

#ifndef _FILE_COMPRESSION_H
#define _FILE_COMPRESSION_H


//Haffuman Tree Node
typedef struct HaffumanTreeNode HTN;
struct HaffumanTreeNode
{
      char _ch;   //character
      int _count; //frequency
      struct HaffumanTreeNode *_left; //left child
      struct HaffumanTreeNode *_right;//right child
};

//FileCompress Struct
#define BITS_PER_CHAR 8     //the number of bits in a char
#define MAX_CHARS 256            //the max number of chars
#define FILE_BUF_SIZE 8192  //the size of Buffer for FILE I/O
typedef struct FileCompressStruct FCS;
struct FileCompressStruct
{
      HTN *_haffuman;        //A pointer to the root of the Huffman tree
      unsigned int _charsCount; //The number of distinct chars
      unsigned int _total; //Total bytes in a file.
      char *_dictionary[MAX_CHARS]; //The code ("0"/"1" string) of each character
      int _statistic[MAX_CHARS]; //The number of occurrences of each character
};

FCS *fcs_new();
void fcs_compress(FCS *fcs, const char *inFileName, const char *outFileName);
void fcs_decompress(FCS *fcs, const char *inFileName, const char *outFileName);
void fcs_free(FCS *fcs);
#endif

/*
 * File: compress.c
 * Purpose: To compress files using the Huffman algorithm
 * Author: puresky
 * Date: 2011/05/01
 */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "compress.h"
#include "pq.h"

static const unsigned char mask[8] =
{
      0x80, /* 10000000 */
      0x40, /* 01000000 */
      0x20, /* 00100000 */
      0x10, /* 00010000 */
      0x08, /* 00001000 */
      0x04, /* 00000100 */
      0x02, /* 00000010 */
      0x01  /* 00000001 */                       
};


//static functions of HTN
static HTN *htn_new(char ch, int count)
{
      HTN *htn = (HTN *)malloc(sizeof(HTN));
      if(!htn)
      {
            fprintf(stderr, "out of memory\n");
            exit(EXIT_FAILURE);
      }
      htn->_left = NULL;
      htn->_right = NULL;
      htn->_ch = ch;
      htn->_count = count;
      return htn;
}

static void htn_print_recursive(HTN *htn, int depth)
{
      int i;
      if(htn)
      {
            for(i = 0; i < depth; ++i)
                  printf("  ");
            printf("%d:%d\n", htn->_ch, htn->_count);
            htn_print_recursive(htn->_left, depth + 1);
            htn_print_recursive(htn->_right, depth + 1);
      }

}

static void htn_print(HTN *htn)
{
      htn_print_recursive(htn, 0);
}

static void htn_free(HTN *htn)
{
      if(htn)
      {
            htn_free(htn->_left);
            htn_free(htn->_right);
            free(htn);
      }
}

//static functions of FCS

static void fcs_generate_statistic(FCS *fcs, const char *inFileName)
{
      int ret, i;
      unsigned char buf[FILE_BUF_SIZE];
      FILE *pf = fopen(inFileName, "rb");
      if(!pf)
      {
            fprintf(stderr, "can't open file:%s\n", inFileName);
            return;
      }

      while((ret = fread(buf, 1, FILE_BUF_SIZE, pf)) > 0)
      {
            fcs->_total += ret;
            for(i = 0; i < ret; ++i)
            {
                  if(fcs->_statistic[buf[i]] == 0)
                        fcs->_charsCount++;
                  fcs->_statistic[buf[i]]++;
            }
      }
      fclose(pf);
}

static void fcs_create_haffuman_tree(FCS *fcs)
{
      int i, count;
      HTN *htn, *parent, *left, *right;
      KeyValue *kv, *kv1, *kv2;
      PriorityQueue *pq;
      pq = priority_queue_new(PRIORITY_MIN);

      for(i = 0; i < MAX_CHARS; ++i)
      {
            if(fcs->_statistic[i])
            {
                  htn = htn_new((char)i, fcs->_statistic[i]);
                  kv = key_value_new(fcs->_statistic[i], htn);
                  priority_queue_enqueue(pq, kv);
            }
      }
      //fprintf(stdout, "the number of haffuman leaf is %d\n", priority_queue_size(pq));
     
      while(!priority_queue_empty(pq))
      {
            //fprintf(stdout, "priority queue size:%d\n", priority_queue_size(pq));
            kv1 = priority_queue_dequeue(pq);
            kv2 = priority_queue_dequeue(pq);
            if(kv2 == NULL)
            {
                  fcs->_haffuman = kv1->_value;
                  key_value_free(kv1, NULL);
            }
            else
            {
                  left = (HTN *)kv1->_value;
                  right = (HTN *)kv2->_value;
                  count = left->_count + right->_count;
                  key_value_free(kv1, NULL);
                  key_value_free(kv2, NULL);
                  parent = htn_new(0, count);
                  parent->_left = left;
                  parent->_right = right;
                  kv = key_value_new(count, parent);
                  priority_queue_enqueue(pq, kv);
            }
      }

      priority_queue_free(pq, NULL);

      //htn_print(fcs->_haffuman);
}

static void fcs_generate_dictionary_recursively(HTN *htn, char *dictionary[], char path[], int depth)
{
      char *code = NULL;
      if(htn)
      {
            if(htn->_left == NULL && htn->_right == NULL)
            {
                  code = (char *)malloc(sizeof(char) * (depth + 1));
                  memset(code, 0, sizeof(char) * (depth + 1));
                  memcpy(code, path, depth);
                  dictionary[(unsigned char)htn->_ch] = code;
            }

            if(htn->_left)
            {
                  path[depth] = '0';
                  fcs_generate_dictionary_recursively(htn->_left, dictionary, path, depth + 1);
            }

            if(htn->_right)
            {
                  path[depth] = '1';
                  fcs_generate_dictionary_recursively(htn->_right, dictionary, path, depth + 1);
            }
      }
}

static void fcs_generate_dictionary(FCS *fcs)
{
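      //Codes can exceed 32 bits for skewed frequency distributions (up to
      //MAX_CHARS - 1 bits with 256 leaves), so a 32-byte path could overflow.
      char path[MAX_CHARS];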
      fcs_generate_dictionary_recursively(fcs->_haffuman, fcs->_dictionary, path, 0);
      //fcs_print_dictionary(fcs);
}

static void fcs_print_dictionary(FCS *fcs)
{
      int i;
      for(i = 0; i < MAX_CHARS; ++i)
            if(fcs->_dictionary[i] != NULL)
                  fprintf(stdout, "%d:%s\n", i, fcs->_dictionary[i]);
}

static void fcs_write_statistic(FCS *fcs, FILE *pf)
{
      int i;
      fprintf(pf, "%u\n", fcs->_charsCount);
      for(i = 0; i < MAX_CHARS; ++i)
            if(fcs->_statistic[i] != 0)
                  fprintf(pf, "%d %d\n", i, fcs->_statistic[i]);
}

static void fcs_do_compress(FCS *fcs, const char *inFileName, const char* outFileName)
{
      int i, j, ret;
     
      char *dictEntry;
      int len;             //code length (strlen returns size_t; a char is too narrow)
      unsigned int bytes;
      unsigned char bitBuf;
      int bitPos;
     
      unsigned char inBuf[FILE_BUF_SIZE];
      FILE *pfIn, *pfOut;

      pfIn = fopen(inFileName, "rb");
      if(!pfIn)
      {
            fprintf(stderr, "can't open file:%s\n", inFileName);
            return;
      }
      pfOut = fopen(outFileName, "wb");
      if(!pfOut)
      {
            fclose(pfIn);
            fprintf(stderr, "can't open file:%s\n", outFileName);
            return;
      }

      fcs_write_statistic(fcs, pfOut);

      bitBuf = 0x00;
      bitPos = 0;
      bytes = 0;
      while((ret = fread(inBuf, 1, FILE_BUF_SIZE, pfIn)) > 0)
      {
            for(i = 0; i < ret; ++i)
            {
                  len = strlen(fcs->_dictionary[inBuf[i]]);
                  dictEntry = fcs->_dictionary[inBuf[i]];
                  //printf("%s\n", dictEntry);
                  for(j = 0; j < len; ++j)
                  {
                        if(dictEntry[j] == '1')
                        {
                              bitBuf |= mask[bitPos++];
                        }
                        else
                        {
                              bitPos++;
                        }
                       
                        if(bitPos == BITS_PER_CHAR)
                        {
                              fwrite(&bitBuf, 1, sizeof(bitBuf), pfOut);
                              bitBuf = 0x00;
                              bitPos = 0;

                              bytes++;
                        }
                  }
            }
      }
      if(bitPos != 0)
      {
            fwrite(&bitBuf, 1, sizeof(bitBuf), pfOut);
            bytes++;
      }

      fclose(pfIn);
      fclose(pfOut);
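      //bytes can exceed _total for incompressible input; compute in double
      //to avoid unsigned underflow. The frequency header is not counted.
      if(fcs->_total > 0)
            printf("Space saved (excluding the frequency header): %f%%\n",
                  (1.0 - (double)bytes / fcs->_total) * 100.0);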
}


static void fcs_read_statistic(FCS *fcs, FILE *pf)
{
      int i, charsCount = 0;
      int ch;
      int num;

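      //A "\n" (or any whitespace) in a scanf format skips ALL following
      //whitespace, which could swallow compressed bytes such as 0x0A or 0x20.
      //So only the single '\n' that ends the header is consumed, with fgetc.
      if(fscanf(pf, "%d", &charsCount) != 1)
            return;
      fcs->_charsCount = charsCount;

      for(i = 0; i < charsCount; ++i)
      {
            if(fscanf(pf, "%d %d", &ch, &num) != 2 || ch < 0 || ch >= MAX_CHARS)
                  break;
            fcs->_statistic[ch] = num;
            fcs->_total += num;
      }
      fgetc(pf); //the '\n' that ends the header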
}

static void fcs_do_decompress(FCS *fcs, FILE *pfIn, const char *outFileName)
{
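      int i, j, ret;
      unsigned char ch;
      HTN *htn;
      unsigned char buf[FILE_BUF_SIZE];
      FILE *pfOut;

      pfOut = fopen(outFileName, "wb");
      if(!pfOut)
      {
            fprintf(stderr, "can't open file:%s\n", outFileName);
            return;
      }
      htn = fcs->_haffuman;
      if(htn == NULL) //empty input: there is nothing to decode
      {
            fclose(pfOut);
            return;
      }
      if(htn->_left == NULL && htn->_right == NULL)
      {
            //Only one distinct byte: the tree is a single leaf, so the code
            //bits carry no information. Write that byte _total times.
            while(fcs->_total > 0)
            {
                  fwrite(&htn->_ch, 1, sizeof(char), pfOut);
                  fcs->_total--;
            }
            fclose(pfOut);
            return;
      }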
      while((ret = fread(buf, 1, FILE_BUF_SIZE, pfIn)) > 0)
      {
            for(i = 0; i < ret; ++i)
            {
                  ch = buf[i];

                  for(j = 0; j < BITS_PER_CHAR; ++j)
                  {
                        if(ch & mask[j])
                        {
                              htn = htn->_right;     
                        }
                        else
                        {
                              htn = htn->_left;
                        }
                        if(htn->_left == NULL && htn->_right == NULL) //leaf
                        {
                              if(fcs->_total > 0)
                              {
                                    fwrite(&htn->_ch, 1, sizeof(char), pfOut);
                                    fcs->_total--;
                              }
                              htn = fcs->_haffuman;
                        }
                  }
            }
      }
      fclose(pfOut);
}


//FCS functions
FCS *fcs_new()
{
      FCS *fcs = (FCS *)malloc(sizeof(FCS));
      if(!fcs)
      {
            fprintf(stderr, "out of memory\n");
            exit(EXIT_FAILURE);
      }
      fcs->_charsCount = 0;
      fcs->_total = 0;
      memset(fcs->_statistic, 0, sizeof(fcs->_statistic));
      memset(fcs->_dictionary, 0, sizeof(fcs->_dictionary));
      fcs->_haffuman = NULL;
      return fcs;
}

void fcs_free(FCS *fcs)
{
      int i;
      if(fcs)
      {
            if(fcs->_haffuman)
                  htn_free(fcs->_haffuman);
            for(i = 0; i < MAX_CHARS; ++i)
                  free(fcs->_dictionary[i]);
            free(fcs);
      }
}

void fcs_compress(FCS *fcs, const char *inFileName, const char *outFileName)
{
      fprintf(stdout, "To compress file: %s ...\n", inFileName);
      fcs_generate_statistic(fcs, inFileName);
      fcs_create_haffuman_tree(fcs);
      fcs_generate_dictionary(fcs);
      fcs_do_compress(fcs, inFileName, outFileName);
      fprintf(stdout, "The compressed data of file: %s stored at %s!\n",
            inFileName, outFileName);
}

void fcs_decompress(FCS *fcs, const char *inFileName, const char *outFileName)
{

      FILE *pfIn;
      fprintf(stdout, "To decompress file: %s ...\n", inFileName);
      pfIn= fopen(inFileName, "rb");
      if(!pfIn)
      {
            fprintf(stderr, "can't open file: %s\n", inFileName);
            return ;
      }
      fcs_read_statistic(fcs, pfIn);
      fcs_create_haffuman_tree(fcs);
      fcs_generate_dictionary(fcs);
      fcs_do_decompress(fcs, pfIn, outFileName);
      fclose(pfIn);
      fprintf(stdout, "The decompressed data of file: %s stored at %s\n",
            inFileName, outFileName);
}

/*
 * File: main.c
 * Purpose: testing File Compression
 * Author:puresky
 * Date: 2011/05/01
 */
#include <stdlib.h>
#include "compress.h"

const int DO_COMPRESS = 1;
const int DO_DECOMPRESS = 1;

const char *InFile = "data.txt"; //The file to compress.
const char *CompressedFile = "data.hfm"; //Compressed data of the file.
const char *OutFile = "data2.txt"; //The decompressed file of the data.

int main(int argc, char **argv)
{
      //1. compress file
      if(DO_COMPRESS)
      {
            FCS *fcs1;
            fcs1 = fcs_new();
            fcs_compress(fcs1, InFile, CompressedFile);
            fcs_free(fcs1);
      }
      //2. decompress file
      if(DO_DECOMPRESS)
      {
            FCS *fcs2;
            fcs2 = fcs_new();
            fcs_decompress(fcs2, CompressedFile, OutFile);
            fcs_free(fcs2);
      }
#ifdef _WIN32
      system("pause"); //keep the console window open; "pause" exists only on Windows
#endif
      return 0;
}
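main.c prints progress messages but does not check that the decompressed data2.txt matches data.txt. A small byte-by-byte comparison such as the following can be added to confirm the round trip; files_identical is a hypothetical helper, not part of the original code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if both files can be opened and contain
   exactly the same bytes, 0 otherwise. */
static int files_identical(const char *path1, const char *path2)
{
    assert(path1 != NULL && path2 != NULL);
    FILE *f1 = fopen(path1, "rb");
    FILE *f2 = fopen(path2, "rb");
    int same = (f1 != NULL && f2 != NULL);
    while (same) {
        int c1 = fgetc(f1);
        int c2 = fgetc(f2);
        if (c1 != c2)
            same = 0;           /* a differing byte, or one file is shorter */
        else if (c1 == EOF)
            break;              /* both files ended at the same point */
    }
    if (f1) fclose(f1);
    if (f2) fclose(f2);
    return same;
}
```

For example, at the end of main(): if(!files_identical(InFile, OutFile)) fprintf(stderr, "round trip failed\n");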

Source: https://www.cnblogs.com/ms-bk/p/12537487.html
