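Counting Each Letter in a Paragraph and Building a Huffman Tree
Author: Internet
Process
Computing the letter frequencies
Count how often each English letter occurs in the paragraph, treating uppercase and lowercase as the same letter and ignoring all other characters. Each letter's probability is then used as its weight.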
-
Use Java's HashMap to count the occurrences.
-
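HashMap is a hash-table-based implementation of the Map interface. Each entry has two parts: a key and a value. The idea is to use each letter that appears in the text as the key and its number of occurrences as the value. Keys in a HashMap cannot repeat, so counting the letters reduces to counting how many times each key is seen.
My implementation starts at the first character and stores the letters in the map one by one: the first letter is stored as a: 1, the next one as f: 1, and so on. When a letter that is already in the map shows up again, its value is incremented by 1. For example, in aba the value for a becomes 2 when the third character is reached, giving a: 2. Spaces, digits, punctuation, and other special characters are skipped and never inserted; to count them as well, drop that check and insert every character. The same procedure applies to every other letter.
Note that the program keeps uppercase and lowercase letters under separate keys; the two are merged afterwards in the manual tally.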
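The aba walk-through can also be sketched with Map.merge, which stores 1 for a new key and otherwise adds 1 to the stored value. This is only a minimal sketch, not the author's program: the class name LetterCountSketch and the method countLetters are hypothetical, and it assumes Character keys and a TreeMap purely so that the result prints in alphabetical order.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of the original program.
class LetterCountSketch {
    // Counts ASCII letters only; uppercase and lowercase stay separate keys.
    static Map<Character, Integer> countLetters(String text) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            boolean isLetter = (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
            if (isLetter) {
                // New key: store 1. Existing key: add 1 to its value.
                counts.merge(ch, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Calling LetterCountSketch.countLetters("aba") gives a map that prints as {a=2, b=1}.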
-
import java.util.HashMap;

public class Hashmap {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<String, Integer>();
        String str = "Effificient and robust facial landmark localisation is crucial for the deployment of real-time face analysis systems. This paper presents a new loss function, namely Rectiffied Wing (RWing) loss, for regression-based facial landmark localisation with Convolutional Neural Networks (CNNs). We fifirst systemically analyse different loss functions, including L2, L1 and smooth L1. The analysis suggests that the training of a network should pay more attention to small-medium errors. Motivated by this finding, we design a piece-wise loss that amplififies the impact of the samples with small-medium errors. Besides, we rectify the loss function for very small errors to mitigate the impact of inaccuracy of manual annotation";
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            String key = String.valueOf(ch);
            if (map.containsKey(key)) {
                // Only letters are ever inserted, so only letters reach this branch.
                Integer value = map.get(key);
                map.put(key, value + 1);
            } else {
                // map.put(key, 1); here would count every character, including Chinese ones.
                // Use the ASCII ranges to skip digits, spaces, punctuation, and special
                // characters, so that only letters are counted.
                if (ch >= 'A' && ch <= 'Z' || ch >= 'a' && ch <= 'z') {
                    map.put(key, 1);
                }
            }
        }
        System.out.println(map);
    }
}
-
Result
{B=1, C=2, E=1, L=3, M=1, N=4, R=2, T=2, W=3, a=52, b=3, c=21, d=16, e=54, f=27, g=10, h=15, i=55, k=4, l=36, m=22, n=41, o=41, p=10, r=31, s=53, t=48, u=14, v=3, w=8, y=13}
-
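Manual tally: merging the uppercase and lowercase counts and computing each letter's probability
Letter  Count  Probability
V       3      0.006
B       4      0.007
K       4      0.007
G       10     0.018
P       10     0.018
W       11     0.020
Y       13     0.024
U       14     0.026
H       15     0.028
D       16     0.029
C       23     0.042
M       23     0.042
F       27     0.050
R       33     0.061
L       39     0.072
O       41     0.076
N       45     0.083
T       50     0.092
A       52     0.096
E       55     0.101
I       55     0.101
Note: this tally leaves out S (s=53 in the program output), so the probabilities are taken over a total of 543 letters rather than 596.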
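The manual merging step can also be automated. The sketch below is an illustration under assumptions rather than the author's code: the class CaseFoldSketch and its methods foldCase and probabilities are hypothetical names, and the input is assumed to be a map with single-character String keys like the one printed by the program.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of the original program.
class CaseFoldSketch {
    // Merges keys such as "B" and "b" into a single uppercase entry.
    static Map<Character, Integer> foldCase(Map<String, Integer> counts) {
        Map<Character, Integer> folded = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            char letter = Character.toUpperCase(e.getKey().charAt(0));
            folded.merge(letter, e.getValue(), Integer::sum);
        }
        return folded;
    }

    // Divides each count by the total number of counted letters.
    static Map<Character, Double> probabilities(Map<Character, Integer> folded) {
        int total = 0;
        for (int c : folded.values()) {
            total += c;
        }
        Map<Character, Double> probs = new TreeMap<>();
        for (Map.Entry<Character, Integer> e : folded.entrySet()) {
            probs.put(e.getKey(), (double) e.getValue() / total);
        }
        return probs;
    }
}
```

For instance, folding B=1 and b=3 from the program output yields B=4, which matches the corresponding row of the tally.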
-
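Building the Huffman tree
Build the Huffman tree by the standard construction rule: repeatedly merge the two nodes with the smallest weights, placing the smaller node on the left and the larger one on the right. Left branches are encoded as '0' and right branches as '1'.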
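As a compact illustration of this rule, here is a minimal Java sketch that uses a PriorityQueue to pick the two smallest nodes. It is not the author's implementation: the class HuffmanSketch, the nested Node type, and the method buildCodes are hypothetical names, and when two nodes have equal weights the order in which they are merged is left unspecified.

```java
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical sketch for illustration; not the author's implementation.
class HuffmanSketch {
    static class Node {
        final double weight;
        final char symbol;      // meaningful only for leaves
        final Node left, right; // both null for leaves

        Node(double weight, char symbol, Node left, Node right) {
            this.weight = weight;
            this.symbol = symbol;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    // Builds the tree and returns the code of every symbol.
    static Map<Character, String> buildCodes(Map<Character, Double> weights) {
        PriorityQueue<Node> queue =
                new PriorityQueue<>((a, b) -> Double.compare(a.weight, b.weight));
        for (Map.Entry<Character, Double> e : weights.entrySet()) {
            queue.add(new Node(e.getValue(), e.getKey(), null, null));
        }
        while (queue.size() > 1) {
            Node smaller = queue.poll(); // smallest weight becomes the left child
            Node larger = queue.poll();  // next smallest becomes the right child
            queue.add(new Node(smaller.weight + larger.weight, '\0', smaller, larger));
        }
        Map<Character, String> codes = new TreeMap<>();
        if (!queue.isEmpty()) {
            assign(queue.poll(), "", codes);
        }
        return codes;
    }

    // Left branches append '0', right branches append '1'.
    private static void assign(Node node, String prefix, Map<Character, String> codes) {
        if (node.isLeaf()) {
            codes.put(node.symbol, prefix);
            return;
        }
        assign(node.left, prefix + "0", codes);
        assign(node.right, prefix + "1", codes);
    }
}
```

With the weights A=1, B=2, C=4, the sketch first merges A and B into a node of weight 3, then merges that node (left) with C (right), giving the codes A=00, B=01, C=1. A single-symbol input receives the empty code.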
Code and results
Code
#include <stdio.h>
#include <stdlib.h>

#define MAXBIT 50
#define MAXLEAF 50
#define MAXNODE (2 * MAXLEAF - 1)
#define INFINITY 65535

// Code of one leaf: the bits are stored in bit[start + 1 .. n - 1]
typedef struct HCodeType {
    int start;
    int bit[MAXBIT];
} HCodeType;

// Tree node
typedef struct HNode {
    int parent, lchild, rchild;
    double weight;
    char data;
} HNode;

// Build the Huffman tree
void HuffmanTree(HNode HN[MAXNODE], int n) {
    int i = 0, j;
    int x1, x2;
    double s1, s2;

    printf("Enter the letters\n");
    // Initialize the leaf nodes
    while (i < n) {
        HN[i].parent = -1;
        HN[i].lchild = -1;
        HN[i].rchild = -1;
        HN[i].weight = 0;
        // The leading space skips the whitespace that separates the letters
        scanf(" %c", &HN[i].data);
        i++;
    }
    printf("Enter the weights of the letters in the same order\n");
    i = 0;
    while (i < n) {
        scanf("%lf", &HN[i].weight);
        i++;
    }
    // Initialize the internal nodes
    for (i = n; i < 2 * n - 1; i++) {
        HN[i].parent = -1;
        HN[i].lchild = -1;
        HN[i].rchild = -1;
        HN[i].weight = 0;
        HN[i].data = '0';
    }
    // Repeatedly pick the two parentless nodes with the smallest weights and merge them
    for (i = 0; i < n - 1; i++) {
        s1 = s2 = INFINITY;
        x1 = x2 = 0;
        for (j = 0; j < n + i; j++) {
            if (HN[j].weight < s1 && -1 == HN[j].parent) {
                // New smallest: the previous smallest becomes the second smallest
                s2 = s1;
                x2 = x1;
                s1 = HN[j].weight;
                x1 = j;
            } else if (HN[j].weight < s2 && -1 == HN[j].parent) {
                s2 = HN[j].weight;
                x2 = j;
            }
        }
        // The smaller node x1 goes on the left, the larger node x2 on the right
        HN[x1].parent = n + i;
        HN[x2].parent = n + i;
        HN[n + i].weight = HN[x1].weight + HN[x2].weight;
        HN[n + i].lchild = x1;
        HN[n + i].rchild = x2;
    }
}

int main() {
    HNode HN[MAXNODE];
    HCodeType HC[MAXLEAF], cd;
    int i, j, k, p;
    int n;

    printf("Enter the number of distinct letters\n");
    if (scanf("%d", &n) != 1 || n < 1 || n > MAXLEAF) {
        return 1;
    }
    HuffmanTree(HN, n);
    // Walk from each leaf up to the root, filling the bits from the back
    for (i = 0; i < n; i++) {
        cd.start = n - 1;
        k = i;
        p = HN[k].parent;
        while (p != -1) {
            if (k == HN[p].lchild) {
                cd.bit[cd.start] = 0;
            } else {
                cd.bit[cd.start] = 1;
            }
            cd.start--;
            k = p;
            p = HN[k].parent;
        }
        for (j = cd.start + 1; j < n; j++) {
            HC[i].bit[j] = cd.bit[j];
        }
        HC[i].start = cd.start;
    }
    // Print the codes
    for (i = 0; i < n; i++) {
        printf("%c ", HN[i].data);
        for (j = HC[i].start + 1; j < n; j++) {
            printf("%d", HC[i].bit[j]);
        }
        printf("\n");
    }
    return 0;
}
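The bottom-up walk in main can be easier to follow in isolation. The Java sketch below is an illustration only: the class CodeWalkSketch and the method codeOf are hypothetical names, and the two int arrays are assumed to mirror the parent and lchild fields of HNode, with -1 meaning "none". Instead of filling an array from the back, it appends the bits in leaf-to-root order and reverses them at the end.

```java
// Hypothetical sketch for illustration; the arrays mirror the parent and lchild fields of HNode.
class CodeWalkSketch {
    // Returns the code of the given leaf by climbing from the leaf to the root.
    static String codeOf(int leaf, int[] parent, int[] lchild) {
        StringBuilder bits = new StringBuilder();
        int k = leaf;
        int p = parent[k];
        while (p != -1) {
            // A left child contributes 0, a right child contributes 1.
            bits.append(k == lchild[p] ? '0' : '1');
            k = p;
            p = parent[k];
        }
        // Bits were collected leaf-to-root; the code reads root-to-leaf.
        return bits.reverse().toString();
    }
}
```

For a three-leaf tree with parent = {3, 3, 4, 4, -1} and lchild = {-1, -1, -1, 0, 3}, codeOf returns 00 for leaf 0, 01 for leaf 1, and 1 for leaf 2.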
Run results
Source: https://blog.csdn.net/triggerV/article/details/118106224