
pypinyin

Author: Internet

Contents

Chinese character encoding

pypinyin

pip install pypinyin
Project: https://github.com/mozillazg/python-pinyin
Documentation:
https://pypinyin.readthedocs.io/zh_CN/master/index.html
https://fishc.com.cn/thread-147575-1-1.html

Incorrect pinyin for single characters / archaic heteronym readings:

import pypinyin
pypinyin.pinyin('重', heteronym=True)  # [['zhòng', 'chóng', 'tóng']]  tóng?
https://github.com/mozillazg/python-pinyin/issues/263
https://github.com/mozillazg/python-pinyin/issues/198

Fix: replace the default pinyin dictionary
pip install pypinyin-dict

from pypinyin_dict.pinyin_data import cc_cedict
cc_cedict.load()
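
To illustrate why swapping the dictionary changes the result, here is a toy model. It is a sketch under stated assumptions, not pypinyin's actual data or implementation: the two tables, their contents, and the readings helper are all hypothetical. Each table maps a Unicode code point to a comma-separated string of readings, and the replacement table simply overwrites matching entries.

```python
# Toy model only: NOT pypinyin's real data or implementation.
# Keys are Unicode code points, values are comma-separated readings.
default_table = {ord('重'): 'zhòng,chóng,tóng'}  # includes the archaic tóng
override_table = {ord('重'): 'zhòng,chóng'}      # hypothetical cleaner entry


def readings(char, table):
    """Return the list of readings stored for char, or [] if it is missing."""
    entry = table.get(ord(char))
    return entry.split(',') if entry else []


before = readings('重', default_table)

# In this toy model, loading a replacement dictionary overwrites matching entries.
merged = {**default_table, **override_table}
after = readings('重', merged)

print(before)
print(after)
```

Characters that the replacement table does not cover keep their original entry, so in a model like this a replacement dictionary only needs to list the entries it wants to correct.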

Looking up Chinese characters by pinyin

import pypinyin as pypy
from pypinyin_dict.pinyin_data import cc_cedict
import json

cc_cedict.load()


def gb2312():
    """Return every Chinese character encoded in GB2312 (levels 1 and 2)."""
    s = []
    # Lead bytes 0xB0-0xD7 (176-215) hold the 3755 level-1 characters;
    # lead bytes 0xD8-0xF7 hold the 3008 level-2 characters.
    for i in range(0xB0, 0xF7 + 1):
        for j in range(0xA1, 0xFE + 1):
            try:
                c = bytes([i, j]).decode('gb2312')
            except UnicodeDecodeError:
                # Unassigned position; skip it.
                continue
            s.append(c)
    print(len(s))  # 6763
    return s


hanzis = [chr(i) for i in range(0x4E00, 0x9FA6)]  # Unicode U+4E00-U+9FA5: 20902 characters
hanzis = gb2312()  # GB2312: 6763 characters (replaces the Unicode list above)

index = {}  # pinyin without tone marks -> list of characters (avoids shadowing the built-in dict)
for hanzi in hanzis:  # e.g. '重'
    pinyins = pypy.pinyin(hanzi, style=pypy.NORMAL, heteronym=True)[0]
    for pinyin in pinyins:
        if pinyin not in index:
            index[pinyin] = [hanzi]
        else:
            index[pinyin].append(hanzi)
# print(index)

# cc_cedict dictionary, GB2312 characters
print(index['qun'])  # ['裙', '群', '逡', '麇']
print(index['zhi'])


'''
# Write to a file
with open('dict.json', 'w') as f:  # dict to JSON
    json.dump(index, f)

# Read the JSON back
with open('dict.json', 'r') as f:
    load_dict = json.load(f)
print(load_dict['cai'])
'''
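
The byte ranges used in gb2312() above follow the row layout of GB2312: lead bytes 0xB0-0xD7 hold the level-1 characters and 0xD8-0xF7 hold the level-2 characters. The sketch below counts each level separately, using only the standard library codec. The count_decodable helper is a name introduced here for illustration. It should report 3755 level-1 characters and 3008 level-2 characters, which add up to the 6763 printed by gb2312().

```python
def count_decodable(first_lead, last_lead):
    """Count two-byte sequences in the lead-byte range that decode as GB2312."""
    count = 0
    for lead in range(first_lead, last_lead + 1):
        for trail in range(0xA1, 0xFE + 1):
            try:
                bytes([lead, trail]).decode('gb2312')
            except UnicodeDecodeError:
                continue  # unassigned position
            count += 1
    return count


level1 = count_decodable(0xB0, 0xD7)  # level 1: commonly used characters
level2 = count_decodable(0xD8, 0xF7)  # level 2: less common characters
print(level1, level2, level1 + level2)
```

To build the pinyin index from the level-1 characters only, narrowing the outer loop in gb2312() to range(0xB0, 0xD7 + 1) is enough.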

Source: https://www.cnblogs.com/wqzz/p/16151753.html