
Using a Custom Dictionary with the IK Analyzer

2022-08-17 14:32:28 Author: Internet

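The IK analyzer supports two segmentation modes:

ik_smart: coarsest-grained segmentation (fewest tokens)
ik_max_word: finest-grained segmentation

Suppose we want “最好听的歌” to be treated as one complete word. With the default dictionary it is not, so the word has to be added to the dictionary.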

1. The IK analyzer is installed in the es plugins directory. Inside the analyzer's directory there is a config directory:

/plugins/ik/config

In config, add a file named mydic.dic. The name is arbitrary; the extension is .dic.

2. Add the word to mydic.dic:

最好听的歌

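3. After saving, edit the IKAnalyzer.cfg.xml file in the ik/config directory and register mydic.dic as an extension dictionary (the ext_dict entry).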
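
The edit in step 3 can be sketched as follows. This is a minimal example that assumes the stock IKAnalyzer.cfg.xml layout shipped with the plugin and the file name mydic.dic chosen in step 1; the comments and the empty stop-word entry are illustrative, so compare against the file in your own installation before changing it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Extension dictionary; the path is assumed to be relative to this config directory -->
    <entry key="ext_dict">mydic.dic</entry>
    <!-- Extension stop-word dictionary; left empty in this sketch -->
    <entry key="ext_stopwords"></entry>
</properties>
```

The dictionary file itself holds one word per line and should be saved as UTF-8; otherwise the custom word may not be matched.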

4. Restart es. In an es cluster, the same changes must be made on every node.

Test it by sending the following request bodies to the _analyze API:

ik_smart:

{
    "analyzer": "ik_smart",
    "text": "最好听的歌"
}
Output:
{
    "tokens": [
        {
            "token": "最好听的歌",
            "start_offset": 0,
            "end_offset": 5,
            "type": "CN_WORD",
            "position": 0
        }
    ]
}

ik_max_word:

{
    "analyzer": "ik_max_word",
    "text": "最好听的歌"
}
Output:
{
    "tokens": [
        {
            "token": "最好听的歌",
            "start_offset": 0,
            "end_offset": 5,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "最好",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "好听",
            "start_offset": 1,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 2
        },
        {
            "token": "听的歌",
            "start_offset": 2,
            "end_offset": 5,
            "type": "CN_WORD",
            "position": 3
        }
    ]
}

Source: https://www.cnblogs.com/xudong5273/p/16595049.html