MySQL German accent-insensitive search in full-text searches

Author: Internet

Let's look at an example hotels table:

CREATE TABLE `hotels` (
  `HotelNo` varchar(4) character set latin1 NOT NULL default '0000',
  `Hotel` varchar(80) character set latin1 NOT NULL default '',
  `City` varchar(100) character set latin1 default NULL,
  `CityFR` varchar(100) character set latin1 default NULL,
  `Region` varchar(50) character set latin1 default NULL,
  `RegionFR` varchar(100) character set latin1 default NULL,
  `Country` varchar(50) character set latin1 default NULL,
  `CountryFR` varchar(50) character set latin1 default NULL,
  `HotelText` text character set latin1,
  `HotelTextFR` text character set latin1,
  `tagsforsearch` text character set latin1,
  `tagsforsearchFR` text character set latin1,
  PRIMARY KEY  (`HotelNo`),
  FULLTEXT KEY `fulltextHotelSearch` (`HotelNo`,`Hotel`,`City`,`CityFR`,`Region`,`RegionFR`,`Country`,`CountryFR`,`HotelText`,`HotelTextFR`,`tagsforsearch`,`tagsforsearchFR`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci;

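Say this table contains a single hotel whose region name is "Graubünden" (note the umlaut ü).

Now I want a search to match both of these phrases equally:
"graubunden" and
"graubünden"

With MySQL's built-in collations this is easy
in a regular search: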

SELECT *  
FROM `hotels` 
WHERE `Region` LIKE CONVERT(_utf8 '%graubunden%' USING latin1) 
COLLATE latin1_german1_ci

This works fine for both "graubunden" and "graubünden", and
I get the expected results. The problem shows up
when we use MySQL full-text search.

What is wrong with this SQL statement?

SELECT 
 *
FROM 
 hotels 
WHERE 
 MATCH (`HotelNo`,`Hotel`,`Address`,`City`,`CityFR`,`Region`,`RegionFR`,`Country`,`CountryFR`, `HotelText`, `HotelTextFR`, `tagsforsearch`, `tagsforsearchFR`)
AGAINST( CONVERT('+graubunden' USING latin1)  COLLATE latin1_german1_ci IN BOOLEAN MODE)            
ORDER BY Country ASC, Region ASC, City ASC

This returns no results.
Any idea where the dog is buried?

Solution:

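Defining an explicit CHARACTER SET on an individual column overrides the collation set as the table-level default: a column declared with a CHARACTER SET but no COLLATE clause gets that character set's default collation.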

Each of your columns therefore has the default latin1 collation, which is latin1_swedish_ci. You can see this by running SHOW CREATE TABLE.
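
As a quick way to check this without reading the whole SHOW CREATE TABLE output, a query along these lines against information_schema lists each column's collation. It is only a sketch and assumes the table lives in the currently selected database:

```sql
-- List the effective character set and collation of every column in `hotels`.
-- Assumes `hotels` is in the currently selected database (DATABASE()).
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM   information_schema.COLUMNS
WHERE  TABLE_SCHEMA = DATABASE()
  AND  TABLE_NAME   = 'hotels'
ORDER BY ORDINAL_POSITION;
```

With the table definition from the question, the character columns should report latin1_swedish_ci rather than latin1_german1_ci.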

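In FULLTEXT queries, the indexed columns have a COERCIBILITY of 0. That is, every full-text query gets converted to the collation used in the index, not the other way around.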
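
To see why the index collation matters here, the following sketch compares the two spellings under both collations and then shows the coercibility values MySQL reports for ordinary expressions. It assumes the client sends string literals as UTF-8 (hence the _utf8 introducer); the column aliases are purely illustrative:

```sql
-- Compare 'graubünden' and 'graubunden' under the two latin1 collations.
-- latin1_german1_ci treats ü as u; latin1_swedish_ci sorts ü with y instead.
SELECT
  CONVERT(_utf8 'graubünden' USING latin1) COLLATE latin1_german1_ci
    = CONVERT(_utf8 'graubunden' USING latin1) AS equal_german1,  -- 1
  CONVERT(_utf8 'graubünden' USING latin1) COLLATE latin1_swedish_ci
    = CONVERT(_utf8 'graubunden' USING latin1) AS equal_swedish;  -- 0

-- Coercibility of ordinary expressions: lower values take precedence
-- when MySQL has to pick one collation for a comparison.
SELECT
  COERCIBILITY(_latin1 'x' COLLATE latin1_german1_ci) AS explicit_collate,  -- 0
  COERCIBILITY('x')                                   AS plain_literal;     -- 4
```

So if the index is built with latin1_swedish_ci and the query is coerced to it, the COLLATE latin1_german1_ci clause in AGAINST cannot make "graubunden" match "graubünden".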

You need to either remove the CHARACTER SET definitions from the columns or explicitly set all columns to latin1_german1_ci:

CREATE TABLE `hotels` (
  `HotelNo` varchar(4) NOT NULL default '0000',
  `Hotel` varchar(80) NOT NULL default '',
  `City` varchar(100) default NULL,
  `CityFR` varchar(100) default NULL,
  `Region` varchar(50) default NULL,
  `RegionFR` varchar(100) default NULL,
  `Country` varchar(50) default NULL,
  `CountryFR` varchar(50) default NULL,
  `HotelText` text,
  `HotelTextFR` text,
  `tagsforsearch` text,
  `tagsforsearchFR` text,
  PRIMARY KEY  (`HotelNo`),
  FULLTEXT KEY `fulltextHotelSearch` (`HotelNo`,`Hotel`,`City`,`CityFR`,`Region`,`RegionFR`,`Country`,`CountryFR`,`HotelText`,`HotelTextFR`,`tagsforsearch`,`tagsforsearchFR`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci;

INSERT
INTO    hotels (hotelText, HotelTextFR, tagsforsearch, tagsforsearchFR)
VALUES  ('text', 'text', 'graubünden', 'tags');

SELECT  *
FROM    hotels
WHERE   MATCH (`HotelNo`,`Hotel`,`City`,`CityFR`,`Region`,`RegionFR`,`Country`,`CountryFR`, `HotelText`, `HotelTextFR`, `tagsforsearch`, `tagsforsearchFR`)
AGAINST (CONVERT('+graubunden' USING latin1) COLLATE latin1_german1_ci IN BOOLEAN MODE)
ORDER BY
        Country ASC, Region ASC, City ASC;
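
If dropping and recreating the table is inconvenient, an in-place conversion might work as well. This is a sketch rather than part of the original answer, and it assumes every character column in the table should end up as latin1_german1_ci:

```sql
-- Convert all character columns (and the table default) to latin1_german1_ci.
-- The table is rebuilt, so its indexes, including the FULLTEXT one,
-- should be rebuilt with the new collation as well.
ALTER TABLE hotels
  CONVERT TO CHARACTER SET latin1 COLLATE latin1_german1_ci;
```

Running SHOW CREATE TABLE hotels afterwards is a simple way to confirm that no column still carries latin1_swedish_ci.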

Tags: encoding, collation, diacritics, full-text-search, mysql
Source: https://codeday.me/bug/20191210/2099231.html