
Why doesn't Normalizer::normalize (PHP) work?

Author: Internet

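I'm trying to normalize strings containing characters such as 'áéíóú' in order to simplify searching.

Based on the answer to this question, I understood that I should use the Normalizer class to do it.

The problem is that the normalize function appears to do nothing. For example, this code: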

<?php echo 'Pérez, NFC: ' . normalizer_normalize('Pérez', Normalizer::NFC) 
    . ' NFD: ' .normalizer_normalize('Pérez', Normalizer::NFD)
    . ' NFKC: ' .normalizer_normalize('Pérez', Normalizer::NFKC) 
    . ' NFKD: ' .normalizer_normalize('Pérez', Normalizer::NFKD)?>
<br/>
<?php echo 'aáàä, êëéè,' 
    . ' FORM_C: ' . normalizer_normalize('aáàä, êëéè', Normalizer::FORM_C )
    . ' FORM_D: ' .normalizer_normalize('aáàä, êëéè', Normalizer::FORM_D)
    . ' FORM_KC: ' .normalizer_normalize('aáàä, êëéè', Normalizer::FORM_KC)
    . ' FORM_KD: ' .normalizer_normalize('aáàä, êëéè', Normalizer::FORM_KD)?>

Output:

Pérez, NFC: Pérez NFD: Pérez NFKC: Pérez NFKD: Pérez
aáàä, êëéè, FORM_C: aáàä, êëéè FORM_D: aáàä, êëéè FORM_KC: aáàä, êëéè FORM_KD: aáàä, êëéè 

What is normalize actually supposed to do?

— EDITED —

This is strange. When I copy and paste the result from the web browser, both in my editor and on the original page I see:

FORM_D: aáàä, êëéè

On the Stack Overflow question page, I see (only in code sample mode):

FORM_D: aáàä, êëéè

Solution:

Found on this page:

Unicode and internationalization is a large topic, but you should know
at least one more important thing. For historical reasons, Unicode
allows alternative representations of some characters. For example, á
can be written either as one precomposed character á with the Unicode
code point U+00E1 or as a decomposed sequence of the letter a (U+0061)
combined with the accent ´ (U+0301). For purposes of comparison and
sorting, two such representations should be taken as equal. To solve
this, the intl library provides the Normalizer class. This class in
turn provides the normalize() method, which you can use to convert a
string to a normalized composed or decomposed form. Your application
should consistently transform all strings to one or the other form
before performing comparisons.

echo Normalizer::normalize("a\u{0301}", Normalizer::FORM_C); // á (precomposed, U+00E1)
echo Normalizer::normalize("\u{00E1}", Normalizer::FORM_D); // a + U+0301 (decomposed)
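
The composed and decomposed forms usually render identically, which would explain why the output in the question looks unchanged. A minimal sketch of how one might make the difference visible by comparing bytes, assuming the intl extension is loaded and PHP 7.0+ (for the "\u{...}" escapes); the variable names are illustrative only:

```php
<?php
// Requires the intl extension; "\u{...}" escapes need PHP 7.0+.
$composed   = "P\u{00E9}rez"; // é stored as the single code point U+00E9
$decomposed = Normalizer::normalize($composed, Normalizer::FORM_D);

// The two strings look the same on screen but are different byte sequences.
var_dump($composed === $decomposed); // bool(false)
echo bin2hex($composed), "\n";       // 50c3a972657a
echo bin2hex($decomposed), "\n";     // 5065cc8172657a

// Recomposing with FORM_C gives back the original bytes.
var_dump(Normalizer::normalize($decomposed, Normalizer::FORM_C) === $composed); // bool(true)
```

In the hex dump, c3a9 is the UTF-8 encoding of é, while 65 followed by cc81 is a plain e followed by the combining acute accent U+0301.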

So removing accents (and similar marks) is not what Normalizer is for.
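
For the original goal of accent-insensitive search, one common approach (not part of the quoted answer) is to decompose the string first and then delete the combining marks. The sketch below assumes the intl extension, PCRE with Unicode support, and a UTF-8 encoded source file; strip_accents is a hypothetical helper name, not a built-in function:

```php
<?php
// Hypothetical helper (not from the original answer): fold accents for searching.
function strip_accents(string $text): string
{
    // Decompose, so that each accent becomes a separate combining mark.
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $text; // normalization failed (e.g. invalid UTF-8); leave input as-is
    }
    // Delete all nonspacing marks (Unicode category Mn); fall back to the input on regex error.
    return preg_replace('/\p{Mn}+/u', '', $decomposed) ?? $text;
}

echo strip_accents('Pérez'), "\n";      // Perez
echo strip_accents('aáàä, êëéè'), "\n"; // aaaa, eeee
```

This only handles characters that have a canonical decomposition; letters such as ø or ß have none and pass through unchanged, so a real search feature may need additional folding rules.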

Tags: php, normalization, intl
Source: https://codeday.me/bug/20190718/1492420.html