php – A better explanation of $convmap in mb_encode_numericentity()

2019-06-23 10:25:50 Author: Internet

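The description of the convmap parameter of mb_encode_numericentity in the PHP manual is vague to me. Could someone explain it better, or "dumb it down" for me? What do the array elements used in this parameter mean? Example 1 on the manual page has: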

<?php
$convmap = array (
 int start_code1, int end_code1, int offset1, int mask1,
 int start_code2, int end_code2, int offset2, int mask2,
 ........
 int start_codeN, int end_codeN, int offsetN, int maskN );
// Specify Unicode value for start_codeN and end_codeN
// Add offsetN to value and take bit-wise 'AND' with maskN, then
// it converts value to numeric string reference.
?>

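That is helpful, but then I see lots of usage examples like array(0x80, 0xffff, 0, 0xffff); and that throws me off. Does it mean the offset is 0 and the mask is 0xffff? If so, does the offset mean the number of characters into the string at which conversion starts, and what does the mask mean in this context?

Answer:

Going down the rabbit hole, it looks like the comments in the documentation for mb_encode_numericentity are accurate, although somewhat cryptic.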

The four major parts to the convmap appear to be:

start_code: The map affects items starting from this character code.
end_code: The map affects items up to this character code.
offset: Add a specific offset amount (positive or negative) for this character code.
mask: Value to be used for mask operation (character code bitwise AND mask value).
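
As a rough sketch of how those four values might be applied, the helper below walks the flat array four values at a time and applies the rule from the manual's comment. Note that convmap_entity() is a hypothetical name of my own, not an mbstring function, and the sketch assumes the array length is a multiple of four and that the first matching row wins.

```php
<?php
// Illustrative sketch only: convmap_entity() is a hypothetical helper, not part of mbstring.
// Rule being mirrored: if start_code <= code <= end_code, emit ((code + offset) & mask).
function convmap_entity(int $code, array $convmap): ?string
{
    // Walk the flat array four values at a time: start_code, end_code, offset, mask.
    foreach (array_chunk($convmap, 4) as [$start, $end, $offset, $mask]) {
        if ($code >= $start && $code <= $end) {
            return '&#' . (($code + $offset) & $mask) . ';';
        }
    }
    return null; // No row matched: the character would be left untouched.
}

$entity  = convmap_entity(162, [0x00, 0xff, 0, 0xff]);  // 162 is the code of "¢"
$skipped = convmap_entity(0x41, [0x80, 0xff, 0, 0xff]); // "A" is below start_code

echo $entity, "\n";
var_dump($skipped);
```

Here the first call falls inside the 0x00 to 0xff row and yields a numeric entity, while the second call returns null because 0x41 is below the row's start_code.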

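Character codes can be looked up in a character table, such as this Codepage Layout example for the ISO-8859-1 encoding. (ISO-8859-1 is the encoding used in Example #2 of the original PHP documentation.) Looking at that table, we can see that the convmap there is meant to affect only character codes from 0x80 (which appears to be blank in this particular encoding) through the last character of the encoding, 0xff (which appears to be ÿ).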
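
To make the range part concrete, here is a small sketch using the same 0x80 to 0xff range. The sample string is my own, and the snippet assumes the mbstring extension is loaded. Characters below 0x80 should pass through unchanged, while é (0xe9) and ÿ (0xff) fall inside the range.

```php
<?php
// Assumes the mbstring extension is available. The input string is an illustrative choice.
$convmap = [0x80, 0xff, 0, 0xff];   // start_code, end_code, offset, mask
$input   = "caf\u{e9} \u{ff}";      // "café ÿ": é is U+00E9, ÿ is U+00FF

// Only characters whose code lies in 0x80..0xff are turned into numeric entities.
$encoded = mb_encode_numericentity($input, $convmap, 'UTF-8');
echo $encoded, "\n";
```

The plain ASCII letters and the space are outside the range, so only the last letter of "café" and the ÿ are expected to come back as entities.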

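To better understand the offset and mask parts of the convmap, here are some examples of how they affect a character code (in the examples below, our character has a code of 162):

Simple example: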

<?php    
$original_str = "¢";
$convmap = array(0x00, 0xff, 0, 0xff);
$converted_str = mb_encode_numericentity($original_str, $convmap, "UTF-8");
echo "original:  $original_str\n";
echo "converted: $converted_str\n";
?>

Result:

original:  ¢
converted: &#162;

Offset example:

<?php
$original_str = "¢";
$convmap = array(0x00, 0xff, 1, 0xff);
$converted_str = mb_encode_numericentity($original_str, $convmap, "UTF-8");
echo "original:  $original_str\n";
echo "converted: $converted_str\n";
?>

Result:

original:  ¢
converted: &#163;

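Notes:

The offset seems to allow finer control over the range covered by the current start_code and end_code. For example, you might have some special reason to add an offset for one row of character codes in your convmap, while leaving the offset out for another row.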
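
A minimal sketch of that per-row behavior, assuming the mbstring extension is loaded. The two ranges and the input string are illustrative choices of mine: uppercase letters get no offset, lowercase letters get an offset of 1.

```php
<?php
// Assumes the mbstring extension is available. Ranges and input are illustrative only.
$convmap = [
    0x41, 0x5a, 0, 0xffff, // A-Z: offset 0, code emitted unchanged
    0x61, 0x7a, 1, 0xffff, // a-z: offset 1, code emitted as code + 1
];

// "A" is 65 and matches the first row; "a" is 97 and matches the second row.
$encoded = mb_encode_numericentity('Aa', $convmap, 'UTF-8');
echo $encoded, "\n";
```

Each character is handled by the row whose range contains it, so the offset of the second row has no effect on the uppercase letter.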

Mask example:

<?php
// Mask Example 1
$original_str = "¢";
$convmap = array(0x00, 0xff, 0, 0xf0);
$converted_str = mb_encode_numericentity($original_str, $convmap, "UTF-8");
echo "original:  $original_str\n";
echo "converted: $converted_str\n\n";

// Mask Example 2
$convmap = array(0x00, 0xff, 0, 0x0f);
$converted_str = mb_encode_numericentity($original_str, $convmap, "UTF-8");
echo "original:  $original_str\n";
echo "converted: $converted_str\n\n";

// Mask Example 3
$convmap = array(0x00, 0xff, 0, 0x00);
$converted_str = mb_encode_numericentity($original_str, $convmap, "UTF-8");
echo "original:  $original_str\n";
echo "converted: $converted_str\n";
?>

Result:

original:  ¢
converted: &#160;

original:  ¢
converted: &#2;

original:  ¢
converted: &#0;

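Notes:

This answer does not intend to cover masking in great detail, but masking can help keep or remove certain bits from a given value.

Mask Example 1

In the first mask example, 0xf0, the f indicates that we want to keep the bits on the left side of the binary value. Here, f has a binary value of 1111 and 0 has a binary value of 0000, which together become 11110000.

Then, when we perform a bitwise AND with our character code (in this case 162, which is 10100010 in binary), the operation looks like this: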

  11110000
& 10100010
----------
  10100000

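Converted back to decimal, 10100000 is 160.

So we have effectively kept the "left side" bits of the original character code and gotten rid of the "right side" bits.

Mask Example 2

In the second mask example, the mask 0x0f (00001111 in binary) gives the following binary result in the bitwise AND: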

  00001111
& 10100010
----------
  00000010

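Converted back to decimal, this is 2.

So we have effectively kept the "right side" bits of the original character code and gotten rid of the "left side" bits.

Mask Example 3

Finally, the third mask example shows what happens when a mask of 0x00 (00000000 in binary) is used in the bitwise AND: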

  00000000
& 10100010
----------
  00000000

The result is 0.
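
The three bitwise AND operations above can be reproduced with plain integer arithmetic, with no mbstring involved. This is only a sketch of the masking step; the variable names are my own.

```php
<?php
$code = 162; // code of "¢", 10100010 in binary
$results = [];

foreach ([0xf0, 0x0f, 0x00] as $mask) {
    $results[$mask] = $code & $mask; // keep only the bits that are set in the mask
    // Print mask, code, and result as 8-bit binary, plus the decimal result.
    printf("%08b & %08b = %08b (%d)\n", $mask, $code, $results[$mask], $results[$mask]);
}
```

The decimal values in parentheses match the 160, 2, and 0 worked out by hand above.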

Tags: php, html-encode, html-entities, collation
Source: https://codeday.me/bug/20190623/1269939.html