Why does Java's char use UTF-16?
Recently I have read a lot about Unicode code points and how they evolved over time, and of course I have also read http://www.joelonsoftware.com/articles/Unicode.html.
But I could not find the real reason why Java uses UTF-16 for char.
For example, suppose my string contains 1024 characters, all in the ASCII range. That means 1024 × 2 bytes, i.e. 2 KB of memory that the string will consume regardless.
So if Java's base char were UTF-8, the same data would be just 1 KB. Even if the string contains some characters that need more than one byte, say 10 occurrences of "字" (3 bytes each in UTF-8), that only slightly increases the memory consumption: (1014 × 1 byte) + (10 × 3 bytes) = 1 KB + 20 bytes.
The comparison speaks for itself: 1 KB + 20 bytes vs. 2 KB. I am not talking about ASCII specifically; my curiosity is why it is not UTF-8, which also takes care of multibyte characters. UTF-16 looks like a waste of memory for any string that contains lots of characters that do not need multiple bytes.
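To make the arithmetic concrete, here is a minimal sketch that just measures the encoded sizes (the 1014/10 split and "字" are the numbers from my example above; the class name is only for the demo, and UTF_16BE is chosen so that no byte-order mark inflates the count):

import java.nio.charset.StandardCharsets;

public class EncodedSizes {
    public static void main(String[] args) {
        // 1014 ASCII characters plus 10 CJK characters such as '字'
        // (String.repeat requires Java 11+)
        String s = "a".repeat(1014) + "字".repeat(10);

        // UTF-8: 1 byte per ASCII char, 3 bytes per '字'
        // -> 1014 + 30 = 1044 bytes (1 KB + 20 bytes)
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);

        // UTF-16BE: 2 bytes per char, no BOM
        // -> 1024 * 2 = 2048 bytes (2 KB)
        System.out.println(s.getBytes(StandardCharsets.UTF_16BE).length);
    }
}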
这背后有什么好理由吗?
Answer:
Java used UCS-2 before it transitioned to UTF-16 in 2004/2005. The reason for the original choice of UCS-2 was mainly historical:
Unicode was originally designed as a fixed-width 16-bit character encoding. The primitive data type char in the Java programming language was intended to take advantage of this design by providing a simple data type that could hold any character.
This, and the birth of UTF-16, is explained by the Unicode FAQ page:
Originally, Unicode was designed as a pure 16-bit encoding, aimed at representing all modern scripts. (Ancient scripts were to be represented with private-use characters.) Over time, and especially after the addition of over 14,500 composite characters for compatibility with legacy sets, it became clear that 16 bits were not sufficient for the user community. Out of this arose UTF-16.
As @wero has already mentioned, random access cannot be performed efficiently with UTF-8. All things weighed, UCS-2 seemed like the best choice at the time, particularly because no supplementary characters had been allocated at that stage. That left UTF-16 as the easiest natural progression.
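For illustration, here is a minimal sketch (not from the original answer; the class name is only for the demo) of what the later move from fixed-width UCS-2 to UTF-16 means in practice: a supplementary character occupies two char values, a surrogate pair, so indexing by char no longer lines up one-to-one with code points:

public class SurrogateDemo {
    public static void main(String[] args) {
        // U+1F600 GRINNING FACE lies outside the BMP, so UTF-16 stores it
        // as a surrogate pair of two char values: \uD83D \uDE00.
        String s = "A\uD83D\uDE00B";

        System.out.println(s.length());                      // 4 char units
        System.out.println(s.codePointCount(0, s.length())); // 3 code points

        // charAt(1) returns only the high surrogate, not a character...
        System.out.printf("%04X%n", (int) s.charAt(1));      // D83D
        // ...while codePointAt(1) reassembles the full code point.
        System.out.printf("%X%n", s.codePointAt(1));         // 1F600
    }
}

This is exactly the fixed-width assumption of the original 16-bit design that the FAQ quote above describes as breaking down.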
Tags: java, unicode, utf-16, utf-8