python - Different results when sorting a character vector

Author: Internet

I would like to know how R's sorting algorithm works when it sorts a character vector.

a = c("aa(150)", "aa(1)S")
sort(a)
# [1] "aa(150)" "aa(1)S" 
a = c("aa(150)", "aa(1)")
sort(a)
# [1] "aa(1)" "aa(150)"

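Doesn't R compare the integer values of the characters one by one, from left to right? Why does appending a character change the result?

I expected the order to be decided by the "5" and ")" characters, with everything after them ignored.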

Compare with Python:

In [1]: a=["aa(150)","aa(1)"]
In [2]: sorted(a)
Out[2]: ['aa(1)', 'aa(150)']
In [3]: a=["aa(150)","aa(1)S"]
In [4]: sorted(a)
Out[4]: ['aa(1)S', 'aa(150)']
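
The Python result matches the expectation in the question because Python's default string comparison goes by code point, left to right, and stops at the first position where the strings differ. A minimal sketch to make that visible (`first_difference` is a hypothetical helper written for this illustration, not a library function):

```python
def first_difference(left, right):
    """Return (index, left_char, right_char) for the first position where
    the two strings differ, or None if one is a prefix of the other."""
    for index, (left_char, right_char) in enumerate(zip(left, right)):
        if left_char != right_char:
            return index, left_char, right_char
    return None


index, left_char, right_char = first_difference("aa(1)S", "aa(150)")

# ')' has code point 41 and '5' has code point 53, so "aa(1)S" sorts first.
print(index, repr(left_char), ord(left_char), repr(right_char), ord(right_char))
print(sorted(["aa(150)", "aa(1)S"]))
```

The trailing "S" is never examined, since the comparison is already settled at index 4.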

Solution:

In most cases, setting the collation locale to "C" turns off locale-specific sorting:

Sys.setlocale("LC_COLLATE", "C")
a=c("aa(150)","aa(1)S")
sort(a)
#[1] "aa(1)S"  "aa(150)"
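
For comparison, Python only applies locale collation when asked to, for example through `locale.strxfrm` as a sort key. The sketch below assumes the "C" locale is available (it should be on any platform) and that, as in R, it orders ASCII strings by character code:

```python
import locale

# Switch only the collation category to the portable "C" locale,
# mirroring Sys.setlocale("LC_COLLATE", "C") in R.
locale.setlocale(locale.LC_COLLATE, "C")

a = ["aa(150)", "aa(1)S"]

# Locale-aware sort: strxfrm transforms each string into a collation key.
by_locale = sorted(a, key=locale.strxfrm)

# Plain sort: compares code points directly.
by_code_point = sorted(a)

print(by_locale)
print(by_code_point)
```

Under the "C" locale the two orderings should agree. Under a locale such as en_US they may not, which is the effect the R example shows.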

Because languages differ, string collation rules have to be locale-specific. From the help page ?sort:

The sort order for character vectors will depend on the collating
sequence of the locale in use: see Comparison.

We can then turn to ?Comparison, which says:

Comparison of strings in character vectors is lexicographic within the
strings using the collating sequence of the locale in use: see
locales. The collating sequence of locales such as en_US is normally
different from C (which should use ASCII) and can be surprising.
Beware of making any assumptions about the collation order: e.g. in
Estonian Z comes between S and T, and collation is not necessarily
character-by-character – in Danish aa sorts as a single letter, after
z. In Welsh ng may or may not be a single sorting unit: if it is it
follows g.
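
One plausible way to read the original result, assuming the locale in use treats punctuation as ignorable on the first comparison pass (the help text does not say this, and the real collation tables are far more detailed), is a two-level key: compare letters and digits first, and fall back to the full string only to break ties. A toy model of that idea, where `toy_collation_key` is a hypothetical function and not any locale's actual rule:

```python
def toy_collation_key(text):
    """Toy two-level collation key (an assumption for illustration only)."""
    # Level 1: letters and digits only; punctuation is skipped.
    primary = "".join(ch for ch in text if ch.isalnum())
    # Level 2: the full string, used only when level 1 ties.
    return (primary, text)


# "aa150" vs "aa1S": '5' sorts before 'S', so "aa(150)" comes first.
print(sorted(["aa(150)", "aa(1)S"], key=toy_collation_key))

# "aa150" vs "aa1": "aa1" is a prefix, so "aa(1)" comes first.
print(sorted(["aa(150)", "aa(1)"], key=toy_collation_key))
```

Under this model the appended "S" matters because, once the parentheses are skipped, it is compared against the "5".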

As noted above, the locale is essential for sorting because each language uses its letters differently.

Tags: character, python, r, sorting
Source: https://codeday.me/bug/20191118/2030667.html