Removing non-ASCII characters from any given string type in Python

Author: Internet

>>> teststring = 'aõ'
>>> type(teststring)
<type 'str'>
>>> teststring
'a\xf5'
>>> print teststring
aõ
>>> teststring.decode("ascii", "ignore")
u'a'
>>> teststring.decode("ascii", "ignore").encode("ascii")
'a'

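This is what I really want stored internally, since I am removing the non-ASCII characters. Why did decode("ascii", "ignore") give back a unicode string?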

>>> teststringUni = u'aõ'
>>> type(teststringUni)
<type 'unicode'>
>>> print teststringUni
aõ
>>> teststringUni.decode("ascii" , "ignore")

Traceback (most recent call last):
  File "<pyshell#79>", line 1, in <module>
    teststringUni.decode("ascii" , "ignore")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf5' in position 1: ordinal not in range(128)
>>> teststringUni.decode("utf-8" , "ignore")

Traceback (most recent call last):
  File "<pyshell#81>", line 1, in <module>
    teststringUni.decode("utf-8" , "ignore")
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf5' in position 1: ordinal not in range(128)
>>> teststringUni.encode("ascii" , "ignore")
'a'

Which is again what I want.
I don't understand this behavior. Can someone explain to me what is happening here?

Edit: I thought that understanding this would let me solve the real program problem, which I describe here:
Converting Unicode objects with non-ASCII symbols in them into strings objects (in Python)

Solution:

It's simple: .encode converts a Unicode object into a byte string, and .decode converts a byte string into Unicode.
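
That also accounts for the two tracebacks in the question: in Python 2, calling .decode on an object that is already unicode makes the interpreter first encode it implicitly with the default ASCII codec, and that hidden step is what raises UnicodeEncodeError. As a rough sketch of the same idea in Python 3 terms (an assumption on my part, since the session above is Python 2), where str is the Unicode type and bytes is the byte string, a helper that accepts either type could look like this. The name strip_non_ascii and the "same type in, same type out" behaviour are hypothetical choices for illustration only.

```python
def strip_non_ascii(value):
    """Return value with every non-ASCII character or byte dropped.

    Hypothetical helper: the name and the decision to return the same
    type that was passed in are assumptions made for this sketch.
    """
    if isinstance(value, bytes):
        # Byte string: decode to text, silently skipping bytes >= 0x80,
        # then encode back so the caller gets bytes again.
        return value.decode("ascii", "ignore").encode("ascii")
    # Text string: encode to ASCII bytes, skipping characters that
    # cannot be encoded, then decode back so the caller gets text again.
    return value.encode("ascii", "ignore").decode("ascii")


print(strip_non_ascii("a\xf5"))   # text in, text out
print(strip_non_ascii(b"a\xf5"))  # bytes in, bytes out
```

Picking the direction from the input type is what the two working lines in the question do by hand: decode for the byte string, encode for the Unicode string.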

Tags: python, replace, unicode, string, non-ascii-characters
Source: https://codeday.me/bug/20190610/1212721.html