Python response decoding

Author: Internet

Given the following lines that use urllib:

# some request object exists
response = urllib.request.urlopen(request)
html = response.read().decode("utf8")

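What format of string does read() return? I have been trying to find this in the Python documentation, but it is not mentioned at all. Why is there a decode? Does decode convert the object to UTF-8 or from UTF-8? From what format, and into what format, is it decoding? The documentation for decode says nothing about this either. Is Python's documentation really that poor, or is there some standard convention I don't understand?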

I want to store that HTML in a UTF-8 file. Can I just do a regular write, or do I have to "encode" it back into something and write that out?

Note: I know urllib is deprecated, but I can't switch to urllib2 right now.

Answer:

Ask Python itself:

>>> # Python 2 interactive session
>>> import urllib
>>> r = urllib.urlopen("http://google.com")
>>> a = r.read()
>>> type(a)
<type 'str'>
>>> help(a.decode)
Help on built-in function decode:

decode(...)
    S.decode([encoding[,errors]]) -> object

    Decodes S using the codec registered for encoding. encoding defaults
    to the default encoding. errors may be given to set a different error
    handling scheme. Default is 'strict' meaning that encoding errors raise
    a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
    as well as any other name registered with codecs.register_error that is
    able to handle UnicodeDecodeErrors.

>>> b = a.decode('utf8')
>>> type(b)
<type 'unicode'>
>>> 

So it appears that read() returns a str (a byte string in Python 2), and .decode('utf8') decodes those UTF-8 bytes into Python's internal unicode type.
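
For Python 3, where urllib.request lives, read() returns bytes rather than str, and decode("utf8") turns those bytes into a str. Below is a rough sketch of that round trip, including the write to a UTF-8 file asked about in the question. The byte literal stands in for response.read() so the snippet runs without network access; the file name page.html and the sample markup are made up for illustration.

```python
import os
import tempfile

# Stand-in for response.read(): in Python 3 this is raw bytes.
# (Hypothetical sample content, not a real server response.)
raw = "<p>café</p>".encode("utf-8")

# bytes -> str: decode *from* UTF-8 into Python's text type.
html = raw.decode("utf-8")

# Opening the file in text mode with encoding="utf-8" makes write()
# encode the str back to UTF-8 bytes on the way out, so no manual
# encode() call is needed.
path = os.path.join(tempfile.mkdtemp(), "page.html")
with open(path, "w", encoding="utf-8") as f:
    f.write(html)

# Read the file back in binary mode to inspect the bytes on disk.
with open(path, "rb") as f:
    on_disk = f.read()
```

If the text is never inspected or modified, it should also work to skip decoding entirely and write the raw bytes with open(path, "wb"), assuming the server really sent UTF-8.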

Tags: python, html-encode, utf-8, urllib, decode
Source: https://codeday.me/bug/20190709/1410762.html