python - Emails sent from mobile devices are decoded strangely by the email library
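I am using Python's imaplib and email modules to fetch a list of emails from my mail server over IMAP and then do some processing on them. This is the snippet I use to fetch and decode the emails: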
import imaplib
import email

# Connect to server
box = imaplib.IMAP4(CSMTP_SERVER)
box.login(CSMTP_USERNAME, CSMTP_PASSWORD)

# List inbox
box.select('INBOX')

# Retrieve email list ID's matching search patterns
# Return from search is this:
# ('OK', ['1 2 3 4 5 6 7 8 9 10 11 12 13 14'])
data = box.search(None, 'ALL')[1]
for num in data[0].split():
    # Retrieve message headers and body
    headers = email.message_from_string(box.fetch(num, '(RFC822)')[1][0][1])
    body = headers.get_payload()
    if not isinstance(body, str):
        body = headers.get_payload()[0].get_payload()
    print headers, body
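This works like a charm when the email is sent from Hotmail or Gmail, but whenever an email is sent from, for example, the default Android mail app, the message looks like this: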
=?utf-8?B?RndkOiBDYXBzaGFyZTogaW1wb3J0aW5nIGZyb20gUGhvdG9z?=
U2VudCBmcm9tIG15IEhUQwoKLS0tLS0gRm9yd2FyZGVkIG1lc3NhZ2UgLS0tLS0KRnJvbTogIkFs
ZXhhbmRlciBBdnRhbnNraSIgPGFsZXhAYXZ0YW5za2kuY29tPgpUbzogIlBlam1hbiBNYWtoZmki
IDxwakBtYWtoZmkuY29tPgpTdWJqZWN0OiBDYXBzaGFyZTogaW1wb3J0aW5nIGZyb20gUGhvdG9z
CkRhdGU6IFdlZCwgU2VwIDEwLCAyMDE0IDk6MDYgUE0KCkhpIFBlam1hbiwKCkkgd2FzIHBsYXlp
bmcgd2l0aCBDYXBzaGFyZSB0b2RheSBhbmQgZm91bmQgc29tZXRoaW5nIG1pc3NpbmcuIEkgZ3Vl
c3MgeW91CmhhdmUgcGxhbnMgZm9yIGl0LCBidXQgaXQgZG9lc24ndCBodXJ0IHRvIG1lbnRpb24g
aXQsIGp1c3Qgb24gY2FzZS4uLgoKV2hlbiBpbXBvcnRpbmcgcGhvdG9zLCBJIGhhdmUgdGhlIG9w
dGlvbiB0byBlaXRoZXIgZ2V0IG9uZSBvZiB0aGUgaW1hZ2VzCnRoYXQgYXJlIGRvd25sb2FkZWQg
b24gbXkgcGhvbmUsIG9yIHRvIHRha2UgYSBuZXcgcGljdHVyZS92aWRlby4gV2hhdCdzCm1pc3Np
bmcgaXMgYWJpbGl0eSB0byBnZXQgcGhvdG9zIGZyb20gbXkwcyBJJ3ZlIHVzZWQgZG9uJ3Qg
Y2FyZSB3aGVyZSB0aGUgcGhvdG8gaXMgbG9jYXRlZCBhbmQgYWxsCnBpY3R1cmVzIGFyZSBlcXVh
bGx5IGFjY2Vzc2libGUgKG9yIG1heWJlIHRoaXMgYXBwbGllcyBvbmx5IHRvIEdvb2dsZQphcHBz
PykuCgpOb3QgaW1wb3J0YW50LCBubyBpZGVhIGlmIGl0IGlzIGp1c3QgYSBsaW5lIG9yIHR3byBm
aXggb3Igc29tZXRoaW5nIG1vcmUKY29tcGxpY2F0ZWQuCgpUYWtlIGNhcmUsCgotIEFsZXgsIGJl
dGEgdGVzdGVyLCBRQSB2b2x1bnRlZXIsIGFuZCBzZW5pb3IgcGVza3kgc3RpY2tsZXI=
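I only get this when the email is sent from a mobile device. I doubt it is caused by anything I am doing; it seems more likely that some mail clients do not build the message headers correctly according to RFC822. Either way, I need to work around it somehow and be able to retrieve every email.
I would appreciate any hints on how to handle this. Thanks in advance.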
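Answer:
This is a MIME message. MIME is not specified in RFC822, but in the newer RFCs 2045-2047.
The vast majority of modern email uses MIME in some way, so you should definitely support it.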
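To see how a particular message is encoded, it can help to list the MIME headers of each part. The following is a minimal sketch, assuming Python 3; the helper name `describe_parts` and the sample message are made up for illustration and are not part of the original code.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def describe_parts(msg):
    """List (content type, transfer encoding, charset) for each leaf part."""
    rows = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        rows.append((
            part.get_content_type(),
            # A missing header means the default transfer encoding, 7bit.
            (part.get("Content-Transfer-Encoding") or "7bit").lower(),
            part.get_content_charset(),
        ))
    return rows


# Hypothetical sample message, built locally instead of fetched over IMAP.
sample = MIMEMultipart("alternative")
sample.attach(MIMEText("Sent from my HTC", "plain", "utf-8"))
sample.attach(MIMEText("<p>Sent from my HTC</p>", "html", "utf-8"))

parsed = email.message_from_bytes(sample.as_bytes())
for row in describe_parts(parsed):
    print(row)
```

A part reported with a `base64` or `quoted-printable` transfer encoding has to be decoded before its text is readable.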
Particularly relevant to this message is RFC2047, which specifies encoded-words. There is a good overview on Wikipedia, which I will partially transcribe:
The form is: “=?charset?encoding?encoded text?=”.
encoding can be either “Q” denoting Q-encoding that is similar to the quoted-printable encoding, or “B” denoting base64 encoding.
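The standard library already understands this form. As a small sketch, assuming Python 3, `email.header.decode_header` and `make_header` can turn an encoded-word into text for both the "B" and the "Q" encoding. The wrapper name `decode_mime_header` and the Q-encoded sample value are made up for illustration; the B-encoded value is the first line of the output in the question, written with the closing `?=` that the form requires.

```python
from email.header import decode_header, make_header


def decode_mime_header(raw):
    """Decode any RFC 2047 encoded-words in a header value to text."""
    return str(make_header(decode_header(raw)))


# First line of the output in the question, terminated with "?=".
b_encoded = "=?utf-8?B?RndkOiBDYXBzaGFyZTogaW1wb3J0aW5nIGZyb20gUGhvdG9z?="
# A made-up Q-encoded value, to show the other encoding ("_" means space).
q_encoded = "=?utf-8?Q?Hello_World?="

print(decode_mime_header(b_encoded))
print(decode_mime_header(q_encoded))
```

The first call yields `Fwd: Capshare: importing from Photos`, which suggests that this line is the message's subject.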
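So for this particular message, you have UTF-8 text that is Base64-encoded (B). The actual message starts right after the B?, not on the second line.
Here is some simple Python code to handle all of this: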
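import base64

if body.startswith("=?"):
    # Encoded-word layout: =?charset?encoding?encoded text?=
    i1 = body.index("?")
    i2 = body.index("?", i1 + 1)
    i3 = i2 + 2
    encoding = body[i1 + 1:i2]  # charset name, e.g. "utf-8"
    assert body[i2:i3] == "?B"  # don't handle Q format, it's not commonly used
    body = base64.b64decode(body[i3 + 1:]).decode(encoding)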
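As an alternative to parsing the string by hand, the email package can undo the transfer encoding itself. This is a sketch under stated assumptions rather than the method described above: it assumes Python 3 and a body whose headers declare `Content-Transfer-Encoding: base64`, which the question does not show. The helper name `extract_body` and the sample message are hypothetical.

```python
import base64
import email


def extract_body(msg):
    """Return the first text/plain payload of a parsed message as text."""
    part = msg
    if msg.is_multipart():
        for candidate in msg.walk():
            if candidate.get_content_type() == "text/plain":
                part = candidate
                break
    # decode=True undoes base64 or quoted-printable transfer encoding.
    # It returns None for multipart containers, hence the fallback.
    raw = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


# Hypothetical stand-in for the bytes returned by box.fetch(num, '(RFC822)').
sample = (
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    + base64.b64encode(b"Sent from my HTC\n")
    + b"\r\n"
)

print(extract_body(email.message_from_bytes(sample)))
```

In Python 3, `imaplib` returns bytes, so `email.message_from_bytes` is the matching parser; the Python 2 code in the question uses `email.message_from_string` instead.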
Tags: rfc822, encoding, email, imaplib, python. Source: https://codeday.me/bug/20191121/2050420.html