Python email quoted-printable encoding problem

Author: Internet

I am using the following to extract emails from Gmail:

import email
import imaplib
import sys

# username, password and subject are assumed to be defined elsewhere.

def getMsgs():
  try:
    conn = imaplib.IMAP4_SSL("imap.gmail.com", 993)
  except OSError:
    print('Failed to connect')
    print('Is your internet connection working?')
    sys.exit()
  try:
    conn.login(username, password)
  except imaplib.IMAP4.error:
    print('Failed to login')
    print('Is the username and password correct?')
    sys.exit()

  conn.select('Inbox')
  # typ, data = conn.search(None, '(UNSEEN SUBJECT "%s")' % subject)
  typ, data = conn.search(None, '(SUBJECT "%s")' % subject)
  for num in data[0].split():
    typ, data = conn.fetch(num, '(RFC822)')
    # data[0][1] holds the raw message as bytes
    msg = email.message_from_bytes(data[0][1])
    yield walkMsg(msg)

def walkMsg(msg):
  # Return the payload of the first text/plain part, still transfer-encoded
  for part in msg.walk():
    if part.get_content_type() != "text/plain":
      continue
    return part.get_payload()

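However, in some of the emails I receive it is nearly impossible to extract dates (using regular expressions), because encoding-related characters such as '=' land at random in the middle of various text fields. Here is an example where one occurs inside the date range I want to extract: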

Name: KIRSTI Email:
kirsti@blah.blah Phone #: + 999
99995192 Total in party: 4 total, 0
children Arrival/Departure: Oct 9=
,
2010 – Oct 13, 2010 – Oct 13, 2010

Is there a way to remove these encoding characters?

Solution:

You can, and should, use the email.parser module to decode mail messages, for example (a quick and dirty example!):

from email.parser import FeedParser
f = FeedParser()
f.feed("<insert mail message here, including all headers>")
rootMessage = f.close()

# Now you can access the message and its submessages (if it's multipart)
print(rootMessage.is_multipart())

# Or check for errors
print(rootMessage.defects)

# If it's a multipart message, you can get the first submessage and then its payload
# (i.e. content) like so:
rootMessage.get_payload(0).get_payload(decode=True)

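With the "decode" parameter of Message.get_payload, the module automatically decodes the content according to its Content-Transfer-Encoding (for example quoted-printable, as in the question). The stray '=' at the end of a line is a quoted-printable soft line break, and decoding removes it. The result is a byte string, so decode it with the part's charset if you need text.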
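Applied to the question's walkMsg function, this might look like the following minimal sketch. It assumes Python 3; the sample message, the helper name walk_msg_decoded, the UTF-8 fallback and the date regex are illustrative assumptions, not part of the original question or answer.

```python
import email
import re

# Hypothetical raw message for illustration; the body contains a
# quoted-printable soft line break ("=" at the end of a line).
RAW_MESSAGE = (
    "From: sender@example.com\n"
    "Subject: Booking\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=\"utf-8\"\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Arrival/Departure: Oct 9=\n"
    ", 2010 - Oct 13, 2010\n"
)


def walk_msg_decoded(msg):
    """Return the decoded text of the first text/plain part, or None."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        # decode=True undoes the Content-Transfer-Encoding and returns bytes
        payload = part.get_payload(decode=True)
        # Fall back to UTF-8 when the part declares no charset (an assumption)
        charset = part.get_content_charset() or "utf-8"
        return payload.decode(charset, errors="replace")
    return None


msg = email.message_from_string(RAW_MESSAGE)
text = walk_msg_decoded(msg)

# Simple illustrative pattern for dates such as "Oct 9, 2010"
dates = re.findall(r"[A-Z][a-z]{2} \d{1,2}, \d{4}", text)
print(dates)
```

Once the soft line break is gone, "Oct 9, 2010" is contiguous again and an ordinary regular expression can match it. With messages fetched over IMAP, email.message_from_bytes(data[0][1]) would take the place of email.message_from_string here.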

Tags: python, encoding, email, imaplib
Source: https://codeday.me/bug/20190526/1157764.html