Python screen scraping: handling a POST login

Author: Internet

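I'm just getting started with screen scraping, and I'm trying to automate logging in to my bank. I figure I can basically do the following:

1. Using the source of the bank's web page, some tools, and some clever hackery, work out where the login data is posted and how it is formatted.
2. Implement that in Python.
3. World domination.

So far I've gotten as far as step 2. Here is my Python code: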

#!/usr/bin/python

import urllib, argparse, sys, re

def main():
    parser = argparse.ArgumentParser(description="Attempt to log into a Mission Federal Bank Account")
    parser.add_argument("-u", "--username", required=True, dest="username")
    parser.add_argument("-p", "--password", required=True, dest="password")
    arguments = parser.parse_args(sys.argv[1:])

    post = {
        'user': arguments.username,
        'PIN': arguments.password,
        'TestJavaScript': "OK",
        'signonDest': "My Default Destination"
    }

    post_encoded = urllib.urlencode(post)

    success_test = re.compile("<title id=\"HTMLTITLE\">Account Summary</title>")

    result = urllib.urlopen("https://missionlink.missionfcu.org/MFCU/login.aspx", post_encoded)
    result_string = result.read()

    success = success_test.match(result_string)

    if success == True:
        print "Login Successful *devilish laugh*"
    else:
        print "Login Failed"
        print result_string

    return

if __name__ == "__main__":
    main()

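As you can see, it's really quite simple. I figured all I needed was a URL (check) and the right POST parameters (check). However, the bank does not accept my request, and I can't log in. I'm fairly sure my approach is right, because I captured the POST request and response with the Firefox TamperData extension. Here is a sanitized dump of the POST request the browser actually generates (this one works when sent from the browser):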

22:52:22.172[5239ms][total 5239ms] Status: 302[Found]
POST https://missionlink.missionfcu.org/MFCU/login.aspx Load Flags[LOAD_DOCUMENT_URI  LOAD_INITIAL_DOCUMENT_URI  ] Content Size[178] Mime Type[text/html]
    Request Headers:
      Host[missionlink.missionfcu.org]
      User-Agent[Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.16) Gecko/20110323 Ubuntu/10.04 (lucid) Firefox/3.6.16]
      Accept[text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8]
      Accept-Language[en-us,en;q=0.5]
      Accept-Encoding[gzip,deflate]
      Accept-Charset[ISO-8859-1,utf-8;q=0.7,*;q=0.7]
      Keep-Alive[115]
      Connection[keep-alive]
      Referer[https://www.missionfed.com/]
   Post Data:
      user[USERNAME]
      PIN[PASSWORD]
      TestJavaScript[OK]
      signonDest[My+Default+Destination]
   Response Headers:
      Date[Fri, 08 Apr 2011 05:52:38 GMT]
      Server[Microsoft-IIS/6.0]
      PICS-Label[(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l on "2008.11.21T16:39-0800" exp "2015.11.21T12:00-0800" r (n 0 s 0 v 0 l 0 oa 0 ob 0 oc 0 od 0 oe 0 of 0 og 0 oh 0 c 0)), (PICS-1.1 "http://www.rsac.org/ratingsv01.html" l on "2008.11.21T16:39-0800" exp "2015.11.21T12:00-0800" r (n 0 s 0 v 0 l 0 oa 0 ob 0 oc 0 od 0 oe 0 of 0 og 0 oh 0 c 0))(PICS-1.0 "http://www.rsac.org/ratingsv01.html" l on "2008.11.21T16:39-0800" exp "2015.11.21T12:00-0800" r (v 0 s 0 n 0 l 0)), (PICS-1.1 "http://www.rsac.org/ratingsv01.html" l on "2008.11.21T16:39-0800" exp "2015.11.21T12:00-0800" r (n 0 s 0 v 0 l 0 oa 0 ob 0 oc 0 od 0 oe 0 of 0 og 0 oh 0 c 0))(PICS-1.0 "http://www.rsac.org/ratingsv01.html" l on "2008.11.21T16:39-0800" exp "2015.11.21T12:00-0800" r (v 0 s 0 n 0 l 0))(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l on "2008.11.21T16:39-0800" exp "2015.11.21T12:00-0800" r (l 0 s 0 v 0 o 0))]
      X-Powered-By[ASP.NET]
      X-AspNet-Version[1.1.4322]
      Location[https://missionlink.missionfcu.org/MFCU/Accounts/Summary.aspx]
      Set-Cookie[ASP.NET_SessionId=0o5zkh55rrost555z0m3xs55; path=/
TestCookie=OK; expires=Fri, 15-Apr-2011 12:52:37 GMT; path=/
AuthenticationTicket=6F527794FC5C8DAA18B6BA2E77E19DA5A256C092B0879D3CA68C111E52338F441690B94E652AC57FDBEEFD613367C076AB0EC7FA515E4CEC67C5F86B4B625D9B233B0D1B35BB0C58AE4B7CE6D6614CD0F732918E51E3B7939F284D9586B9CB132A12F3717BF80581F58440D91256D1438349E10867618F3300290C3AE7AA436572188236727041B93BD3C8C90E6F67915942FCC25CDD31C9D4F7D1C5F8A29E7C9A58825C3928F32C91146CC7BE47E86F0551CF1550EF21585C92F6C6AA245EE4D7CC5E80C4EFEB29A9572E625F79E709CA50BBF24303CE5AF06664C8784C2CDFA52CF7B6441170D4B3C5B8D4B7E6582B6072BAF7; path=/]
      Cache-Control[no-cache, no-store]
      Pragma[no-cache]
      Expires[-1]
      Content-Type[text/html; charset=utf-8]
      Content-Length[178]

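I can't seem to figure out what I'm missing here. Clearly something is going on with the AuthenticationTicket cookie, but isn't that part of the response rather than the request? Again, I'm fairly new to screen scraping, so please bear with me. Any ideas about what I'm doing wrong?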

Solution:

For complex browser automation like this, it may be worth checking out mechanize.
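To make the cookie part concrete, here is a rough sketch of what a cookie-aware client does. It is not mechanize itself, and it is written against Python 3's standard library rather than the Python 2 urllib used in the question. In the dump, the login response is a 302 that sets AuthenticationTicket and redirects to Summary.aspx, so the follow-up request presumably has to send that cookie back; a plain urlopen call, as far as I know, keeps no cookie jar. The helper names (make_opener, build_login_request, looks_logged_in, login) are made up for illustration, and whether the bank also checks Referer or User-Agent is only a guess. The success check uses re.search and tests for a match object, since re.match only matches at the start of the page and a match object never compares equal to True.

```python
import re
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# URL and form field names are taken from the question.
LOGIN_URL = "https://missionlink.missionfcu.org/MFCU/login.aspx"


def make_opener(jar=None):
    """Return (opener, jar). The opener stores Set-Cookie values in the jar
    and sends them back on later requests, including followed redirects."""
    if jar is None:
        jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(username, password):
    """Build (but do not send) the login POST seen in the TamperData dump."""
    fields = {
        "user": username,
        "PIN": password,
        "TestJavaScript": "OK",
        "signonDest": "My Default Destination",
    }
    body = urllib.parse.urlencode(fields).encode("ascii")
    headers = {
        # Assumption: mimicking the browser's headers may or may not matter.
        "Referer": "https://www.missionfed.com/",
        "User-Agent": "Mozilla/5.0",
    }
    # Supplying data makes urllib send a POST.
    return urllib.request.Request(LOGIN_URL, data=body, headers=headers)


def looks_logged_in(html):
    """True if the page title from the question appears anywhere in the page."""
    pattern = r'<title id="HTMLTITLE">Account Summary</title>'
    return re.search(pattern, html) is not None


def login(username, password):
    """Send the login request with a fresh cookie jar; return (success, jar)."""
    opener, jar = make_opener()
    with opener.open(build_login_request(username, password)) as response:
        html = response.read().decode("utf-8", errors="replace")
    return looks_logged_in(html), jar
```

Calling login("USERNAME", "PASSWORD") would perform the actual request; keeping the returned jar (and an opener built on it) around is what lets later page fetches stay logged in.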

Also, have you heard of Charles Proxy? It is essentially like Wireshark, but tailored to web development, and I suspect it would help you a great deal during development.
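One way to use a debugging proxy here is to send the script's traffic through it and compare the captured request with the browser's, header by header. The sketch below uses Python 3's standard library and assumes the proxy listens on 127.0.0.1:8888; that address is an assumption, so check your proxy's settings. For HTTPS you would most likely also need Python to trust the proxy's certificate, which is not shown. make_proxied_opener is a hypothetical helper name.

```python
import urllib.request

# Assumption: address of the local debugging proxy; adjust to your setup.
PROXY_ADDRESS = "127.0.0.1:8888"


def make_proxied_opener(proxy_address=PROXY_ADDRESS):
    """Return (opener, handler) that route http and https traffic via the proxy."""
    proxy_url = "http://" + proxy_address
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    return opener, handler
```

Requests opened with the returned opener should then show up in the proxy's session list next to the ones captured from the browser.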

Tags: screen-scraping, python
Source: https://codeday.me/bug/20191208/2092230.html