Can't get PythonAnywhere to scrape the web for me

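I've been experimenting with PythonAnywhere, trying to get some Python working on a web server. I originally switched from Arvixe because they run 2.4 and PythonAnywhere's name was too appealing.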

My app consists of two files: phones.py and phonesearch.py (listed below as SearchPhone.py). Together they are supposed to scrape craigslist for phone prices.

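I tested it locally on 2.7 and it runs fine, generating an HTML page (celly.html) with a table and all the prices. When I upload it, it generates the HTML just fine, but refuses to add anything to my price list ([intprices]).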

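My suspicions: (a) since it works fine locally, PythonAnywhere isn't allowing it to communicate with craigslist; or (b) because I'm doing this like a caveman instead of using a microframework, PythonAnywhere is denying me; or (c) I'm blind to my own mistakes and I'm missing something obvious.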

My Python scripts are located in /home/tseymour/mysite, while the HTML is generated at /home/tseymour/mysite/static/celly.html. That file is served at http://tseymour.pythonanywhere.com/static/celly.html

You'll notice that all my cells are filled with "N/A", which means the try: block in SearchPhone.py raised an IndexError. That in turn means my list is not being populated!

But why is this?! I believe it's because I'm a PythonAnywhere n00b.

Please advise.

SearchPhone.py

from BeautifulSoup import BeautifulSoup
import urllib
import re

def SearchPhone(phone):

    y = "http://losangeles.craigslist.org/search/moa?query=" + phone + "+-%22buy%22+-%22fix%22+-%22unlock%22+-%22broken%22+-%22cracked%22+-%22parts%22&srchType=T&minAsk=&maxAsk="

    site = urllib.urlopen(y)
    html = site.read()
    site.close()
    soup = BeautifulSoup(html)


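    prices = soup.findAll("span", {"class":"itempp"})
    # Note: str.strip() removes any of the listed characters from both ends,
    # not the literal substring; digits are not in the set, so the number survives.
    prices = [str(j).strip('<span class="itempp"> $</span>') for j in prices]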

    for k in prices[:]:
        if k == '': # price left blank
            prices.remove(k)
        elif int(k) <= 75: # $75 or less: probably a service (or not a genuine price)
            prices.remove(k)
        elif int(k) >= 999: # $999 or more: probably not a genuine price
            prices.remove(k)

    #Find Average Price
    intprices = []
    newprices = prices[:]
    total = 0
    for k in newprices:
        total += int(k)
        intprices.append(int(k))

    intprices = sorted(intprices)

    try:
        del intprices[0]
        del intprices[-1]


        avg = total/len(newprices)
        low = intprices[0]
        high = intprices[-1]

        if len(intprices) % 2 == 1:
            median = intprices[(len(intprices)+1)/2-1]
        else:
            lower = intprices[len(intprices)/2-1]
            upper = intprices[len(intprices)/2]
            median = (float(lower + upper)) / 2



        namestr = str(phone)
        medstr = "Median: $" + str(median)
        avgstr = "Average: $" + str(avg)
        lowstr = "Low: $" + str(intprices[0])
        highstr = "High: $" + str(intprices[-1])
        samplestr = "# of samples: " + str(len(intprices))
        linestr = "-------------------------------"

    except IndexError:
        namestr = str(phone)
        medstr = "N/A"
        avgstr = "N/A"
        lowstr = "N/A"
        highstr = "N/A"
        samplestr = "N/A"
        linestr = "-------------------------------"

    return (namestr, medstr, avgstr, lowstr, highstr, samplestr, linestr)
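
The N/A rows can be reproduced without any network access. The following is a minimal sketch, not the asker's code: `summarize_prices` is a hypothetical helper that applies the same filter and the same trimming of the lowest and highest sample as the listing above, but returns None explicitly when too few samples remain instead of relying on IndexError. The average uses `//` so that it matches Python 2 integer division on either interpreter.

```python
def summarize_prices(raw_prices):
    """Return price statistics, or None when too few usable samples remain."""
    # Same filter as SearchPhone: drop blanks, prices <= 75 and prices >= 999.
    kept = sorted(int(p) for p in raw_prices if p != '' and 75 < int(p) < 999)

    # The original deletes the lowest and highest sample and then reads
    # intprices[0]; with fewer than three samples that raises IndexError.
    if len(kept) < 3:
        return None

    trimmed = kept[1:-1]
    mid = len(trimmed) // 2
    if len(trimmed) % 2 == 1:
        median = trimmed[mid]
    else:
        median = (trimmed[mid - 1] + trimmed[mid]) / 2.0

    return {
        "average": sum(kept) // len(kept),  # over the untrimmed list, as in the original
        "median": median,
        "low": trimmed[0],
        "high": trimmed[-1],
        "samples": len(trimmed),
    }


# A page with no matching <span> elements yields no prices at all:
print(summarize_prices([]))
# A page with listings yields numbers:
print(summarize_prices(["100", "", "200", "50", "300", "400", "1200"]))
```

An empty input takes the same path that ends in the except IndexError branch of the original, which is why every cell shows N/A when the search page contains no prices.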

phones.py

from SearchPhone import SearchPhone

phones = ["Iphone 4", "Iphone 5","Galaxy s3", "Galaxy s2", "LG Lucid", "LG Esteem", "HTC One S", "Droid 4",
          "Droid RAZR MAXX", "HTC EVO", "Galaxy Nexus", "LG Optimus 2", "LG Ignite",
          "Galaxy Note", "HTC Amaze", "HTC Rezound", "HTC Vivid", "HTC Rhyme", "Motorola Photon",
          "Motorola Milestone", "myTouch slide", "HTC Status", "Droid 3", "HTC Evo 3d", "HTC Wildfire",
          "LG Optimus 3d", "HTC ThunderBolt", "Incredible 2", "Kyocera Echo", "Galaxy S 4g",
          "HTC Inspire", "LG Optimus 2x", "Samsung Gem", "HTC Evo Shift", "Nexus S", "LG Axis", "Droid 2",
          "G2", "Droid x", "Droid Incredible"
          ]

f = open('/home/tseymour/mysite/static/celly.html','w')


f.write("""<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Celly Blue Book</title>
</head>

<body>
""")

# Table: one row per phone, written inside <body>
f.write('<table width="100%" border="1">')
for x in phones:
    print "SEarchphone0"  # debug output before the lookup
    y = SearchPhone(x)
    print "SEarchphone"   # debug output after the lookup
    f.write( "\t<tr>")
    f.write( "\t\t<td>" + str(y[0]) + "</td>")
    f.write( "\t\t<td>" + str(y[1]) + "</td>")
    f.write( "\t\t<td>" + str(y[2]) + "</td>")
    f.write( "\t\t<td>" + str(y[3]) + "</td>")
    f.write( "\t\t<td>" + str(y[4]) + "</td>")
    f.write( "\t</tr>")

f.write('</table>')

# Close the document only after the table has been written
f.write("""
</body>
</html>
""")

f.close()

Also, I did upload BeautifulSoup.
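
If there is any doubt about the upload, a quick check is to try importing the module on the server. This is only a sketch; `module_available` is a hypothetical helper, not part of the original scripts. A missing module would normally stop SearchPhone.py with an ImportError at its first line rather than produce N/A rows, so the check mostly serves to rule the upload out.

```python
def module_available(name):
    """Return True if `name` can be imported in the current environment."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # "BeautifulSoup" is the module name imported in SearchPhone.py.
    print(module_available("BeautifulSoup"))
```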

Answer:

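PythonAnywhere dev here. You don't say whether you're using a free or a paid PythonAnywhere account, but if it's a free one, then I think you're running into our whitelist. For free accounts, we only allow access to a specific set of sites; this is because people were using us to do bad things.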

We put sites on the whitelist, so that free accounts can use them, if they have an official, publicly accessible API. Unfortunately Craigslist doesn't; quite the opposite, in fact.

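If you sign up for a paid account, then you'll probably be able to do what you want, but if the article I just linked is right, then you might want to make sure you have good lawyers...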
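
One way to check whether the whitelist is what empties the price list is to look at the HTTP status of the response before parsing it. The sketch below rests on assumptions: `fetch_status` is a hypothetical helper, and the answer does not say what a blocked request returns, so an HTTP error status from the proxy is assumed. Python 2's urllib.urlopen, as used in SearchPhone.py, does not raise on HTTP error statuses, which is one plausible reason the script carried on and parsed a page with no prices. The helper uses urllib.request on Python 3 and urllib2 on Python 2, both of which raise HTTPError instead.

```python
try:  # Python 3
    from urllib.request import urlopen
    from urllib.error import HTTPError, URLError
except ImportError:  # Python 2
    from urllib2 import urlopen, HTTPError, URLError


def fetch_status(url, opener=urlopen):
    """Return the HTTP status code for `url`, or None if no HTTP reply arrived.

    `opener` is injectable so the function can be exercised without a network.
    """
    try:
        response = opener(url)
    except HTTPError as err:
        # The server (or a proxy in front of it) answered with an error status.
        return err.code
    except URLError:
        # No HTTP response at all, e.g. DNS failure or refused connection.
        return None
    try:
        return response.getcode()
    finally:
        response.close()
```

Calling fetch_status with the search URL built in SearchPhone.py, once locally and once on the server, would show whether the two environments receive the same response; anything other than 200 on the server suggests the results page never arrived.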

Tags: python, pythonanywhere
Source: https://codeday.me/bug/20190826/1725124.html