python - Trouble scraping the web on PythonAnywhere

Question

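I've been experimenting with PythonAnywhere, trying to get Python running on a web server. I originally switched from Arvixe because Arvixe runs 2.4 and because PythonAnywhere's name was too appealing to pass up.

My application consists of two files, phones.py and phonesearch.py. Together, they are supposed to scrape craigslist for phone prices.

I tested it locally on 2.7 and it worked fine, generating an html page (celly.html) with a table and all the prices. When I upload it, the html is generated fine, but it refuses to add anything to my price list ([intprices]).

My suspicions: (a) since it works fine locally, PythonAnywhere is not allowing me to talk to craigslist; or (b) I'm doing this like a caveman, without a microframework, and PythonAnywhere is rejecting me for it; or (c) I'm blind to my own mistakes and missing something obvious.

My python scripts are located in /home/tseymour/mysite, and the html is generated in the same place, at /mysite/static/celly.html. The file is served at http://tseymour.pythonanywhere.com/static/celly.html

You will see that all of my cells are filled with "N/A", which means an IndexError was raised in the try: in SearchPhone.py, which in turn means my list is not being filled.

But why is that?! I'm thinking it's because I'm a PythonAnywhere n00b.

Please advise.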

SearchPhone.py

from BeautifulSoup import BeautifulSoup
import urllib
import re

def SearchPhone(phone):

    y = "http://losangeles.craigslist.org/search/moa?query=" + phone + "+-%22buy%22+-%22fix%22+-%22unlock%22+-%22broken%22+-%22cracked%22+-%22parts%22&srchType=T&minAsk=&maxAsk="

    site = urllib.urlopen(y)
    html = site.read()
    site.close()
    soup = BeautifulSoup(html)


    prices = soup.findAll("span", {"class":"itempp"})
    prices = [str(j).strip('<span class="itempp"> $</span>') for j in prices]

    for k in prices[:]:
        if k == '': #left price blank
            prices.remove(k)
        elif int(k) <= 75: #$75 or less: probably a service (or not true)
            prices.remove(k)
        elif int(k) >= 999: #probably not true
            prices.remove(k)

    #Find Average Price
    intprices = []
    newprices = prices[:]
    total = 0
    for k in newprices:
        total += int(k)
        intprices.append(int(k))

    intprices = sorted(intprices)

    try:
        del intprices[0]
        del intprices[-1]


        avg = total/len(newprices)
        low = intprices[0]
        high = intprices[-1]

        if len(intprices) % 2 == 1:
            median = intprices[(len(intprices)+1)/2-1]
        else:
            lower = intprices[len(intprices)/2-1]
            upper = intprices[len(intprices)/2]
            median = (float(lower + upper)) / 2



        namestr = str(phone)
        medstr = "Median: $" + str(median)
        avgstr = "Average: $" + str(avg)
        lowstr = "Low: $" + str(intprices[0])
        highstr = "High: $" + str(intprices[-1])
        samplestr = "# of samples: " + str(len(intprices))
        linestr = "-------------------------------"

    except IndexError:
        namestr = str(phone)
        medstr = "N/A"
        avgstr = "N/A"
        lowstr = "N/A"
        highstr = "N/A"
        samplestr = "N/A"
        linestr = "-------------------------------"

    return (namestr, medstr, avgstr, lowstr, highstr, samplestr, linestr)
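
A note on the statistics step above: because the lowest and highest entries are deleted before the list is indexed, SearchPhone only produces numbers when at least three prices survive the filters; anything less ends up in the IndexError branch. A minimal sketch of the same trimmed summary, with the empty case reported explicitly, might look like the following. The name `summarize_prices` and its parameters are made up for illustration, and unlike the listing above it averages over the trimmed list rather than the untrimmed one.

```python
def summarize_prices(prices, low_cutoff=75, high_cutoff=999):
    """Summarize a list of price strings; return None if too few usable prices remain."""
    # Keep only non-empty numeric strings strictly between the cutoffs,
    # mirroring the filters in SearchPhone.
    values = sorted(
        int(p) for p in prices
        if p.strip().isdigit() and low_cutoff < int(p) < high_cutoff
    )
    # Drop the single lowest and single highest value, as the listing above does.
    trimmed = values[1:-1]
    if not trimmed:
        # Fewer than three usable prices: the fetch or the parse probably found nothing.
        return None
    n = len(trimmed)
    mid = n // 2
    if n % 2 == 1:
        median = trimmed[mid]
    else:
        median = (trimmed[mid - 1] + trimmed[mid]) / 2.0
    return {
        "median": median,
        "average": sum(trimmed) / float(n),  # assumption: average over trimmed values
        "low": trimmed[0],
        "high": trimmed[-1],
        "samples": n,
    }
```

Returning None for the empty case makes "no prices were scraped" visible to the caller, so it is not mistaken for an ordinary out-of-range index.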

phones.py

from SearchPhone import SearchPhone

phones = ["Iphone 4", "Iphone 5","Galaxy s3", "Galaxy s2", "LG Lucid", "LG Esteem", "HTC One S", "Droid 4",
          "Droid RAZR MAXX", "HTC EVO", "Galaxy Nexus", "LG Optimus 2", "LG Ignite",
          "Galaxy Note", "HTC Amaze", "HTC Rezound", "HTC Vivid", "HTC Rhyme", "Motorola Photon",
          "Motorola Milestone", "myTouch slide", "HTC Status", "Droid 3", "HTC Evo 3d", "HTC Wildfire",
          "LG Optimus 3d", "HTC ThunderBolt", "Incredible 2", "Kyocera Echo", "Galaxy S 4g",
          "HTC Inspire", "LG Optimus 2x", "Samsung Gem", "HTC Evo Shift", "Nexus S", "LG Axis", "Droid 2",
          "G2", "Droid x", "Droid Incredible"
          ]

f = open('/home/tseymour/mysite/static/celly.html','w')


f.write("""<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Celly Blue Book</title>
</head>

<body>
</body>
</html>
""")

#table
f.write('<table width="100%" border="1">')
for x in phones:
    print "SEarchphone0"
    y = SearchPhone(x)
    print "SEarchphone"
    f.write( "\t<tr>")
    f.write( "\t\t<td>" + str(y[0]) + "</td>")
    f.write( "\t\t<td>" + str(y[1]) + "</td>")
    f.write( "\t\t<td>" + str(y[2]) + "</td>")
    f.write( "\t\t<td>" + str(y[3]) + "</td>")
    f.write( "\t\t<td>" + str(y[4]) + "</td>")
    f.write( "\t</tr>")

f.write('</table>')

f.close()

Also, I uploaded BeautifulSoup just in case.

score 6 · Accepted Answer

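PythonAnywhere dev here. I'm not sure whether you're using a free or a paid PythonAnywhere account, but if it's a free one, I think you're running into the whitelist. For free accounts, we only allow access to a specific set of websites, because people were using us to do bad things.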
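
One way to confirm whether outbound requests are being refused is to look at what the fetch itself returns before any parsing happens. The following is a rough sketch, not a description of how PythonAnywhere reports a blocked request: `check_access` is a hypothetical helper, and it is written against Python 3's `urllib.request`; on 2.7 the corresponding names live in `urllib2`. As far as I know, the plain `urllib.urlopen` used in the question does not raise on HTTP error statuses and simply hands back the error page, which would then parse to zero price spans.

```python
from urllib.request import urlopen
from urllib.error import HTTPError, URLError


def check_access(url, opener=urlopen):
    """Return a short description of what happens when url is requested.

    opener is injectable so the function can be exercised without a network.
    """
    try:
        response = opener(url)
    except HTTPError as err:
        # The server, or a proxy in between, answered with an error status.
        return "HTTP error %d" % err.code
    except URLError as err:
        # No HTTP response at all: DNS failure, refused connection, and so on.
        return "connection failed: %s" % err.reason
    try:
        body = response.read()
        return "OK (status %s, %d bytes)" % (response.getcode(), len(body))
    finally:
        response.close()


# Example call, to be run from the same place as the scraper:
# print(check_access("http://losangeles.craigslist.org/search/moa?query=Iphone+4"))
```

If this reports an error status or a failed connection on PythonAnywhere but succeeds locally, the empty price list comes from the request being refused and not from the parsing code.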

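We whitelist sites so that free accounts can use them if they have an official, publicly accessible API. Unfortunately, Craigslist doesn't have an API. Quite the opposite, sadly.

If you sign up for a paid account, you can probably do what you want, but if the article I just linked to is correct, you might need to make sure you have a good lawyer...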
