python - Error after Python web-scraping code has been running for a while

Question

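I am trying to scrape all the mobile-phone data from the Flipkart site using Python 2.7 (IDLE) and Beautiful Soup. My code is below. In the first part of the code I collect the individual links to all the Samsung phones, and in the second part I scrape all the phone specifications (the td elements) from each of those pages. However, after a few phones I get the following error message: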

    Traceback (most recent call last):
      File "E:\data base python\collectinghrefsamasungstack.py", line 16, in <module>
        htmlfile = urllib.urlopen(url)  #//.request is in 3.0x
      File "C:\Python27\lib\urllib.py", line 87, in urlopen
        return opener.open(url)
      File "C:\Python27\lib\urllib.py", line 208, in open
        return getattr(self, name)(url)
      File "C:\Python27\lib\urllib.py", line 345, in open_http
        h.endheaders(data)
      File "C:\Python27\lib\httplib.py", line 969, in endheaders
        self._send_output(message_body)
      File "C:\Python27\lib\httplib.py", line 829, in _send_output
        self.send(msg)
      File "C:\Python27\lib\httplib.py", line 791, in send
        self.connect()
      File "C:\Python27\lib\httplib.py", line 772, in connect
        self.timeout, self.source_address)
      File "C:\Python27\lib\socket.py", line 571, in create_connection
        raise err
    IOError: [Errno socket error] [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

My code:

    import urllib
    import re  
    from bs4 import BeautifulSoup

    #part1
    url="http://www.flipkart.com/mobiles/samsung~brand/pr?sid=tyy,4io"

    regex = '<a class="fk-display-block" data-tracking-id="prd_title" href=(.+?)title'  # captures the quoted href of each product-title link
    pattern=re.compile(regex)

    htmlfile = urllib.urlopen(url)

    htmltext= htmlfile.read()
    docSoup=BeautifulSoup(htmltext)
    abc=docSoup.findAll('a')
    c=str(abc)

    count=0
    #------part 2     visit each link and gather the mobile specifications
    title=re.findall(pattern,c)

    temp=1
    file2=open('c:/Python27/samsung.txt','w')

    for i in title:
        print i
        file2.write(i)
        file2.write("\n")
        count=count+1
        print "\n1\n"
        #print i
        if temp>0 :
            mob_url='http://www.flipkart.com'+i[1:len(i)-2]
            htmlfile = urllib.urlopen(mob_url)
            htmltext= htmlfile.read()
            # htmltext
            docSoup=BeautifulSoup(htmltext)

            abc=docSoup.find_all('td')
            file=open('c:/Python27/prut2'+str(count)+'.txt','w')
            mod=0
            count=count+1
            pr=-1
            for j in abc:
                if j.text == 'Brand':
                    pr=3

                if mod ==1:
                    file2.write((j).text)
                    file2.write("\n")
                    mod=0
                if j.text == 'Model ID':
                    mod=1
                #sprint j.text

                if pr>0 :
                    file.write(j.text)
                    file.write('\n')

            file.close()  # close the per-phone file once, after the td loop
        else :
            temp=temp+1



    print count
    file2.close()

I tried disabling my antivirus, and the internet connection I am using is very stable, but I still get the error. Is there any way to fix this?
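The inner `for j in abc` loop uses the `pr` and `mod` flags to locate the spec rows: everything from the `Brand` cell onward is written to the per-phone file, and the cell right after `Model ID` is taken as the model ID. If the spec table really alternates label and value cells (an assumption about Flipkart's markup at the time, not something the question confirms), the same information can be gathered into a dictionary in one step. `specs_from_cells` is a hypothetical helper that takes plain strings, so it does not depend on any particular parser.

```python
def specs_from_cells(cells, start_label='Brand'):
    """Pair table-cell texts into a {label: value} dict.

    Assumption (hypothetical, not verified against Flipkart): from
    start_label onward the cells alternate label, value, label, value, ...
    """
    if start_label not in cells:
        return {}
    rest = cells[cells.index(start_label):]
    # Even positions are labels; odd positions are the values that follow them.
    return dict(zip(rest[0::2], rest[1::2]))


# Made-up sample cells; in the question they would come from
# [td.text for td in docSoup.find_all('td')].
cells = ['Buy now', 'Brand', 'Samsung', 'Model ID', 'X-100', 'Color', 'White']
specs = specs_from_cells(cells)
print(specs['Model ID'])  # X-100
```

One design difference from the original loop: a trailing cell with no partner is silently dropped by `zip`.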

score 1 · Accepted Answer

You may have too many connections open.

Add `htmlfile.close()` after `htmltext= htmlfile.read()`.
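Errno 10060 is the Windows Sockets "connection timed out" error (WSAETIMEDOUT), raised here while opening the TCP connection. The sketch below, which rests on a few assumptions, extends the answer's idea: `contextlib.closing` guarantees each response is closed even if `read()` raises, and a failed request is retried a few times with a growing pause. The helper name `fetch_with_retry`, the retry count, and the delays are illustrative choices, not part of the answer. Retrying is an added technique that only helps if the timeouts are intermittent (for example, if the server throttles rapid requests).

```python
import time
from contextlib import closing


def fetch_with_retry(open_url, url, retries=3, delay=2.0, sleep=time.sleep):
    """Return the body of url, closing every response it opens.

    open_url is any callable returning an object with read() and close(),
    e.g. urllib.urlopen (Python 2.7) or urllib.request.urlopen (Python 3).
    retries must be at least 1; delay and retries are assumed defaults.
    """
    last_error = None
    for attempt in range(retries):
        try:
            # closing() calls response.close() on exit, even if read() fails,
            # so sockets do not pile up across many pages.
            with closing(open_url(url)) as response:
                return response.read()
        except IOError as err:  # IOError is an alias of OSError on Python 3
            last_error = err
            if attempt < retries - 1:
                # Exponential back-off: delay, 2*delay, 4*delay, ...
                sleep(delay * (2 ** attempt))
    raise last_error
```

On Python 2.7, the two lines `htmlfile = urllib.urlopen(mob_url)` and `htmltext= htmlfile.read()` in the question's loop would become `htmltext = fetch_with_retry(urllib.urlopen, mob_url)`. The `sleep` parameter exists only so the waiting can be replaced, for example in tests.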
