python - Extracting blog data in Python

Question

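I need to read from a text file containing a list of blogs and extract a specified number of blogs (n).

Then I need to extract the blog data and append it to a file.

This is only one part of the main assignment, which is to apply NLP to the data.

So far, I have done this: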

import urllib2
from bs4 import BeautifulSoup
def create_data(n):
    blogs=open("blog.txt","r") #opening the file containing list of blogs

    f=file("data.txt","wt") #Create a file data.txt

    with open("blog.txt")as blogs:
        head = [blogs.next() for x in xrange(n)]
        page = urllib2.urlopen(head['href'])

        soup = BeautifulSoup(page)
        link = soup.find('link', type='application/rss+xml')
        print link['href']

        rss = urllib2.urlopen(link['href']).read()
        souprss = BeautifulSoup(rss)
        description_tag = souprss.find('description')

        f = open("data.txt","a") #data file created for applying nlp
        f.write(description_tag)

This code does not work. It did work when I supplied the link directly, like this:

page = urllib2.urlopen("http://www.frugalrules.com")

I call this function from another script, where the user supplies the value of n.

What am I doing wrong?

Traceback:

Traceback (most recent call last):
  File "C:/beautifulsoup4-4.3.2/main.py", line 4, in <module>
    create_data(2)#calls create_data(n) function from create_data
  File "C:/beautifulsoup4-4.3.2\create_data.py", line 14, in create_data
    page=urllib2.urlopen(head)
  File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 395, in open
    req.timeout = timeout
AttributeError: 'list' object has no attribute 'timeout'
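
The traceback suggests that urlopen received the whole list `head` rather than a single URL string, so one likely fix is to loop over the lines and open each URL separately. Below is a minimal sketch of that idea, not a definitive implementation. It assumes blog.txt holds one URL per line. The names `read_first_urls` and `fetch_description` are hypothetical: `fetch_description` stands in for the urllib2 and BeautifulSoup steps from the code above and must return a string (for example the tag's text, since writing the tag object itself to a file would probably fail).

```python
from itertools import islice


def read_first_urls(path, n):
    """Return up to n non-empty lines from the file at `path`, stripped.

    Assumes one blog URL per line (an assumption, not confirmed above).
    """
    with open(path) as blogs:
        stripped = (line.strip() for line in blogs)
        non_empty = (line for line in stripped if line)
        # islice stops quietly if the file has fewer than n URLs.
        return list(islice(non_empty, n))


def create_data(n, fetch_description, blog_path="blog.txt", data_path="data.txt"):
    """Append one description per blog URL to data_path.

    fetch_description is a hypothetical callable: URL string -> text.
    It would wrap the urlopen / BeautifulSoup logic and return a string.
    """
    urls = read_first_urls(blog_path, n)
    with open(data_path, "a") as out:
        for url in urls:
            # One call per URL string, never the list as a whole.
            out.write(fetch_description(url) + "\n")
    return urls
```

With this shape, the calling script would pass n and a function that downloads one page, finds its RSS link, and returns the description text. Stripping each line also removes the trailing newline that would otherwise end up inside the URL.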
