python - Selectively parsing with SoupStrainer

Question

I'm trying to parse a list of video game titles from a shopping site. The item listings, however, are all stored inside a single div tag.

This section of the documentation seems to explain how to parse only part of a document, but I can't get it to work. My code:

from BeautifulSoup import BeautifulSoup
import urllib
import re

url = "Some Shopping Site"
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)
for a in soup.findAll('a',{'title':re.compile('.+') }):
    print a.string

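At the moment this prints the string inside every a tag that has a non-empty title attribute. However, it also picks up the items in the sidebar, which are "specials". If I could grab just the product-list div, that would kill two birds with one stone.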

Many thanks.

score 12 · Accepted Answer

Ah, I'm an idiot. I was searching for the tag with the attribute id="products", but it should have been "products_list".

In case anyone lands here from a search, here is my final code:

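from BeautifulSoup import BeautifulSoup, SoupStrainer
import urllib
import re
import time

start = time.clock()  # timer start (the elapsed time is never printed in this snippet)
url = "http://someplace.com"
html = urllib.urlopen(url).read()
# Only build the tree for <div id="products_list"> and everything inside it
product = SoupStrainer('div', {'id': 'products_list'})
soup = BeautifulSoup(html, parseOnlyThese=product)
# Print the text of every <a> whose title attribute is non-empty
for a in soup.findAll('a', {'title': re.compile('.+')}):
    print a.string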
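For readers on Python 3, the same idea carries over to Beautiful Soup 4 (bs4), where BS3's parseOnlyThese and findAll became parse_only and find_all. The following is a minimal sketch, assuming bs4 with the built-in "html.parser" backend and the page structure described in the question; the function name product_titles and the sample HTML are hypothetical. Fetching the page is left out: in Python 3, urllib.request.urlopen(url).read() replaces urllib.urlopen(url).read().

```python
import re

from bs4 import BeautifulSoup, SoupStrainer


def product_titles(html):
    """Hypothetical helper: text of <a title="..."> links inside <div id="products_list">."""
    # Only <div id="products_list"> (and everything nested in it) becomes part of
    # the tree; other top-level markup, such as the sidebar, is skipped while parsing.
    only_products = SoupStrainer("div", attrs={"id": "products_list"})
    soup = BeautifulSoup(html, "html.parser", parse_only=only_products)
    # Same filter as the accepted answer: <a> tags with a non-empty title attribute.
    return [a.string for a in soup.find_all("a", attrs={"title": re.compile(".+")})]


if __name__ == "__main__":
    # Hypothetical sample page: a sidebar of specials plus the real product list.
    page = (
        '<div id="specials"><a title="Deal">Sidebar special</a></div>'
        '<div id="products_list">'
        '<a title="Game A">Game A</a>'
        '<a title="">No title</a>'
        '<a title="Game B">Game B</a>'
        "</div>"
    )
    for title in product_titles(page):
        print(title)  # prints "Game A", then "Game B"
```

Because the strainer discards non-matching markup up front, the sidebar link never appears in the tree, so the title filter no longer has to distinguish it.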

score 0 · Accepted Answer

Try finding the product-list div first, and then search inside it for the a tags that have a title:

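# assumes `soup` and `re` from the question's code
# (on the asker's site the id turned out to be 'products_list')
product = soup.find('div', {'id': 'products'})
for a in product.findAll('a', {'title': re.compile('.+')}):
    print a.string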
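The find-then-search approach above parses the whole page and then narrows the search to one container. A minimal Python 3 sketch of it, assuming Beautiful Soup 4 (bs4) with "html.parser", is below; the function name product_titles_scoped and its div_id parameter are hypothetical. It also guards against the failure the asker ran into: with a wrong id, find() returns None, and calling a search method on None raises AttributeError.

```python
import re

from bs4 import BeautifulSoup


def product_titles_scoped(html, div_id="products_list"):
    """Hypothetical helper: find the product div first, then search only inside it."""
    soup = BeautifulSoup(html, "html.parser")
    # Step 1: locate the product-list container.
    product = soup.find("div", attrs={"id": div_id})
    if product is None:
        # find() returns None when nothing matches (e.g. id="products" instead of
        # "products_list"); return an empty list rather than raising AttributeError.
        return []
    # Step 2: search for non-empty-title links only within that container.
    return [a.string for a in product.find_all("a", attrs={"title": re.compile(".+")})]
```

Compared with SoupStrainer, this builds the full tree, so it costs more on large pages, but it keeps the rest of the document available if other parts are needed later.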
