python-2.7 - What is wrong with the way I am scraping the table and data I need?

Question

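I am trying to get the data for the Miami Heat and their opponent from the table at http://www.scoresandodds.com/grid_20111225.html. The problem I have is that the tables for the NBA, the NFL, and the other sports are all marked up the same way, and all the data I get comes from the NFL table. Another problem is that I want to scrape data for the whole season, but the number of different tables changes and Miami's position within the table changes. Here is the code I have been using for the various tables so far.

So why isn't this getting the job done? Thanks for your patience. I am a real beginner and have been trying to solve this for days, to no avail.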

def tableSnO(htmlSnO):
    gameSections = soup.findAll('div', 'gameSection')
    for gameSection in gameSections:
        header = gameSection.find('div', 'header')
        if header.get('id') == 'nba':
            rows = gameSections.findAll('tr')
            def parse_string(el):
                text = ''.join(el.findAll(text=True))
                return text.strip()
            for row in rows:
                data = map(parse_string, row.findAll('td'))
                return data

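Recently I decided to try a different approach. If I scrape the whole page and get the index of the data in question (this is where I get stuck), I can take the next set of data from the list, because the structure of the table never changes. I could also get the opponent's team name the same way I get htmlSnO. It feels like such a basic thing, and it is killing me that I can't get it right.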

def tableSnO(htmlSnO):
    oddslist = soupSnO.find('table', {"width" : "100%", "cellspacing" : "0", "cellpadding" : "0"})
    rows = oddslist.findAll('tr',)
    def parse_string(el):
        text = ''.join(el.findAll(text=True))
        return text.strip()
    for row in rows:
        data = map(parse_string, row.findAll('td'))

        for teamName in data:
            if re.match("(.*)MIAMI HEAT(.*)", teamName):
                return teamName
                return data.index(teamName)

score 0 · Accepted Answer

New final answer with working code:

The section of the page you want has this:

<div class="gameSection">
    <div class="header" id="nba">

This should get you to the NBA table.

def tableSnO(htmlSnO):
    gameSections = soup.findAll('div', 'gameSection')
    for gameSection in gameSections:
        header = gameSection.find('div', 'header')
        if header.get('id') == 'nba':
            # process this gameSection
            print gameSection.prettify()
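
The `# process this gameSection` step is left open above. Here is a minimal sketch of one way it might be filled in, assuming the rows have already been reduced to lists of cell text, as the question's `parse_string` helper does. The hypothetical `find_team_rows` collects every matching row together with its index, rather than returning from inside the loop on the first row as the question's code does. The sample rows are invented for illustration and are not taken from the site.

```python
def find_team_rows(rows, team):
    """Return (index, cells) pairs for every row with a cell mentioning team."""
    wanted = team.upper()
    matches = []
    for index, cells in enumerate(rows):
        # Case-insensitive substring test against each cell of the row.
        if any(wanted in cell.upper() for cell in cells):
            matches.append((index, cells))
    return matches


# Invented rows standing in for the parsed NBA table; real cells will differ.
# With BeautifulSoup they could be built along the lines of:
#   [[parse_string(td) for td in tr.findAll('td')]
#    for tr in gameSection.findAll('tr')]
sample_rows = [
    ['TEAM A', '1.0', '2.0'],
    ['MIAMI HEAT', '3.0', '4.0'],
    ['TEAM B', '5.0', '6.0'],
]

for index, cells in find_team_rows(sample_rows, 'Miami Heat'):
    print(index)
    print(cells)
```

Returning the index alongside the row keeps the question's "take the next entry" idea workable: if the opponent sits on an adjacent row, which is an assumption about the table layout, it can be read from `rows[index + 1]` or `rows[index - 1]`.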

As a complete example, here is the full code I used to test it:

import urllib2
from bs4 import BeautifulSoup

# Download the page and parse it.
f = urllib2.urlopen('http://www.scoresandodds.com/grid_20111225.html')
html = f.read()
soup = BeautifulSoup(html)

# Every sport has its own div.gameSection; the header id says which sport it is.
gameSections = soup.findAll('div', 'gameSection')
for gameSection in gameSections:
    header = gameSection.find('div', 'header')
    if header.get('id') == 'nba':
        # The odds themselves are in the table with class "data".
        table = gameSection.find('table', 'data')
        print table.prettify()

This prints the NBA data table.
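
The question also asks about scraping a whole season. A rough sketch of how the per-day URLs might be generated, assuming every day's page follows the same `grid_YYYYMMDD.html` pattern as the one URL given in the question (only that single URL is confirmed). The name `season_urls` and the date range are placeholders chosen for illustration.

```python
from datetime import date, timedelta

# Pattern inferred from the single page named in the question (an assumption).
URL_TEMPLATE = 'http://www.scoresandodds.com/grid_%s.html'


def season_urls(start, end):
    """Yield one grid URL per calendar day from start to end, inclusive."""
    day = start
    while day <= end:
        yield URL_TEMPLATE % day.strftime('%Y%m%d')
        day += timedelta(days=1)


# Placeholder range; substitute the real first and last days of the season.
urls = list(season_urls(date(2011, 12, 25), date(2011, 12, 27)))
for url in urls:
    print(url)
```

Each URL could then be fetched and run through the same `gameSection` loop as the test code. Days on which the NBA section is missing, or Miami does not play, would simply produce no matching rows, so the changing number of tables should not matter.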
