python - Python web scraping using regular expressions

Question

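Could someone help me with some code I'm trying to build to pull stats from a game? I can get the HTML into BeautifulSoup, but I don't know how to format a regular expression properly to pull specific data out of the whole page. Here's what I have: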

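from urllib.request import urlopen  # Python 3; in Python 2 this was urllib.urlopen
from bs4 import BeautifulSoup
import re

content = urlopen('http://www.worldoftanks.com/community/accounts/1000395103-FrankenTank').read()
soup = BeautifulSoup(content, 'html.parser')  # name the parser explicitly to avoid bs4's warning
print(soup)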

If you can show me how to pull out one stat, I can figure out the rest. One of the stats is battles participated (10103), which is coded like this:

<tr>
<td class=""> Battles Participated: </td>
<td class="td-number-nowidth"> 10 103 </td>
</tr>

Thanks!

Frank

score 3 · Accepted Answer

Search the parse tree:

battles = soup.find('td', 'td-number-nowidth')  # first <td> whose class is td-number-nowidth
if battles:
    print(battles.get_text(strip=True))  # strip=True trims the surrounding spaces
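soup.find('td', 'td-number-nowidth') returns the first cell with that class, so if other stats on the page share the class (as seems likely for a stats table), it may not be the battles count. A minimal sketch that finds the label cell first and then reads the td right after it, assuming the row structure shown in the question; the stat_value helper and the "Some Other Stat:" row are hypothetical, added only to illustrate the difference:

```python
from bs4 import BeautifulSoup

# Sample markup: the second row is copied from the question; the first row
# is a hypothetical extra stat that uses the same td-number-nowidth class.
SAMPLE_HTML = """
<table>
<tr>
<td class=""> Some Other Stat: </td>
<td class="td-number-nowidth"> 42 </td>
</tr>
<tr>
<td class=""> Battles Participated: </td>
<td class="td-number-nowidth"> 10 103 </td>
</tr>
</table>
"""


def stat_value(soup, label):
    """Hypothetical helper: find the <td> whose text is `label` and return the
    number in the <td> right after it as an int, or None if it is missing."""
    for td in soup.find_all("td"):
        if td.get_text(strip=True) == label:
            value_cell = td.find_next_sibling("td")
            if value_cell is None:
                return None
            # Join the whitespace-separated digit groups: " 10 103 " -> "10103".
            return int("".join(value_cell.get_text().split()))
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# Searching by class alone returns the first matching cell on the page.
print(soup.find("td", "td-number-nowidth").get_text(strip=True))  # 42
print(stat_value(soup, "Battles Participated:"))  # 10103
```

On the real page you would pass the soup built from the urlopen(...).read() content instead; the label has to match the page text exactly, including the trailing colon.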

score 0 · Accepted Answer

Is that space actually part of the number you want to pull out? If so, I'd do something like this:

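html = str(soup)  # re.search needs a string, not a BeautifulSoup object
m = re.search(r'class="td-number-nowidth">\s*(\d+) (\d+)\s*</td>', html)  # \s* allows the padding spaces
if m:
    print(m.groups())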

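groups() returns a tuple of the strings '10' and '103', so you may need to concatenate them and then either keep the result as a string or parse it into an int, depending on what you need it for.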

matched = m.groups()
num = matched[0] + matched[1]
finalnumber = int(num)
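The (\d+) (\d+) pattern only matches numbers containing exactly one space, and matching on the class alone picks whichever td-number-nowidth cell comes first. A minimal regex-only sketch that keys on the label text and captures the whole spaced number as one group, assuming the markup looks exactly like the row in the question (label cell, then number cell, whitespace as the separator) and that html is a plain string such as str(soup); stat_from_html is a hypothetical helper name:

```python
import re


def stat_from_html(html, label):
    """Hypothetical helper: return the integer in the number cell that follows
    the cell whose text is `label`, or None if the pattern is not found."""
    pattern = re.compile(
        r"<td[^>]*>\s*" + re.escape(label) + r"\s*</td>\s*"
        # One group for the whole number, so "999", "10 103" and
        # "1 234 567" all match, not only the two-part case.
        r'<td class="td-number-nowidth">\s*(\d+(?:\s\d+)*)\s*</td>'
    )
    m = pattern.search(html)
    if m is None:
        return None
    # Remove the whitespace separators before converting: "10 103" -> 10103.
    return int(re.sub(r"\s", "", m.group(1)))


# Row copied from the question.
SAMPLE_ROW = """
<tr>
<td class=""> Battles Participated: </td>
<td class="td-number-nowidth"> 10 103 </td>
</tr>
"""
print(stat_from_html(SAMPLE_ROW, "Battles Participated:"))  # 10103
```

Because the label is passed through re.escape, the same helper can be reused for the other stats without hand-writing a new pattern for each one.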
