python - Getting a value from a tag

Question

How can I read the value from a <th> tag?

<th class="class_name"> Sample Text </th>

Can anyone help me get the string "Sample Text" from the HTML above using Python?

Thanks.

score 5 · Accepted Answer

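You can use BeautifulSoup, my favorite library for parsing HTML.

from bs4 import BeautifulSoup  # BeautifulSoup 4 (the beautifulsoup4 package)

html = '<th class="class_name"> Sample Text </th>'
soup = BeautifulSoup(html, 'html.parser')
# strip=True drops the spaces around the cell text
print(soup.th.get_text(strip=True))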

score 0 · Accepted Answer

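A regex solution:

import re

# Sample input, taken from the question
input_string = '<th class="class_name"> Sample Text </th>'

# (.*?) captures the cell content non-greedily
th_regex = re.compile(r'<th\s+class="class_name">(.*?)</th>')
search_result = th_regex.search(input_string)

print(search_result.group(1) if search_result else 'not found')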

Note: you need the ? after .* to make the search non-greedy, so that it stops capturing at the first occurrence of </th>. Otherwise the capture runs on to the last </th> in input_string.
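To make the difference concrete, here is a small sketch comparing the greedy and non-greedy patterns. The two-cell input string is made up for illustration and does not come from the question.

```python
import re

# Hypothetical input with two cells, made up for illustration.
input_string = '<th class="class_name"> First </th><th class="class_name"> Second </th>'

greedy = re.compile(r'<th\s+class="class_name">(.*)</th>')
lazy = re.compile(r'<th\s+class="class_name">(.*?)</th>')

# Greedy: the capture extends to the last </th> in the string.
greedy_text = greedy.search(input_string).group(1)
# Non-greedy: the capture stops at the first </th>.
lazy_text = lazy.search(input_string).group(1)

print(repr(greedy_text))
print(repr(lazy_text))
```

With a single cell in the input both patterns give the same result, so the ? only matters once the string can contain more than one </th>.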

score 0 · Accepted Answer

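You can use minidom to parse it. I am not sure what exactly you need, though.

from xml.dom import minidom

# Sample input from the question; minidom needs well-formed XML
html = '<th class="class_name"> Sample Text </th>'
dom = minidom.parseString(html)
for elem in dom.getElementsByTagName('th'):
    if elem.getAttribute('class') == 'class_name':
        print(elem.firstChild.nodeValue)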
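Since the question asks for the bare string, the same minidom approach could be wrapped in a helper that strips the surrounding spaces. This is only a sketch: th_texts is a hypothetical name, and it assumes the markup is well-formed XML and that the text is the first child of each cell.

```python
from xml.dom import minidom


def th_texts(markup, class_name):
    """Return the stripped text of every <th> that has the given class."""
    dom = minidom.parseString(markup)
    texts = []
    for elem in dom.getElementsByTagName('th'):
        if elem.getAttribute('class') != class_name:
            continue
        first = elem.firstChild
        # Assumption: the cell text is the first child node of the <th>.
        if first is not None and first.nodeType == minidom.Node.TEXT_NODE:
            texts.append(first.nodeValue.strip())
    return texts


result = th_texts('<th class="class_name"> Sample Text </th>', 'class_name')
print(result)
```

Real-world HTML is often not well-formed XML, in which case parseString raises an error and an HTML parser is the safer choice.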

score 0 · Accepted Answer

A regex solution:

import re

s = '<th class="class_name"> Sample Text </th>'
data = re.findall('<th class="class_name">(.*?)</th>', s)
print(data)  # [' Sample Text ']
