python - Problem removing HTML tags when using BeautifulSoup

Question

I am using Beautiful Soup to pull some data from a website, but I cannot remove the HTML tags from the data when I print it. The code in question is:

import csv
import urllib2
import sys  
from bs4 import BeautifulSoup

page = urllib2.urlopen('http://www.att.com/shop/wireless/devices/smartphones.html').read()
soup = BeautifulSoup(page)
soup.prettify()
for anchor1 in soup.findAll('div', {"class": "listGrid-price"}):
    print anchor1
for anchor2 in soup.findAll('div', {"class": "gridPrice"}):
    print anchor2
for anchor3 in soup.findAll('div', {"class": "gridMultiDevicePrice"}):
    print anchor3

The output I get from this looks like the following:

<div class="listGrid-price"> 
                                $99.99 
            </div>
<div class="listGrid-price"> 
                                $0.01 
            </div>
<div class="listGrid-price"> 
                                $0.01 
            </div>

I want only the prices in the output, without the surrounding HTML tags. I am new to programming, so please forgive my ignorance.

score 0 · Accepted Answer

You are printing the tags you found. To print only the text they contain, use the .string attribute:

print anchor1.string

The .string value is a NavigableString instance; to use it like a regular unicode object, convert it first. You can then call strip() to remove the extra whitespace:

print unicode(anchor1.string).strip()
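
On Python 3, where unicode and the print statement no longer exist, the same idea might look like the sketch below. The inline HTML is a made-up stand-in for the downloaded page, and the html.parser backend is an assumption chosen so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page fetched in the question.
html = """
<div class="listGrid-price">
    $99.99
</div>
<div class="listGrid-price">
    $0.01
</div>
"""

soup = BeautifulSoup(html, "html.parser")

prices = []
for div in soup.find_all("div", {"class": "listGrid-price"}):
    # .string is a NavigableString; str() turns it into a plain string,
    # and strip() drops the surrounding whitespace and newlines.
    prices.append(str(div.string).strip())

print(prices)
```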

Adjusting this a little to allow for empty values:

for anchor1 in soup.findAll('div', {"class": "listGrid-price"}):
    if anchor1.string:
        print unicode(anchor1.string).strip()

That gives me:

$99.99
$0.99
$0.99
$299.99
$199.99
$49.99
$49.99
$99.99
$0.99
$99.99
$0.01
$0.01
$0.01
$0.01
$0.01
