This is just a small part of my code. It does not replace the div and a href tags. I am using BeautifulSoup.

soup = BeautifulSoup(ourUrl)
dem = soup.findAll('p')
for i in range(0, len(dem)-1):
    dk = dem[i]

    if ('<div') in dk:
        print "it here"
        dk = dk.replace('<div', '<!--')
        dk = dk.replace('</div>', '--->')
        dem[i] = dk
for i in range(0, len(dem)-1):
    dk = dem[i]
    if ('<a href') in dk:
        print "it here"
        dk = dk.replace('<a href', '<!--')
        dk = dk.replace('</a>', '--->')
        dem[i] = dk

The value of dem looks like this:

dem =[    <p class="left-text padding-left-10">
<a href="/people" class="red-text">See all people</a>
</p>
<p class="left-text padding-left-10">
<a href="/tv" class="red-text" style="display:inline;">See all bio TV</a>
<span class="divider">&nbsp;|&nbsp;&nbsp;</span>
<a href="/tv/daily-schedule" class="red-text" style="display:inline;">See schedule </a>
</p>
<p class="left-text bottom-flyout-video-padding">
<a href="/videos" class="red-text ">See all videos</a>
</p>
<p class="left-text padding-left-10">
<a href="http://shop.history.com/?v=biography" class="red-text">Shop now</a>
</p>
<p>TV14 </p>
<p>He rose from the slums of Brooklyn to take on the biggest Mafia dons of the 1950s and 1960s. Joey Gallo began his criminal career as a small-time loan shark and jukebox racketeer. He became a top enforcer in the Profaci crime family, but felt he never got the respect he deserved. So Gallo formed his own gang and revolted against mafia Don Joe Profaci in a long, bloody war on the streets of New York. But there was another side to Joey Gallo--the ruthless mob leader was also an artist and an avid reader. Living in Greenwich Village with his wife Jeffie, Gallo was inspired by his beatnik neighbors and their counterculture ideas. He also began hobnobbing with New York's social elite, befriending everyone from Neil Simon to Jerry Orbach. In the end though, nothing could save Joey Gallo from a dramatic end.</p>
<p>TV14 </p>
<p>
<p> Charles Darwin, <a href="/people/charles-darwin-9266433">http://www.biography.com/people/charles-darwin-9266433</a> (last visited Aug 27, 2013).</p>
<p> Charles Darwin. The Biography Channel website. 2013. Available at: <a href="/people/charles-darwin-9266433">http://www.biography.com/people/charles-darwin-9266433</a>. Accessed Aug 27, 2013. </p>
<p>Naturalist Charles Darwin was born in Shrewsbury, England, on February 12, 1809. In 1831, he embarked on a five-year survey voyage around the world on the HMS <i>Beagle</i>. His studies of specimens around the globe led him to formulate his theory of evolution and his views on the process of natural selection. In 1859, he published <i>On the Origin of Species</i>. He died on April 19, 1882, in London.</p>
<p><span class="body">A man who dares to waste one hour of time has not discovered the value of life.</span></p>


                            571 people in this group<br />
</p>]

The full value of dem is too large to post here, so I have provided an excerpt.

1 Answer

If you want to replace an element with a comment containing the replaced tag, replace the object with a new bs4.Comment() object:

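from bs4 import Comment

for para in soup.find_all('p'):
    # Swap each nested <div> for a comment holding its original markup.
    for div in para.find_all('div'):
        # unicode() is Python 2; on Python 3 use str(div) instead.
        div.replace_with(Comment(unicode(div)))
    # Do the same for every <a> that has an href attribute.
    for link in para.find_all('a', href=True):
        link.replace_with(Comment(unicode(link)))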

In Python, rather than using a for loop with range(), you loop over sequences directly. The code above loops directly over the .find_all() results.
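As a small illustration of that point, the sketch below compares the index-based loop from the question with direct iteration. The list paragraphs is a hypothetical stand-in for dem, and the snippet is written for Python 3. Note that range(0, len(...) - 1) stops one element early, so the last item is never visited.

```python
# Hypothetical stand-in for the list of <p> elements (dem in the question).
paragraphs = ["first", "second", "third"]

# Index-based loop as written in the question: the "- 1" skips the last item.
visited_by_index = []
for i in range(0, len(paragraphs) - 1):
    visited_by_index.append(paragraphs[i])

# Direct iteration visits every item and needs no index arithmetic.
visited_directly = []
for para in paragraphs:
    visited_directly.append(para)

# If the position is really needed, enumerate() supplies it alongside the item.
numbered = list(enumerate(paragraphs))
```

With direct iteration there is no index to get wrong, which is one reason it is usually preferred.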

BeautifulSoup elements may print like plain HTML text, but they are actually Tag() objects, not strings. Do not treat them as strings.

Answered 2013-08-27T11:52:09.923