python - Beautiful Soup error

Question

I need to extract the text of the title tag from this source:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html dir="ltr" lang="en">
<head>
    <title>Microsoft to acquire Nokia’s devices &amp; services business, license Nokia’s patents and mapping services</title>
    <meta http-equiv="X-UA-Compatible" content="IE=EmulateIE9; IE=10" />
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta id="ctl00_WtCampaignId" name="DCSext.wt_linkid" />
    </title>

I'm using this to extract the text:

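import urllib2
from bs4 import BeautifulSoup  # assumed; with BeautifulSoup 3 use: from BeautifulSoup import BeautifulSoup

# Send a browser-like User-Agent header with the request
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]

ourUrl = opener.open("http://www.thehindubusinessline.com/industry-and-economy/info-tech/nokia-cannot-license-brand-nokia-post-microsoft-deal/article5156470.ece").read()

soup = BeautifulSoup(ourUrl)
print soup
dem = soup.findAll('p')      # all <p> tags: this works
hea = soup.findAll('title')  # <title> tags: this is the part that fails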

This code extracts the p tags correctly, but fails when I try to extract the title. I've only included part of the code; the rest works fine, so don't worry about it. Thanks.
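For comparison, here is a minimal sketch that pulls out the title text with the standard library's `html.parser` instead of Beautiful Soup (a swapped-in technique, not the asker's code). It assumes Python 3 and uses a hypothetical `TitleExtractor` class that keeps only the text inside the first `<title>` element and ignores any later `</title>`, so the stray closing tag in the snippet above is simply skipped.

```python
from html.parser import HTMLParser

# The <head> snippet from the question, including the stray second </title>.
SNIPPET = """<html dir="ltr" lang="en">
<head>
    <title>Microsoft to acquire Nokia’s devices &amp; services business, license Nokia’s patents and mapping services</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    </title>
"""


class TitleExtractor(HTMLParser):
    """Hypothetical helper: collects the text of the first <title> element only."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; only the first <title> is tracked.
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        # The first </title> closes the element; a later, stray </title> is ignored.
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)

    @property
    def title(self):
        return "".join(self.parts).strip()


parser = TitleExtractor()
parser.feed(SNIPPET)
parser.close()
print(parser.title)
```

The design choice here is to make the extractor tolerant of the malformed markup rather than to repair the markup first; it only handles this one symptom.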

score 0 · Accepted Answer

There is an error in the HTML: it contains two closing </title> tags.
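One quick way to confirm this kind of problem is to count the opening and closing `title` tags in the fetched markup. This is a minimal sketch assuming Python 3; `count_tags` is a hypothetical helper, and its regular expressions are a rough diagnostic, not a substitute for a real HTML parser.

```python
import re

# The <head> snippet from the question.
SNIPPET = """<head>
    <title>Microsoft to acquire Nokia’s devices &amp; services business, license Nokia’s patents and mapping services</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    </title>
"""


def count_tags(html, name):
    """Return (number of opening tags, number of closing tags) for `name`, case-insensitively."""
    opening = re.findall(r"<%s\b[^>]*>" % re.escape(name), html, re.IGNORECASE)
    closing = re.findall(r"</%s\s*>" % re.escape(name), html, re.IGNORECASE)
    return len(opening), len(closing)


opened, closed = count_tags(SNIPPET, "title")
print("opening <title>:", opened, "closing </title>:", closed)
if closed > opened:
    print("stray </title> found")
```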

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html dir="ltr" lang="en">
<head>
    <title>Microsoft to acquire Nokia’s devices &amp; services business, license Nokia’s patents and mapping services</title>
    <meta http-equiv="X-UA-Compatible" content="IE=EmulateIE9; IE=10" />
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta id="ctl00_WtCampaignId" name="DCSext.wt_linkid" />
    </title> # duplicate: <title> was already closed on the first line

So the corrected code should look like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html dir="ltr" lang="en">
<head>
    <title>Microsoft to acquire Nokia’s devices &amp; services business, license Nokia’s patents and mapping services</title>
    <meta http-equiv="X-UA-Compatible" content="IE=EmulateIE9; IE=10" />
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta id="ctl00_WtCampaignId" name="DCSext.wt_linkid" />
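Because the page is fetched from a remote site, editing its HTML by hand may not be an option. A minimal sketch, assuming Python 3, of cleaning the downloaded string before parsing: the hypothetical helper `drop_stray_title_closers` keeps the first `</title>` and deletes any later ones.

```python
import re

CLOSE_TITLE = re.compile(r"</title\s*>", re.IGNORECASE)


def drop_stray_title_closers(html):
    """Hypothetical helper: keep the first </title> and remove any later ones.

    A narrow string-level cleanup for this one symptom, not a general HTML repair tool.
    """
    first = CLOSE_TITLE.search(html)
    if first is None:
        return html  # no </title> at all: leave the markup untouched
    head, tail = html[: first.end()], html[first.end():]
    return head + CLOSE_TITLE.sub("", tail)


raw = """<head>
    <title>Microsoft to acquire Nokia’s devices &amp; services business, license Nokia’s patents and mapping services</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    </title>
"""
cleaned = drop_stray_title_closers(raw)
print(cleaned)
```

The cleaned string is intended to be passed to `BeautifulSoup(...)` in place of the raw page; it only addresses the duplicate `</title>` and will not fix other markup errors the page may have.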
