python - python lxml find tag

Question

I am using lxml to parse HTML that contains a Facebook comments tag like the following:

<fb:comments id="fb_comments"  href="http://example.com" num_posts="5" width="600"></fb:comments>

I am trying to select it in order to get the href value, but when I call cssselect('fb:comments') I get the following error:

The pseudo-class Symbol(u'comments', 3) is unknown

Is there a way to do that?

Edit: the code:

from lxml.html import fromstring
html = '...'
parser = fromstring(html)
parser.cssselect('fb:comments')  #raises the exception

score 3 · Accepted Answer

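The cssselect() method evaluates the given CSS selector expression against the document. In your case, the colon character (:) is the XML namespace prefix separator (i.e. <namespace:tagname/>), and it is being confused with the CSS pseudo-class syntax (i.e. tagname:pseudo-class).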

According to the lxml manual, to find a tag (comments) that has a namespace prefix (fb), cssselect() requires the namespace-prefix|element syntax. So:

from lxml.html import fromstring
html = '...'
parser = fromstring(html)
parser.cssselect('fb|comments')
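Since the goal in the question is the href value, the following sketch shows one way to read it once the element is found. It is not part of the answer above: it looks the element up by its id attribute instead of its tag name, so no namespace prefix appears in the lookup. The sample markup is modeled on the tag in the question, and the wrapping div is an assumption added for illustration.

```python
from lxml.html import fromstring

# Sample markup modeled on the tag in the question.
# The surrounding <div> is an assumption, added only for illustration.
html = (
    '<div>'
    '<fb:comments id="fb_comments" href="http://example.com" '
    'num_posts="5" width="600"></fb:comments>'
    '</div>'
)

root = fromstring(html)

# Look the element up by id, so the lookup never mentions the "fb:" prefix.
comments = root.get_element_by_id("fb_comments")

# Attributes are read with .get(), as on any lxml element.
href = comments.get("href")
print(href)
```

This only works when the tag carries an id, as it does in the question; get_element_by_id raises KeyError if no element has that id.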
