python - Grabbing &lt;title&gt; tag with lxml's iterparse</h1>
<div id="body"><p>I'm running into a problem using lxml's <code>iterparse</code> on my HTML. I'm trying to get the <code>&lt;title&gt;</code>'s text but this simple </a></h1>
</div>
<div class="d-flex fw-wrap pb8 mb16 bb bc-black-075">
<div class="flex--item ws-nowrap mr16 mb8" title="2022-04-17 15:46:40Z"> <span class="fc-light mr2">Asked</span> <time itemprop="dateCreated" datetime="2012-04-24T01:16:58.927">2012-04-24T01:16:58.927</time> </div>
<div class="flex--item ws-nowrap mb8" title="Viewed 6 times"> <span class="fc-light mr2">Viewed</span> 1235 times </div>
</div>
<div id="mainbar" role="main" aria-label="question and answers">
<div class="question" data-questionid="4" data-position-on-page="0" data-score="763" id="question">
<div class="post-layout">
<div class="votecell post-layout--left">
<div class="js-voting-container d-flex jc-center fd-column ai-stretch gs4 fc-black-200" data-post-id="4">
<div class="js-vote-count flex--item d-flex fd-column ai-center fc-black-500 fs-title" itemprop="upvoteCount" data-value=""> 3 </div>
</div> </div>
<div class="postcell post-layout--right">
<div class="s-prose js-post-body" itemprop="text"> </div>
<div class="mt24 mb12"> <div class="post-taglist d-flex gs4 gsy fd-column"> <div class="d-flex ps-relative fw-wrap">
<a href="/tags/python" class="post-tag js-gps-track" title="show questions tagged 'python'" rel="tag">python</a><a href="/tags/dom" class="post-tag js-gps-track" title="show questions tagged 'dom'" rel="tag">dom</a><a href="/tags/web-scraping" class="post-tag js-gps-track" title="show questions tagged 'web-scraping'" rel="tag">web-scraping</a><a href="/tags/lxml" class="post-tag js-gps-track" title="show questions tagged 'lxml'" rel="tag">lxml</a><a href="/tags/iterparse" class="post-tag js-gps-track" title="show questions tagged 'iterparse'" rel="tag">iterparse</a>
</div> </div> </div>
</div>
<span class="d-none" itemprop="commentCount">4</span>
</div> </div>
<div id="answers"> <a name="tab-top"></a>
<div id="answers-header"> <div class="answers-subheader d-flex ai-center mb8"> <div class="flex--item fl1"> <h2 class="mb0" data-answercount=""> 1 Answer <span style="display:none;" itemprop="answerCount">1</span> </h2> </div> </div> </div>
<a name="7"></a>
<div id="answer-7" class="answer js-answer accepted-answer" data-answerid="7" data-parentid="4" data-score="506" data-position-on-page="1" data-highest-scored="1" data-question-has-accepted-highest-score="1" itemprop="suggestedAnswer" itemscope="" itemtype="https://schema.org/Answer">
<div class="post-layout"> <div class="votecell post-layout--left"> <div class="js-voting-container d-flex jc-center fd-column ai-stretch gs4 fc-black-200" data-post-id="7">
<button class="js-vote-up-btn flex--item s-btn s-btn__unset c-pointer " data-controller="s-tooltip" data-s-tooltip-placement="right" aria-pressed="false" aria-label="Up vote" data-selected-classes="fc-theme-primary" data-unselected-classes="" aria-describedby="--stacks-s-tooltip-dgvag2l3"> <svg aria-hidden="true" class="svg-icon iconArrowUpLg" width="36" height="36" viewBox="0 0 36 36"><path d="M2 25h32L18 9 2 25Z"></path></svg> 
</button><div id="--stacks-s-tooltip-dgvag2l3" class="s-popover s-popover__tooltip pe-none" aria-hidden="true" role="tooltip">This answer is useful<div class="s-popover--arrow"></div></div>
<div class="js-vote-count flex--item d-flex fd-column ai-center fc-black-500 fs-title" itemprop="upvoteCount" data-value="2"> 2 </div>
</div> </div>
<div class="answercell post-layout--right"> <div class="s-prose js-post-body" itemprop="text">
<p>You may want to post at least part of the data you are actually trying to parse. Without it, this is a guess: if the <code>&lt;html&gt;</code> element defines a default XML namespace, you need to use that namespace when looking for elements. For example, take this simple document:</p>
<pre><code>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"&gt;
&lt;html xmlns="http://www.w3.org/1999/xhtml"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.w3.org/MarkUp/SCHEMA/xhtml11.xsd"
  xml:lang="en"&gt;
  &lt;head&gt;
    &lt;title&gt;Document Title&lt;/title&gt;
  &lt;/head&gt;
  &lt;body&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>
<p>Given this input, the following returns no results:</p>
<pre><code>&gt;&gt;&gt; from lxml import etree
&gt;&gt;&gt; doc = etree.parse(open('foo.html'))
&gt;&gt;&gt; doc.xpath('//title')
[]
</code></pre>
<p>This fails because we are looking for a <code>&lt;title&gt;</code> element without specifying a namespace, and without a namespace the lookup will not find a match (because <code>foo:title</code> is different from <code>bar:title</code>, assuming <code>foo:</code> and <code>bar:</code> are defined as XML namespaces).</p>
<p>You can explicitly use a namespace with the ElementTree interface, like this:</p>
<pre><code>&gt;&gt;&gt; doc.xpath('//html:title',
...     namespaces={'html': 'http://www.w3.org/1999/xhtml'})
[&lt;Element {http://www.w3.org/1999/xhtml}title at 0x1087910&gt;]
</code></pre>
<p>And there is our match.</p>
<p>You can also pass the namespace-qualified tag name (in <code>{namespace}tag</code> form) to the <code>tag</code> argument of iterparse:</p>
<pre><code>&gt;&gt;&gt; # Assumes StringIO is imported and str holds the document above as a string
&gt;&gt;&gt; titleIter = etree.iterparse(StringIO(str),
...     tag='{http://www.w3.org/1999/xhtml}title')
&gt;&gt;&gt; list(titleIter)
[(u'end', &lt;Element {http://www.w3.org/1999/xhtml}title at 0x7fddb7c4b8c0&gt;)]
</code></pre>
<p>If this does not solve your problem, please post some sample input and we can work from there.</p>
</div>
<div class="mt24"> <div class="user-action-time" style="color:#999;text-align:right;">Answered 2012-04-24T01:40:07.603</div> </div>
</div> </div> </div></div> </div>
<div id="sidebar" class="show-votes" role="complementary" aria-label="sidebar"> <div class="module sidebar-related"> <div class="related js-gps-related-questions" data-tracker="rq=1"> <div class="spacer"> <a href="/questions/13023753" title="Question score (upvotes - downvotes)"> <div 
class="answer-votes large"></div> </a> </div> </div> </div> </div> </div> </div> </div> </div> <footer id="footer" 
class="site-footer js-footer" role="contentinfo"> <div class="site-footer--container"> <nav class="site-footer--nav"> <div class="site-footer--col"> <h5 class="-title"><a href="https://stackoverflow.jp" class="js-gps-track" data-gps-track="footer.click({ location: 3, link: 15})">Stack Overflow Japanese site</a></h5> <p>Content is licensed under the CC BY-SA Creative Commons license.</p> </div> </nav> </div> </footer> </body> </html>