python - How to select child elements with Selector

Question

I am using HtmlXPathSelector to parse HTML content, and the target website wraps the text in an unpredictable set of HTML tags. For example, the markup can look like this:

<div class="doctor_ans">
  <h3>Title</h3>
  <p style="text-align: justify;">
    <span style="font-size: 12px;">
      <span style="font-family: arial,helvetica,sans-serif;">
        <font color="#000000">I would like to get contain here.</font>
      </span>
    </span>
  </p>    
</div>

or

<div class="doctor_ans">
  <h3>Title</h3>
  <p style="text-align: justify;">
    <span style="font-size: 12px;">
      <span style="font-family: arial,helvetica,sans-serif;">
        I would like to get contain here.
      </span>
    </span>
  </p>    
</div>

or

<div class="doctor_ans">
  <h3>Title</h3>
  <p>
    <span style="font-size: 12px;">
      <span style="font-family: arial,helvetica,sans-serif;">
        <font color="#000000">I would like to get contain here.</font>
      </span>
    </span>
  </p>    
</div>

or

<div class="doctor_ans">
  <h3>Title</h3>
  <p>
    <span style="font-size: 12px;">
        I would like to get contain here.
    </span>
  </p>    
</div>

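and so on.
Please advise on how to parse this content. The wrapping HTML tags vary from page to page, so I need a way to walk down through the child elements and reach the last (innermost) one that holds the text.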

score 1 · Accepted Answer

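from scrapy.selector import HtmlXPathSelector
# The leading // matches the div anywhere in the document, not only directly under the root.
hxs = HtmlXPathSelector(response)
hxs.select('//div[@class="doctor_ans"]/p[1]//text()').extract()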

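This returns a list of the individual text nodes inside the first paragraph of the doctor_ans div, no matter how many span or font tags wrap them. The list also contains whitespace-only strings from the indentation between tags, so strip and filter the entries as needed.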
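
To make the effect of the `//text()` step concrete, here is a minimal sketch that runs without Scrapy. It assumes the standard library's xml.etree.ElementTree as a stand-in, which only works because the question's samples happen to be well-formed; real-world HTML would normally go through Scrapy's selector or another HTML parser. The `<body>` wrapper and the helper name `first_paragraph_text` are hypothetical, added only for illustration.

```python
import xml.etree.ElementTree as ET

# Two of the variants from the question, wrapped in a hypothetical <body> root.
SAMPLE = """
<body>
  <div class="doctor_ans">
    <h3>Title</h3>
    <p style="text-align: justify;">
      <span style="font-size: 12px;">
        <span style="font-family: arial,helvetica,sans-serif;">
          <font color="#000000">I would like to get contain here.</font>
        </span>
      </span>
    </p>
  </div>
  <div class="doctor_ans">
    <h3>Title</h3>
    <p>
      <span style="font-size: 12px;">
        I would like to get contain here.
      </span>
    </p>
  </div>
</body>
"""


def first_paragraph_text(div):
    """Join the non-blank text nodes found anywhere under the div's first <p>."""
    p = div.find("p")  # find() returns the first matching child, like p[1] in XPath
    if p is None:
        return ""
    # itertext() walks every nested text node, similar in spirit to //text()
    parts = (t.strip() for t in p.itertext())
    return " ".join(t for t in parts if t)


root = ET.fromstring(SAMPLE.strip())
divs = root.findall(".//div[@class='doctor_ans']")
texts = [first_paragraph_text(d) for d in divs]
print(texts)
```

Both divs yield the same string even though one nests the text three tags deep and the other only one, which is why collecting all descendant text avoids having to know the exact tag structure.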

score 0 · Accepted Answer

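Most of my experience is with Selenium, but the XPath part should be the same. Use xpath='.//span' to select the child elements, then read each element's .text. If a child element's text is empty, discard it and move on to the next element.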
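
The "discard empty elements and move on" loop can be sketched as follows. This is an illustration under assumptions, not the Selenium code the answer has in mind: it uses the standard library's xml.etree.ElementTree on simplified copies of the question's markup, and it iterates over every element under the paragraph rather than only `.//span`, because in some variants the text sits in a `font` tag. Note also that ElementTree's `.text` holds only an element's own leading text, whereas Selenium's `.text` includes the visible text of descendants. The function name `first_nonempty_element` is hypothetical.

```python
import xml.etree.ElementTree as ET


def first_nonempty_element(paragraph):
    """Return (tag, text) for the first element with non-blank own text, else None."""
    # iter() yields the paragraph itself and then its descendants in document order
    for el in paragraph.iter():
        text = (el.text or "").strip()
        if text:
            return el.tag, text
        # Empty or whitespace-only: discard this element and try the next one
    return None


# Simplified versions of two variants from the question
with_font = ET.fromstring(
    '<p><span style="font-size: 12px;"><span>'
    '<font color="#000000">I would like to get contain here.</font>'
    '</span></span></p>'
)
span_only = ET.fromstring(
    '<p><span style="font-size: 12px;"> I would like to get contain here. </span></p>'
)

font_result = first_nonempty_element(with_font)
span_result = first_nonempty_element(span_only)
print(font_result)
print(span_result)
```

The wrapper spans carry no text of their own, so they are skipped and the loop stops at whichever tag actually holds the sentence.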
