html - XPath: select the text() between two DIVs identified by matching their text

Question

I have this HTML:

<div id="General" class="detailOn">
    <div class="tabconstraint"></div>
    <div id="InstitutionMain" class="detailseparate">
        <div id="InstitutionMain_divINFORight" style="float:right;width:40%"></div>
        <div style="font-weight:bold;padding-top:6px">Special Learning Opportunities</div>
        Distance learning opportunities<br>

        <div style="font-weight:bold;padding-top:6px">Student Services</div>
        Remedial services<br>
        Academic/career counseling service<br>

        <div style="font-weight:bold;padding-top:6px">Credit Accepted</div>
        Dual credit<br>
        Credit for life experiences<br>
    </div>
</div>

I want to extract

text() = between [div/text()="Special Learning Opportunities"] and [div/text()="Student Services"]

and likewise for the other divs.

I tried this code, which gives me all the text following the identified div:

div[1]/div[contains(text(),"Special Learning Opportunities")]/following-sibling::text()

whereas this code gives me all the text preceding the div:

div[1]/div[contains(text(),"Student Services")]/preceding-sibling::text()

Is there a way to get exactly the text that lies between the specified DIVs? Thanks in advance.

I am using Python 2.x and Scrapy for crawling.

Note: my current approach uses these three XPaths:

item['SLO']=site.select('div[1]/div[contains(text(),"Special Learning Opportunities")]/following-sibling::text()').extract()
item['SS']=site.select('div[1]/div[contains(text(),"Student Services")]/following-sibling::text()').extract()
item['CA']=site.select('div[1]/div[contains(text(),"Credit Accepted")]/following-sibling::text()').extract()

This gives me three items like these:

item['SLO']=['Distance learning opportunities','Remedial services',' Academic/career counseling service','Dual credit','Credit for life experiences']
item['SS']=['Remedial services',' Academic/career counseling service','Dual credit','Credit for life experiences']
item['CA']=['Dual credit','Credit for life experiences']

Then I post-process the Python lists to get what I want,

but I think there must be a quicker way to do this in XPath.
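For reference, the list post-processing described in the note could look roughly like the sketch below. It relies on the observation that each following-sibling::text() result also contains everything from the later sections, so one section's own text is its list minus the trailing part that equals the next section's list. The helper name `trim_to_section` and the `*_all` variable names are made up for illustration, and the input lists are the cleaned ones shown above (raw `extract()` output will likely also contain whitespace-only strings).

```python
def trim_to_section(current, following):
    """Return the leading items of `current` that are not part of `following`.

    Assumes `following` is a suffix of `current`, which is the case when both
    lists come from following-sibling::text() on headings that share a parent.
    """
    cut = len(current) - len(following)
    if cut < 0 or current[cut:] != following:
        raise ValueError("`following` is not a suffix of `current`")
    return current[:cut]


# The three cumulative lists from the question.
slo_all = ['Distance learning opportunities', 'Remedial services',
           ' Academic/career counseling service', 'Dual credit',
           'Credit for life experiences']
ss_all = ['Remedial services', ' Academic/career counseling service',
          'Dual credit', 'Credit for life experiences']
ca_all = ['Dual credit', 'Credit for life experiences']

# Each section keeps only what precedes the next section's list.
slo = trim_to_section(slo_all, ss_all)
ss = trim_to_section(ss_all, ca_all)
ca = trim_to_section(ca_all, [])  # last section: nothing to cut off

print(slo)
print(ss)
print(ca)
```

This works, but it needs one query per heading plus the trimming step, which is why a single XPath predicate would be preferable.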

score 4 · Accepted Answer

You can translate "the text between a and b" directly into XPath as "text()[preceding-sibling = a and following-sibling = b]".

That is:

//text()[(preceding-sibling::div[1]/text() = "Special Learning Opportunities") and (following-sibling::div[1]/text() = "Student Services")]

should work.

(I tested it and it failed, but that looks like a bug in the XPath interpreter.)
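To make the predicate's logic concrete, here is a small pure-Python model of it. This is only an illustration, not an XPath engine: the `SIBLINGS` list is a hand-flattened version of the children of div#InstitutionMain from the question (whitespace-only text nodes left out), and `nearest_div` and `text_between` are hypothetical helper names. `preceding-sibling::div[1]` is the closest div before a node and `following-sibling::div[1]` is the closest div after it, which is what `nearest_div` imitates.

```python
# Children of div#InstitutionMain in document order, as (kind, text) pairs.
SIBLINGS = [
    ("div", ""),  # InstitutionMain_divINFORight, which has no text
    ("div", "Special Learning Opportunities"),
    ("text", "Distance learning opportunities"),
    ("br", ""),
    ("div", "Student Services"),
    ("text", "Remedial services"),
    ("br", ""),
    ("text", "Academic/career counseling service"),
    ("br", ""),
    ("div", "Credit Accepted"),
    ("text", "Dual credit"),
    ("br", ""),
    ("text", "Credit for life experiences"),
    ("br", ""),
]


def nearest_div(siblings, index, step):
    """Text of the closest div before (step=-1) or after (step=1) `index`, else None."""
    i = index + step
    while 0 <= i < len(siblings):
        kind, value = siblings[i]
        if kind == "div":
            return value
        i += step
    return None


def text_between(siblings, before, after):
    """Text nodes whose nearest preceding div is `before` and nearest following div is `after`."""
    return [
        value
        for i, (kind, value) in enumerate(siblings)
        if kind == "text"
        and nearest_div(siblings, i, -1) == before
        and nearest_div(siblings, i, 1) == after
    ]


print(text_between(SIBLINGS, "Special Learning Opportunities", "Student Services"))
print(text_between(SIBLINGS, "Student Services", "Credit Accepted"))
# The last section has no following div at all, modelled here as None.
print(text_between(SIBLINGS, "Credit Accepted", None))
```

The last call points at a limitation of the expression as written: the text under "Credit Accepted" has no following div, so a predicate that compares `following-sibling::div[1]/text()` to a string cannot match it. That section would need a different condition, for example `not(following-sibling::div)`.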

score 2 · Accepted Answer

Here you go. It's not as elegant as the previous answer, but at least it works! :-)

div[1]//div[contains(text(),"Special Learning Opportunities")]/following-sibling::node()[position() <= count( div[1]//div[contains(text(),"Student Services")]/following-sibling::node()) + 1]

score 1 · Accepted Answer

Try this:

//div[contains(text(),"Special Learning Opportunities")]//following-sibling::text()[./following-sibling::div[contains(text(),'Student Services')]]
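For comparison, the same grouping can also be done without XPath. The sketch below is not taken from any of the answers: it swaps XPath for Python 3's standard-library `html.parser`, and it assumes that the section headings are exactly the div elements whose style contains `font-weight:bold`, as in the question's HTML. The class name `SectionParser` is hypothetical.

```python
from html.parser import HTMLParser

HTML = """
<div id="General" class="detailOn">
    <div class="tabconstraint"></div>
    <div id="InstitutionMain" class="detailseparate">
        <div id="InstitutionMain_divINFORight" style="float:right;width:40%"></div>
        <div style="font-weight:bold;padding-top:6px">Special Learning Opportunities</div>
        Distance learning opportunities<br>

        <div style="font-weight:bold;padding-top:6px">Student Services</div>
        Remedial services<br>
        Academic/career counseling service<br>

        <div style="font-weight:bold;padding-top:6px">Credit Accepted</div>
        Dual credit<br>
        Credit for life experiences<br>
    </div>
</div>
"""


class SectionParser(HTMLParser):
    """Collect the loose text that follows each bold heading div."""

    def __init__(self):
        super().__init__()
        self.sections = {}       # heading text -> list of text lines below it
        self.in_heading = False  # True while inside a bold heading div
        self.current = None      # heading that upcoming text belongs to

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        if tag == "div" and "font-weight:bold" in style:
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag == "div" and self.in_heading:
            self.in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # skip whitespace-only text nodes
        if self.in_heading:
            # Text inside a heading div starts a new section.
            self.current = text
            self.sections[text] = []
        elif self.current is not None:
            # Any other text belongs to the most recent heading.
            self.sections[self.current].append(text)


parser = SectionParser()
parser.feed(HTML)
parser.close()

for heading, lines in parser.sections.items():
    print(heading, lines)
```

Unlike the pairwise XPath expressions, this makes one pass over the document and does not need to know the heading names in advance, at the cost of depending on the inline style to recognise a heading.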
