r - R: Trouble constructing an XPath when web scraping in R

Question

I am working on scraping the following website:

   http://www.crowdrise.com/waterforpeople-SE

If you look at this website, on the right-hand side, just above the black button labeled "Fundraise for this campaign", there is the statement "52% Raised of $20,000 Goal". I am trying to extract exactly this statement.

This is the XPath expression I tried:

  .//*[@id="thebody"]/div[6]/div/div/div[2]/div[2]/div[2]/div/p/span

But it did not work...

What is the correct XPath expression?

Thanks,

score 1 · Accepted Answer

Try this:

> library(XML)
> doc <- htmlTreeParse('http://www.crowdrise.com/waterforpeople-SE', useInternalNodes = TRUE)
> xpathApply(doc, '//div[@class="grid1-4"]//p[@class="progressText"]')
[[1]]
<p class="progressText">
  <span>52% Raised of $20,000 Goal</span>
</p> 

attr(,"class")
[1] "XMLNodeSet"

Or go straight to the text value:

> xpathApply(doc, '//div[@class="grid1-4"]//p[@class="progressText"]', xmlValue)
[[1]]
[1] "52% Raised of $20,000 Goal"
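Since the live page may change or be unavailable, here is a minimal self-contained sketch of the same approach run against an inline HTML fragment. The fragment is a hypothetical stand-in for the relevant part of the page (only the `grid1-4` and `progressText` class names come from the expression above; the rest of the markup is assumed), and `xpathSApply` is used in place of `xpathApply` so the result comes back as a plain character vector rather than a list.

```r
library(XML)

# Hypothetical stand-in for the relevant part of the page;
# the real markup on the site may differ.
html <- '
<html><body id="thebody">
  <div class="grid1-4">
    <p class="progressText"><span>52% Raised of $20,000 Goal</span></p>
  </div>
</body></html>'

# Parse the string itself as HTML rather than treating it as a file name or URL.
doc <- htmlParse(html, asText = TRUE)

# Select the node by its class attributes rather than by its position in the tree,
# and apply xmlValue to each match to get its text content.
progress <- xpathSApply(doc,
                        '//div[@class="grid1-4"]//p[@class="progressText"]',
                        xmlValue)

# Strip any surrounding whitespace picked up from the markup.
progress <- trimws(progress)
print(progress)
```

Matching on class attributes tends to be less brittle than a long positional path such as `div[6]/div/div/div[2]/...`, which stops matching as soon as the page layout shifts by one element.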
