r - Web scraping issue involving clicks (using R)

Question

I am trying to scrape the following website:

http://www.healthgrades.com/hospital-directory/california-ca-san-mateo/affiliated-physicians-HGSTED418D46050070

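I am using R to scrape the site. Specifically, I want to copy all of the physician names and specialties from it. The main problem is that the URL does not change when I press the arrow/next button, so the basic scraping techniques I know do not work on this page. How can I solve this? Ideally, all of the collected data would end up in a single data matrix/spreadsheet.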

score 3 · Accepted Answer

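library(XML)

# Base URL of the hospital's affiliated-physicians listing
dum <- "http://www.healthgrades.com/hospital-directory/california-ca-san-mateo/affiliated-physicians-HGSTED418D46050070"
ddum <- htmlParse(dum)

# The element right after the active pagination item reads " of N";
# the formula below turns N into a page count at 5 entries per page.
noofpages <- xpathSApply(ddum, '//*/span[@class="paginationItem active"]/following-sibling::*[1]', xmlValue)[1]
noofpages <- (as.numeric(gsub(' of ', '', noofpages)) - 1) %/% 5 + 1

doctors <- c(); dspec <- c()
for (i in seq_len(noofpages)) {
  # Page 1 is already parsed; later pages are requested with ?pagenumber=i
  if (i > 1) {
    ddum <- htmlParse(paste0(dum, "?pagenumber=", i, '#'))
  }
  # Physician names, then their specialties, appended page by page
  doctors <- c(doctors, xpathSApply(ddum, '//*/a[@class="providerSearchResultSelectAction"]', xmlValue))
  dspec <- c(dspec, xpathSApply(ddum, '//*/div[@class="listingHeaderLeftColumn"]/p', xmlValue))
}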

paste(doctors,dspec,sep=',')
#  [1] "Dr. Julia Adamian, MD,Internal Medicine"                               
#  [2] "Dr. Eric R. Adler, MD,Internal Medicine"                               
#  [3] "Dr. Ramzi S. Alami, MD,General Surgery"                                
#  [4] "Dr. Jason L. Anderson, MD,Internal Medicine"                           
#  [5] "Dr. Karl A. Anderson, MD,Urology"                                      
#  [6] "Dr. Christine E. Angeles, MD,Geriatric Medicine, Pulmonology"
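
Since the question asks for a single data matrix/spreadsheet, and both names and specialties contain commas of their own, a data frame is probably easier to work with than comma-pasted strings. The following is only a sketch: the three sample values are copied from the output above so the snippet runs on its own, and the column names and the output path are assumptions, not part of the original answer.

```r
# Sample values copied from the output above; in practice, use the
# doctors and dspec vectors built by the scraping loop.
doctors <- c("Dr. Julia Adamian, MD",
             "Dr. Eric R. Adler, MD",
             "Dr. Christine E. Angeles, MD")
dspec <- c("Internal Medicine",
           "Internal Medicine",
           "Geriatric Medicine, Pulmonology")

# Both vectors must line up one-to-one before they are combined.
stopifnot(length(doctors) == length(dspec))

# One row per physician; the column names are illustrative.
physicians <- data.frame(name = doctors,
                         specialty = dspec,
                         stringsAsFactors = FALSE)

# Hypothetical output location. write.csv quotes character fields,
# so the embedded commas do not split into extra columns.
out_file <- file.path(tempdir(), "physicians.csv")
write.csv(physicians, out_file, row.names = FALSE)
```

The resulting file can be opened directly in a spreadsheet program, or read back with read.csv(out_file).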

score 2 · Answer

It looks like the site uses a query variable:

?pagenumber=x

You can probably iterate over x to fetch the data.
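
As a rough illustration of iterating over x, the page URLs can be generated up front. In this sketch the pager text " of 23" and the page size of 5 are assumptions made for the example (the page size follows the arithmetic used in the accepted answer), and it is also assumed that ?pagenumber=1 returns the first page.

```r
base_url <- "http://www.healthgrades.com/hospital-directory/california-ca-san-mateo/affiliated-physicians-HGSTED418D46050070"

# Hypothetical pager text as it might be read from the page.
pager_text <- " of 23"
total_doctors <- as.numeric(gsub(" of ", "", pager_text))

# Assumed number of physicians listed per page.
per_page <- 5

# Integer ceiling of total_doctors / per_page.
n_pages <- (total_doctors - 1) %/% per_page + 1

# One URL per page, varying only the pagenumber query variable.
page_urls <- paste0(base_url, "?pagenumber=", seq_len(n_pages))
```

Each element of page_urls could then be passed to a parser in turn, for example with lapply.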

As an aside,

I don't know which browser you use, but Chrome has a handy feature: right-click the button and select inspect element.
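
Class names found this way can be turned into XPath queries. The snippet below is a minimal sketch using the XML package on an inline HTML fragment; the fragment is hypothetical and only borrows the class names used in the accepted answer, so the real page structure may differ.

```r
library(XML)

# Hypothetical markup; the live page is likely more complex.
html <- '<html><body>
<div class="listingHeaderLeftColumn">
  <a class="providerSearchResultSelectAction" href="#">Dr. Julia Adamian, MD</a>
  <p>Internal Medicine</p>
</div>
<div class="listingHeaderLeftColumn">
  <a class="providerSearchResultSelectAction" href="#">Dr. Karl A. Anderson, MD</a>
  <p>Urology</p>
</div>
</body></html>'

# asText = TRUE parses the string itself instead of treating it as a file or URL.
doc <- htmlParse(html, asText = TRUE)

# Select elements by the class attribute seen in the inspector.
found_names <- xpathSApply(doc, '//a[@class="providerSearchResultSelectAction"]', xmlValue)
found_specs <- xpathSApply(doc, '//div[@class="listingHeaderLeftColumn"]/p', xmlValue)
```

Note that @class="..." matches the whole attribute value, so an element carrying additional classes would need contains(@class, "...") instead.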
