r - How to read multiple HTML tables in R

Question

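I'm trying to automate reading the result of this readHTMLTable call into a data frame and saving it. I'm new to R, and I'm struggling to work out how to turn this code, which works when I run it one URL at a time, into a loop.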

library('XML')

urls<-c("http://www.basketball-reference.com/teams/ATL/","http://www.basketball-reference.com/teams/BOS/")
theurl<-urls[2] #Pick second link (celtics)

tables <- readHTMLTable(theurl)
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
BOS <-tables[[which.max(n.rows)]] 
Team.History<-write.csv(BOS,"Bos.csv")

Any and all help would be appreciated.

score 2 · Accepted Answer

I think this combines the best of both answers (and tidies things up a bit).

library(RCurl)
library(XML)

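stem <- "http://www.basketball-reference.com/teams/"
# Download the team index page and parse it
teams <- htmlParse(getURL(stem), asText=TRUE)
# Collect the href of every link pointing at /teams/, dropping the first match
teams <- xpathSApply(teams,"//*/a[contains(@href,'/teams/')]", xmlAttrs)[-1]
# Reduce "/teams/ATL/" to the bare team code "ATL"
teams <- gsub("/teams/(.*)/", "\\1", teams)
urls <- paste0(stem, teams)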

names(teams) <- NULL   # get rid of the "href" labels
names(urls) <- teams

results <- data.frame()
for(team in teams){
   tables <- readHTMLTable(urls[team])
   n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
   team.results <- tables[[which.max(n.rows)]] 
   write.csv(team.results, file=paste0(team, ".csv"))
   team.results$TeamCode <- team
   results <- rbind(results, team.results)
   rm(team.results, n.rows, tables)
}
rm(stem, team)

write.csv(results, file="AllTeams.csv")
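
The table-selection step in the loop above (keep whichever table has the most rows) can be pulled out and tried without any network access. The following is only a minimal sketch: `pick_largest` is a hypothetical helper name, not part of the XML package, and the toy data frames stand in for whatever readHTMLTable returns. It also skips NULL entries, on the assumption that a table which fails to parse may come back as NULL; in that case `unlist` would silently drop it and `which.max` could point at the wrong element.

```r
# Hypothetical helper (not from the XML package): return the data frame
# with the most rows from a list, ignoring NULL entries.
pick_largest <- function(tables) {
  tables <- Filter(Negate(is.null), tables)   # drop NULL entries first
  if (length(tables) == 0) return(NULL)       # nothing usable on the page
  n.rows <- vapply(tables, nrow, integer(1))  # row count of each table
  tables[[which.max(n.rows)]]
}

# Toy stand-in for the list returned by readHTMLTable(); values are made up
demo_tables <- list(
  small = data.frame(x = 1:2),
  big   = data.frame(Season = c("A", "B", "C"), W = c(1, 2, 3)),
  empty = NULL
)
largest <- pick_largest(demo_tables)
```

Inside the loop, `team.results <- pick_largest(tables)` would then replace the two lines that compute `n.rows` and index `tables`.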

score 1 · Accepted Answer

I take it you want to loop over your urls vector? I would try something like this:

library('XML')

url_base <- "http://www.basketball-reference.com/teams/"
teams <- c("ATL", "BOS")

# better still, get the full list of teams as in
# http://stackoverflow.com/a/11804014/1543437

results <- data.frame()
for(team in teams){
   theurl <- paste0(url_base, team, "/")   # url_base already ends in "/", so avoid sep="/"
   tables <- readHTMLTable(theurl)
   n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
   team.results <-tables[[which.max(n.rows)]] 
   write.csv(team.results, file=paste0(team, ".csv"))
   team.results$TeamCode <- team
   results <- rbind(results, team.results)
}
write.csv(results, file="AllTeams.csv")
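
The accumulation pattern in the loop above can also be written by collecting one data frame per team in a list and binding them once at the end, instead of growing `results` on every iteration. This is a sketch under stated assumptions, not a replacement for the code above: `fake_fetch` is a hypothetical stand-in for the download step and returns made-up rows so the example runs offline.

```r
teams <- c("ATL", "BOS")

# Hypothetical stand-in for readHTMLTable() plus table selection;
# the values are invented purely for illustration.
fake_fetch <- function(team) {
  data.frame(Season = c("S1", "S2"), W = c(1, 2), stringsAsFactors = FALSE)
}

# Build one data frame per team, tagging each with its team code
per_team <- lapply(teams, function(team) {
  team.results <- fake_fetch(team)
  team.results$TeamCode <- team
  team.results
})

# Bind all teams in a single call
results <- do.call(rbind, per_team)
```

Note that rbind on data frames expects every piece to have the same column names, so this only works if each team page yields a table with the same layout.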
