xml - Accessing fields of a structure (XML package)

Question

I used htmlTreeParse to get the structure below, and I need the text contained in the page.

doc <- htmlTreeParse(url, useInternalNodes = FALSE)
doc
$file
[1] "http://www.google.com/trends/fetchComponent?q=asdf,qwerty&cid=TIMESERIES_GRAPH_0&export=3"

$version
[1] ""

$children
$children$html
<html>
<body>
<p>// Data table response google.visualization.Query.setResponse([INSERT LOT OF JSON HERE])</p>
</body>
</html>
attr(,"class")
[1] "XMLDocumentContent"

I am looking for the content of the "p" block. So far today I have not found anything that helps.
So, how can I get that data?

score 0 · Accepted Answer

If you want to run XPath queries on the document, you need to set useInternalNodes = TRUE (see the documentation for this argument). The following code should get you started with XPath.

[Note: when I run your code, I get an error page rather than the document you retrieved.]

library(XML)
url <- "http://www.google.com/trends/fetchComponent?q=asdf,qwerty&cid=TIMESERIES_GRAPH_0&export=3"
doc <- htmlTreeParse(url, useInternalNodes = TRUE)
# XPath examples
p        <- doc["//p"]        # nodelist of all the <p> elements (there aren't any...)
div      <- doc["//div"]      # nodelist of all the <div> elements
scripts  <- doc["//script"]   # nodelist of all the <script> elements
b.script <- doc["//body/script"]    # nodelist of all <script> elements within the <body>

# title of the page
xmlValue(doc["//head/title"][[1]])
# [1] "Google Trends - An error has been detected"

Basically, you can use an XPath string as if it were an index into the document. So in your case,

xmlValue(doc["//p"][[1]])

should return the text contained in the (first) <p> element of doc.
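
For a version that can be tried without hitting the network, here is a minimal sketch that parses an inline HTML string instead of the URL. The string `html` and its `{"status":"ok"}` payload are made-up stand-ins for the real response, and the final `sub()` step is only an assumption about how one might peel off the `setResponse(...)` wrapper; it is not part of the XML package.

```r
library(XML)

# Hypothetical stand-in for the fetched page (not the real Google Trends response)
html <- "<html><body><p>// Data table response google.visualization.Query.setResponse({\"status\":\"ok\"})</p></body></html>"

# asText = TRUE parses the string itself instead of treating it as a file name or URL;
# useInternalNodes = TRUE is what makes XPath queries possible
doc <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE)

# Text of the first <p> element, using the XPath-as-index shorthand
p_text <- xmlValue(doc["//p"][[1]])

# Equivalent explicit form: apply xmlValue to every node matching the XPath
p_all <- xpathSApply(doc, "//p", xmlValue)

# Assumed post-processing: keep only what sits between "setResponse(" and the final ")"
payload <- sub("^.*setResponse\\((.*)\\)[[:space:]]*$", "\\1", p_text)
```

If the real page wraps its data the same way, `payload` could then be handed to a JSON parser of your choice; whether the actual response is strict JSON is something to check against the live data.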
