groovy - Groovy HtmlUnit

Question

I'm having trouble importing HtmlUnit (htmlunit.sf.net) into a Groovy script.

For now I'm just using a sample script I found on the web, and it cannot resolve the class com.gargoylesoftware.htmlunit.WebClient.

The script is:

import com.gargoylesoftware.htmlunit.WebClient

client = new WebClient()
page = client.getPage('http://www.msnbc.msn.com/')
println page.anchors.collect{ it.hrefAttribute }.sort().unique().join('\n')

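I downloaded the source from the website and placed the com folder (and all of its contents) in the same directory as my script.

Does anyone know what problem I'm running into? I'm not sure why the import doesn't work.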

score 3 · Accepted Answer

You can use Grape to fetch the dependency when the script runs. The easiest way is to add a @Grab annotation to the import statement.

Like this:

@Grab('net.sourceforge.htmlunit:htmlunit:2.7')
import com.gargoylesoftware.htmlunit.WebClient

client = new WebClient()

// Added as HtmlUnit had problems with the JavaScript
client.javaScriptEnabled = false
// The result is stored in 'page' so the next line can read its anchors
page = client.getPage('http://www.msnbc.msn.com/')
println page.anchors.collect{ it.hrefAttribute }.sort().unique().join('\n')

There is one catch: this page seems to be a bit too much for HtmlUnit to chew through. Every time I ran the code I got an OutOfMemoryError. I'd suggest downloading the HTML the ordinary way instead, parsing it into XML with something like NekoHTML or TagSoup, and working with it that way.

This example uses TagSoup to work with HTML as XML in Groovy: http://blog.foosion.org/2008/06/09/parse-html-the-groovy-way/
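
For illustration, here is a minimal sketch of that TagSoup approach. It is not taken from the linked post: the helper name extractLinks and the TagSoup version 1.2.1 are assumptions, and the groovy.xml.XmlSlurper import assumes Groovy 3 or later (on Groovy 2.x, XmlSlurper lives in groovy.util and is imported by default, so that line would be dropped).

```groovy
@Grab('org.ccil.cowan.tagsoup:tagsoup:1.2.1') // version is an assumption
import org.ccil.cowan.tagsoup.Parser
import groovy.xml.XmlSlurper // Groovy 3+; on 2.x remove this import

// Hypothetical helper: parse (possibly malformed) HTML with TagSoup's SAX
// parser and return the sorted, de-duplicated href values of all <a> tags.
List<String> extractLinks(String html) {
    def page = new XmlSlurper(new Parser()).parseText(html)
    page.depthFirst()
        .findAll { it.name() == 'a' && it.@href.text() } // anchors with a non-empty href
        .collect { it.@href.text() }
        .sort()
        .unique()
}
```

It could then be used along the lines of println extractLinks(new URL('http://www.msnbc.msn.com/').text).join('\n'), which downloads the page as plain text and avoids running a full browser emulation.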

score 1 · Accepted Answer

Just download the zip file, extract the jar files, and put them on the classpath at compile time... you don't need the source.

http://sourceforge.net/projects/htmlunit/files/htmlunit/2.8/htmlunit-2.8.zip/download
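
As a rough way to confirm that the jars really are visible to the script (for example after passing them with the groovy command's -cp option), a small check like the following can help. The helper name onClasspath is hypothetical; it only tests whether a compiled class can be loaded, which is also why copying the .java source folder next to the script does not make the import resolve.

```groovy
// Hypothetical helper: returns true when the named class can be loaded
// by the class loader that loaded this script, false otherwise.
boolean onClasspath(String className) {
    try {
        // 'false' means: locate the class without running its static initializers
        Class.forName(className, false, this.class.classLoader)
        return true
    } catch (ClassNotFoundException ignored) {
        return false
    }
}

println(onClasspath('com.gargoylesoftware.htmlunit.WebClient')
        ? 'HtmlUnit is on the classpath'
        : 'HtmlUnit is missing: add the extracted jars to the classpath')
```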
