java - Parsing specified text from a page using the Jericho HTML Parser

Question

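I'm having trouble getting a specific piece of text from a page. The example I'm using is the Patent Assignee Summary.

When you visit the site, it shows "Total: 82" (the number of hits for the search criterion SASA). I need to get this number. I'm using the Jericho HTML Parser, but I can't find a function that does this.

Can anyone help me with this? I really need to get this number from the page.

Thanks in advance - Sasa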

score 0 · Accepted Answer

If you can switch to Jsoup:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

/* Connect to the URL and parse the response into a 'Document' (get() throws IOException) */
Document doc = Jsoup.connect("http://assignments.uspto.gov/assignments/q?db=pat&qt=asne&reel=&frame=&pat=&pub=&asnr=&asnri=&asne=sasa&asnei=&asns=").get();

/* Select the required tag and print its text */
System.out.println(doc.select("p.t2").first().text());

Done!

Output:

Total: 83 (the value has changed on the website)
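If you need the number itself rather than the whole string, here is a minimal sketch using only java.util.regex. It assumes the element text has the form "Total: <digits>", as in the output shown here; the class and method names (TotalExtractor, extractTotal) are hypothetical and only for illustration.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TotalExtractor {
    // Assumption: the text looks like "Total: 83". At most 9 digits are
    // accepted so that Integer.parseInt cannot overflow.
    private static final Pattern TOTAL = Pattern.compile("Total:\\s*(\\d{1,9})");

    /** Returns the number after "Total:", or an empty OptionalInt if the text does not match. */
    static OptionalInt extractTotal(String text) {
        Matcher m = TOTAL.matcher(text);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        // In practice the input would be the text of the selected element.
        OptionalInt total = extractTotal("Total: 83");
        if (total.isPresent()) {
            System.out.println(total.getAsInt());
        } else {
            System.out.println("No total found");
        }
    }
}
```

Returning an OptionalInt instead of a bare int makes the "no match" case explicit, which helps if the page layout changes.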

The selector explained:

doc.select("p.t2") // Select each 'p' tag with class 't2' from the document
   .first() // Get the first one (there are two on the website, but the first one is the required one)
   .text() // Get the text of this element

Documentation: see the Jsoup Selector API documentation (org.jsoup.select.Selector).
