java - Extracting word counts from an XML file

Question

(This question is related to an earlier question I posted on Stack Overflow... here is the link:

Extracting values from an XML file using XPath, SAX, or DOM for this specific scenario)

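With the case above in mind, the question is: instead of retrieving sentences, what if I want the words each participant wrote across all sentences? For example, suppose the word "Budget" is used 10 times in total: 7 times by participant "Dolske" and 3 times by others. So I need a list of all words, along with how many times each participant wrote each one. I would also like a list of words for each turn.

What is the best strategy to achieve this? Is there any sample code?

The XML is attached here (you can also find it in the referenced question).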

"(495584) Firefox - search suggestions passes wrong previous result to form history"

<Turn>
  <Date>'2009-06-14 18:55:25'</Date>
  <From>'Justin Dolske'</From>
  <Text>
    <Sentence ID = "3.1"> Created an attachment (id=383211) [details] Patch v.2</Sentence>
    <Sentence ID = "3.2"> Ah. So, there's a ._formHistoryResult in the....</Sentence>
    <Sentence ID = "3.3"> The simple fix it to just discard the service's form history result.</Sentence>
    <Sentence ID = "3.4"> Otherwise it's trying to use a old form history result that no longer applies for the search string.</Sentence>
  </Text>
</Turn>

<Turn>
  <Date>'2009-06-19 12:07:34'</Date>
  <From>'Gavin Sharp'</From>
  <Text>
    <Sentence ID = "4.1"> (From update of attachment 383211 [details])</Sentence>
    <Sentence ID = "4.2"> Perhaps we should rename one of them to _fhResult just to reduce confusion?</Sentence>
  </Text>
</Turn>

<Turn>
  <Date>'2009-06-19 13:17:56'</Date>
  <From>'Justin Dolske'</From>
  <Text>
    <Sentence ID = "5.1"> (In reply to comment #3)</Sentence>
    <Sentence ID = "5.2"> &amp;gt; (From update of attachment 383211 [details] [details])</Sentence> 
    <Sentence ID = "5.3"> &amp;gt; Perhaps we should rename one of them to _fhResult just to reduce confusion?</Sentence>
    <Sentence ID = "5.4"> Good point.</Sentence>
    <Sentence ID = "5.5"> I renamed the one in the wrapper to _formHistResult. </Sentence>
    <Sentence ID = "5.6"> fhResult seemed maybe a bit too short.</Sentence>
  </Text>
</Turn>

..... and so on

Any help is highly appreciated...

score 1 · Accepted Answer

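Read all the values first; a StAX parser is a better fit for this kind of task. Then split every sentence into words and do whatever you like with them. For example, create a model class Turn that stores the author and the sentences, write a service for that class, and go on from there. :)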
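The StAX approach described above could be sketched roughly as follows. This is only an illustration under several assumptions: the class name TurnWordCounter is made up, the Turn elements are assumed to sit inside a single root element (the excerpt in the question does not show one), words are lower-cased, and tokens are split on whitespace only, so punctuation stays attached to words.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the original answer.
class TurnWordCounter {

    /** Returns word -> (author -> count). The XML must have one root element. */
    static Map<String, Map<String, Integer>> count(String xml) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            String author = "";
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = reader.getLocalName();
                if (name.equals("From")) {
                    // The sample wraps names in single quotes: 'Justin Dolske'
                    author = reader.getElementText().trim().replaceAll("^'|'$", "");
                } else if (name.equals("Sentence")) {
                    // Assumption: whitespace tokenization, case-insensitive counts
                    for (String word : reader.getElementText().trim().split("\\s+")) {
                        if (word.isEmpty()) {
                            continue;
                        }
                        counts.computeIfAbsent(word.toLowerCase(), k -> new TreeMap<>())
                              .merge(author, 1, Integer::sum);
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return counts;
    }
}
```

Calling TurnWordCounter.count(xml).get("budget") would then give a map from each participant to the number of times they wrote that word. Per-turn word lists could be collected in the same loop by starting a new list whenever a Turn start element is seen.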

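To split a sentence into words, use split() or StringTokenizer. Note that StringTokenizer is a legacy class whose use is discouraged in new code, so split() is the better choice. To use split(), create a temporary array like this: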

String[] stringArray = sentence.toString().split(" ");

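or "sentence.getValue()", or whatever gives you the sentence text.

The method parameter is where you put the regex. In your case it is a simple space, because you are splitting a sentence into words. Then you can go through the words and count whatever you need.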
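As a small sketch of the split-and-count step: the class and method names below are made up for illustration. It uses the regex "\\s+" after trim() rather than a single space, because the sentences in the sample XML start with a space, and split(" ") would return an empty first element for them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original answer.
class WordSplitter {

    /** Counts how often each whitespace-separated word occurs in a sentence. */
    static Map<String, Integer> countWords(String sentence) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : sentence.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                // merge() inserts 1 for a new word, otherwise adds 1 to its count
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Punctuation stays attached to the words here (for example "point."); stripping it would need an extra step, depending on what should count as a word.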

For an ArrayList, use List.toArray() to get the list's contents as an array (the array is a copy, not a live view of the list).
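A brief sketch of that conversion, with made-up class and method names: the no-argument toArray() returns Object[], so the overload that takes a typed array is usually more convenient when a String[] is wanted.

```java
import java.util.List;

// Hypothetical helper, not part of the original answer.
class ToArrayDemo {

    /** Copies the words of a list into a new String array. */
    static String[] toWordArray(List<String> words) {
        // Passing an empty typed array lets toArray allocate one of the right size
        return words.toArray(new String[0]);
    }
}
```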
