java - Parsing and summing a Java String

Question

I am trying to parse an input String, and while doing so I want to strip every non-alphabetic character and count how many times each word occurs.

For example:

String str = "test man `xy KA XY test!.. KA kA TeST man poqw``e TES`T";
// Keep only letters, digits, and spaces
String alphaLine = str.replaceAll("[^\\p{L}\\p{N}\\ ]", "");
String[] werd = alphaLine.split(" ");

for (int i = 0; i < werd.length; i++) {
    if (werd[i].toLowerCase().equals("test")) {
        testcounter++;
    } else if (werd[i].toLowerCase().equals("ka")) {
        kacounter++;
    }
    // etc..
}

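I am checking very long Strings against many target Strings (ka and test in this example), and I was wondering whether this code could be done in a single pass. As written, .replaceAll(), .split(), and the for loop each walk the whole String, so it is traversed three times when once should be enough.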

score 0 · Accepted Answer

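I'm not sure we're on the same page, but it sounds like you're asking how to reduce the number of lookups when searching for the words. If the number of search words is very large this may not be the best approach, but for a small list it should give you the number of occurrences of each word.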

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Map<String, Integer> occurrences = new HashMap<String, Integer>();
List<String> words = new ArrayList<String>();
words.add("foo");
words.add("bar");

//build regex - note: if this is done within an outer loop, then you should consider using StringBuilder instead
//The \b in regex is a word boundary
String regex = "\\b(";
for(int i = 0; i < words.size(); i++) {
    //add word to regex
    regex += (0 == i ? "" : "|") + words.get(i);

    //initial occurrences (Map uses put, not add)
    occurrences.put(words.get(i), 0);
}
regex += ")\\b";
Pattern patt = Pattern.compile(regex);
//search_string is the (already cleaned) text to scan
Matcher matcher = patt.matcher(search_string);

//check for matches; matching is case-sensitive, so every match is a key in the map
while (matcher.find()) {
    String key = matcher.group();
    int numOccurs = occurrences.get(key) + 1;
    occurrences.put(key, numOccurs);
}
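
The snippet above matches case-sensitively, while the question compares words with toLowerCase(). A minimal sketch of a case-insensitive variant follows. The class name RegexWordCounter and the method count are hypothetical names made up for this illustration, the targets are assumed to be plain lowercase words, and the input is assumed to be cleaned already:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RegexWordCounter {
    // Counts case-insensitive whole-word matches of each target in text.
    // Hypothetical helper for illustration; not part of the original answer.
    static Map<String, Integer> count(String text, List<String> targets) {
        Map<String, Integer> occurrences = new LinkedHashMap<>();
        for (String t : targets) {
            occurrences.put(t.toLowerCase(Locale.ROOT), 0);
        }
        // Pattern.quote keeps any regex metacharacters in a target literal.
        String alternation = targets.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Pattern patt = Pattern.compile("\\b(?:" + alternation + ")\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher m = patt.matcher(text);
        while (m.find()) {
            // Normalise the matched text so "TeST" and "test" share one key.
            occurrences.merge(m.group().toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return occurrences;
    }

    public static void main(String[] args) {
        // The question's sample string after the replaceAll step.
        String cleaned = "test man xy KA XY test KA kA TeST man poqwe TEST";
        System.out.println(count(cleaned, List.of("test", "ka")));
    }
}
```

Lowercasing the matched text before the map lookup matters here: with CASE_INSENSITIVE enabled, matcher.group() returns the text as it appears in the input, which would otherwise miss the lowercase keys.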

Edit: this assumes that the non-alphanumeric requirement has already been handled before this point.
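
If the cleaning step should not be a separate pass, one possible alternative is to scan the characters once and do the filtering, lowercasing, and counting together. This is only a sketch under some assumptions: SinglePassCounter and count are hypothetical names, the targets are given in lowercase, words are separated by single space characters as in the question, and Character.isLetterOrDigit (applied per char) is used as an approximation of the question's \p{L}\p{N} character class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class SinglePassCounter {
    // Scans text once: skips characters that are neither letters, digits, nor
    // spaces, lowercases the rest, and counts each finished word that is a target.
    // Hypothetical helper for illustration; targets are assumed to be lowercase.
    static Map<String, Integer> count(String text, Set<String> targets) {
        Map<String, Integer> counts = new HashMap<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i <= text.length(); i++) {
            // Treat the end of the input as one final separator.
            char c = i < text.length() ? text.charAt(i) : ' ';
            if (Character.isLetterOrDigit(c)) {
                word.append(Character.toLowerCase(c));
            } else if (c == ' ' && word.length() > 0) {
                String w = word.toString();
                if (targets.contains(w)) {
                    counts.merge(w, 1, Integer::sum);
                }
                word.setLength(0);
            }
            // Any other character is dropped, so "TES`T" is read as "test".
        }
        return counts;
    }

    public static void main(String[] args) {
        String str = "test man `xy KA XY test!.. KA kA TeST man poqw``e TES`T";
        System.out.println(count(str, Set.of("test", "ka")));
    }
}
```

With a hash-based Set of targets, each word costs one lookup regardless of how many targets there are, which may suit a long target list better than a chain of equals() checks.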
