java - Efficient text processing in Java

Question

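I wrote an application that processes log files, but I hit a bottleneck once the number of files reaches about 20.

The problem comes from one particular method that takes roughly 1 second on average to complete. As you can imagine, that is not practical when it has to run 50 or more times.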

private String getIdFromLine(String line) {
    String[] values = line.split("\t");
    String newLine = substringBetween(values[4], "Some String : ", "Value=");
    String[] split = newLine.split(" ");
    return split[1].substring(4, split[1].length());
}

private String substringBetween(String str, String open, String close) {
    if (str == null || open == null || close == null) {
        return null;
    }
    int start = str.indexOf(open);
    if (start != -1) {
        int end = str.indexOf(close, start + open.length());
        if (end != -1) {
            return str.substring(start + open.length(), end);
        }
    }
    return null;
}

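The lines come from a file read that is already very efficient, so I don't think I need to post that code unless someone asks for it.

Is there any way to improve the performance of this?

Thanks for your time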

score 3 · Accepted Answer

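A few things are probably the problem.

Whether you realize it or not, you are using regular expressions. The argument to String.split() is treated as a regex. Using String.indexOf() will almost certainly be a faster way to find the particular part of the String that you want. As HRgiger points out, Guava's Splitter is a good choice because it does exactly that.
You are allocating a lot of things you don't need. Depending on how long your lines are, you could be creating a large number of extra Strings and String[]s that you never use (plus the garbage collection to clean them up). That is another reason to avoid String.split().
I also recommend using String.startsWith() and String.endsWith() instead of all the indexOf() work, if only because it is easier to read.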
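
As a rough illustration of the first two points, here is a minimal sketch that pulls one tab-separated field out of a line with indexOf() only, so no regex is involved and no String[] is allocated. The class name FieldExtractor and the method fieldAt are made up for this example, and the sample line is invented. Whether this is actually faster on your data is something to measure rather than assume.

```java
final class FieldExtractor {
    private FieldExtractor() {}

    /**
     * Returns the tab-separated field at the given zero-based index,
     * or null if the line has too few fields.
     * Hypothetical helper: uses indexOf only, no regex and no String[].
     */
    static String fieldAt(String line, int index) {
        int start = 0;
        // Step over `index` tabs so that start points at the wanted field.
        for (int i = 0; i < index; i++) {
            int tab = line.indexOf('\t', start);
            if (tab == -1) {
                return null; // fewer than index + 1 fields
            }
            start = tab + 1;
        }
        // The field ends at the next tab, or at the end of the line.
        int end = line.indexOf('\t', start);
        return end == -1 ? line.substring(start) : line.substring(start, end);
    }

    public static void main(String[] args) {
        // Invented sample line; the real log format is not shown in the question.
        String line = "a\tb\tc\td\tSome String : foo UID=1234 Value=9\tf";
        System.out.println(fieldAt(line, 4));
    }
}
```

Only one substring is created per call, for the field that is actually needed.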

score 2 · Accepted Answer

I would try using regular expressions.

answered 2012-12-14T09:12:59.027

score 1 · Accepted Answer

One of the main problems in this code is the "split" method. For example, here is a version that avoids it:

    private String getIdFromLine3(String line) {
        // Skip four tabs so the search starts at the fifth field (values[4] in the original).
        int t_index = -1;
        for (int i = 0; i < 4; i++) {
            t_index = line.indexOf("\t", t_index + 1);
            if (t_index == -1) return null;
        }
        // Replaces: String[] values = line.split("\t");
        String newLine = substringBetween(line.substring(t_index + 1), "Some String : ", "Value=");
        if (newLine == null) return null; // markers not found
        // Replaces: String[] split = newLine.split(" ");
        int p_index = newLine.indexOf(" ");
        if (p_index == -1) return null;
        int p_index2 = newLine.indexOf(" ", p_index + 1);
        if (p_index2 == -1) return null;
        String split = newLine.substring(p_index + 1, p_index2);

        // Replaces: return split[1].substring(4, split[1].length());
        return split.substring(4, split.length());
    }

UPD: this can be up to 3 times faster.

score 0 · Accepted Answer

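Before optimizing, I recommend using VisualVM to find the bottleneck.
If your application needs performance, you will need to profile it anyway.

As an optimization, I would write a custom loop to replace your substringBetween method and get rid of the multiple indexOf calls.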
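
One way such a custom loop could look is sketched below. It walks the line once from left to right, tracks index positions only, and creates a single substring at the end. The class name IdScanner, the method idFromLine, and the constant PREFIX_LENGTH are invented for this sketch. The line layout (markers in the fifth tab-separated field, ID in the second space-separated token after the opening marker, first 4 characters dropped) is inferred from the code in the question, not from a real log sample.

```java
final class IdScanner {
    private static final String OPEN = "Some String : ";
    private static final String CLOSE = "Value=";
    // Characters dropped by substring(4, ...) in the question's code.
    private static final int PREFIX_LENGTH = 4;

    private IdScanner() {}

    /** Hypothetical single-pass replacement for getIdFromLine; returns null when the layout does not match. */
    static String idFromLine(String line) {
        int n = line.length();
        int pos = 0;
        // Skip four tabs so pos lands on the first character of the fifth field.
        for (int tabs = 0; tabs < 4; pos++) {
            if (pos >= n) {
                return null;
            }
            if (line.charAt(pos) == '\t') {
                tabs++;
            }
        }
        // Advance to the opening marker without leaving the field.
        while (pos < n && line.charAt(pos) != '\t' && !line.startsWith(OPEN, pos)) {
            pos++;
        }
        if (pos >= n || line.charAt(pos) == '\t') {
            return null;
        }
        pos += OPEN.length();
        // Walk towards the closing marker, remembering the first two spaces.
        int firstSpace = -1;
        int secondSpace = -1;
        int close = -1;
        for (int i = pos; i < n; i++) {
            char c = line.charAt(i);
            if (c == '\t') {
                break; // end of field reached before the closing marker
            }
            if (c == ' ') {
                if (firstSpace == -1) {
                    firstSpace = i;
                } else if (secondSpace == -1) {
                    secondSpace = i;
                }
            } else if (c == CLOSE.charAt(0) && line.startsWith(CLOSE, i)) {
                close = i;
                break;
            }
        }
        if (close == -1 || firstSpace == -1) {
            return null;
        }
        int tokenStart = firstSpace + 1 + PREFIX_LENGTH;
        int tokenEnd = secondSpace == -1 ? close : secondSpace;
        return tokenStart <= tokenEnd ? line.substring(tokenStart, tokenEnd) : null;
    }
}
```

Unlike the original, this returns null instead of throwing when a line does not have the expected shape, so callers need to handle that case.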

score 0 · Accepted Answer

Google Guava's Splitter is also quite fast.

answered 2012-12-14T09:23:02.887

score 0 · Accepted Answer

Try a regex anyway and post the results for comparison:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compile once and reuse. Remove the first and last groups if they are not needed
// (and adjust m.group(x) to match).
Pattern p = Pattern.compile("(Some String : )(.*?)(Value=)");

@Test
public void test2() {
    String str = "Long java line with Some String : and some object with Value=154345 ";
    System.out.println(substringBetween(str));
}

private String substringBetween(String str) {
    Matcher m = p.matcher(str);
    // find() scans from the start; find(2) would reset the matcher and begin at index 2.
    if (m.find()) {
        return m.group(2);
    } else {
        return null;
    }
}

If this turns out to be faster, find a regex that combines both functions.
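
A combined pattern might look something like the sketch below. It is only a guess at the line layout, inferred from the question's code: after "Some String : " comes one space-free token, a space, a 4-character prefix, and then the ID, with "Value=" somewhere later in the same field. The class name IdRegex and the method idFromLine are invented here. Note that it searches the whole line instead of only the fifth tab-separated field, which is an assumption that the markers do not appear in any earlier field.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IdRegex {
    // Compiled once and reused for every line.
    // Group 1 is the second token after the opening marker, minus its first 4 characters.
    private static final Pattern ID = Pattern.compile(
            "Some String : [^ \\t]* [^ \\t]{4}([^ \\t]*)[^\\t]*?Value=");

    private IdRegex() {}

    /** Hypothetical regex-based replacement for getIdFromLine; returns null when nothing matches. */
    static String idFromLine(String line) {
        Matcher m = ID.matcher(line);
        return m.find() ? m.group(1) : null;
    }
}
```

Timing this against the indexOf-based version on real log lines is the only way to tell which one wins.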
