java - Java regex is including the newline in the match

Question

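I am trying to match a regex against textbook definitions that I pull from a website. Each entry always consists of a word, followed by a new line, followed by the definition. For example: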

Zither
 Definition: An instrument of music used in Austria and Germany It has from thirty to forty wires strung across a shallow sounding board which lies horizontally on a table before the performer who uses both hands in playing on it Not to be confounded with the old lute shaped cittern or cithern

When I try to get just the word (in this case "Zither"), I keep getting the newline character as well.

I tried both ^(\w+)\s and ^(\S+)\s without much luck. I thought ^(\S+)$ might work, but that does not seem to match the word at all. I have been testing with rubular (http://rubular.com/r/LPEHCnS0ri), which matches all of my attempts the way I want, even though Java does not.

Here is my snippet:

String str = ...; //Here the string is assigned a word and definition taken from the internet like given in the example above.
Pattern rgx = Pattern.compile("^(\\S+)$");
Matcher mtch = rgx.matcher(str);
if (mtch.find()) {
    String result = mtch.group();
    terms.add(new SearchTerm(result, System.nanoTime()));
}

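This is easy to work around by trimming the resulting string, but that seems like it should be unnecessary when I am already using a regex.

All help is greatly appreciated. Thanks in advance!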

score 9 · Accepted Answer

Try using the Pattern.MULTILINE option:

Pattern rgx = Pattern.compile("^(\\S+)$", Pattern.MULTILINE);

This makes the regex recognize line terminators inside the string. Otherwise, ^ and $ only match at the start and end of the whole string.
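The difference can be seen with a minimal sketch along these lines. The class name MultilineDemo and the helper firstWord are made up for illustration, and the input is a shortened version of the example from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // Hypothetical helper: returns capture group 1 of the first match of
    // "^(\S+)$" compiled with the given flags, or null if nothing matches.
    static String firstWord(String text, int flags) {
        Matcher m = Pattern.compile("^(\\S+)$", flags).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "Zither\n Definition: An instrument of music";

        // Without MULTILINE, $ cannot match before the newline in the middle
        // of the input, so there is no match at all.
        System.out.println(firstWord(text, 0));                 // null

        // With MULTILINE, ^ and $ also match at line boundaries.
        System.out.println(firstWord(text, Pattern.MULTILINE)); // Zither
    }
}
```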

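Although it makes no difference for this pattern, note that the Matcher.group() method returns the entire match, whereas the Matcher.group(int) method returns the match of the particular capture group (...) identified by the number you pass. Your pattern specifies one capture group, which is the part you want to capture. If you had included \s in your pattern, as you wrote that you tried, then Matcher.group() would have included that whitespace in its return value.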
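As a rough illustration of that last point, the sketch below runs the ^(\w+)\s pattern from the question and prints both values. GroupDemo and wholeAndGroup1 are names invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Hypothetical helper: returns { whole match, capture group 1 } for the
    // first match of regex in text, or null if there is no match.
    static String[] wholeAndGroup1(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(), m.group(1) };
    }

    public static void main(String[] args) {
        String text = "Zither\n Definition: An instrument of music";
        String[] r = wholeAndGroup1("^(\\w+)\\s", text);

        // Brackets make the trailing newline visible in the output.
        System.out.println("[" + r[0] + "]"); // whole match: word plus the newline
        System.out.println("[" + r[1] + "]"); // group 1: the word only
    }
}
```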

score 2 · Accepted Answer

A late reply, but if you are not using Pattern and Matcher, you can use this alternative to DOTALL inside the regex string itself:

(?s)[Your Expression]

Basically, (?s) tells the dot to match every character, including line breaks.
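A small sketch of the embedded flag with String.matches, which takes only a regex string and so has no place for a flags argument. The class DotallDemo, the helper matchesWordThenAnything, and the pattern \w+.* are assumptions chosen for illustration, not something taken from the question.

```java
class DotallDemo {
    // Hypothetical helper: checks whether the whole text is a word followed
    // by anything, optionally prefixing the regex with the (?s) flag.
    static boolean matchesWordThenAnything(String text, boolean dotall) {
        String regex = (dotall ? "(?s)" : "") + "\\w+.*";
        return text.matches(regex);
    }

    public static void main(String[] args) {
        String text = "Zither\n Definition: An instrument of music";

        // By default the dot stops at the newline, so the whole input cannot match.
        System.out.println(matchesWordThenAnything(text, false)); // false

        // With (?s) the dot also consumes the newline.
        System.out.println(matchesWordThenAnything(text, true));  // true
    }
}
```

Note that (?s) only changes what the dot matches. It does not change where ^ and $ match, which is what Pattern.MULTILINE (or the embedded flag (?m)) controls.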

More information: http://www.vogella.com/tutorials/JavaRegularExpressions/article.html

score 2 · Accepted Answer

In a regex, group 0 is always the entire matched string. In your case you want group 1, not group 0.

So changing mtch.group() to mtch.group(1) should do the trick:

 String str = ...; //Here the string is assigned a word and definition taken from the internet like given in the example above.
 Pattern rgx = Pattern.compile("^(\\w+)\\s"); // the backslash in \s must be doubled inside a Java string literal
 Matcher mtch = rgx.matcher(str);
 if (mtch.find()) {
     String result = mtch.group(1);
     terms.add(new SearchTerm(result, System.nanoTime()));
 }

score 0 · Accepted Answer

Try the following:

/* The regex pattern: ^(\w+)\r?\n(.*)$ */
private static final Pattern REGEX_PATTERN = 
        Pattern.compile("^(\\w+)\\r?\\n(.*)$");

public static void main(String[] args) {
    String input = "Zither\n Definition: An instrument of music";

    System.out.println(
        REGEX_PATTERN.matcher(input).matches()
    );  // prints "true"

    System.out.println(
        REGEX_PATTERN.matcher(input).replaceFirst("$1 = $2")
    );  // prints "Zither =  Definition: An instrument of music"

    System.out.println(
        REGEX_PATTERN.matcher(input).replaceFirst("$1")
    );  // prints "Zither"
}
