java - Regex to split a chapter title

Question

I need to split chapter titles into a title number and a title name. The chapter titles have the following format:

some long text
    3.7.2 sealant durability 
     paragraph with text        // (.*)
    3.7.3 funkční schopnost
     paragraph with text...
    3.1.13 plastic sealant xx 21    
     paragraph with text
    3.1.14 plastic sealant 
    xx 21   
     paragraph with text
    3.7.12 sealant durability
     paragraph with text
    3.7.325 funkční schopnost

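Edit: The problem is that the values shown above sit inside long text that is full of special characters.

I previously used the code below, but it split off only a single digit after the last dot. If I put a "+" after the last "\d", it throws an error. What is the correct regex for this problem?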

title.trim().split("(?<=(\\d\\.\\d{1,2}\\.[\\d]))")

Expected output:

splitedValue[0] : '3.7.2'
splitedValue[1] : 'sealant durability'
...
splitedValue[0] : '3.1.14'
splitedValue[1] : 'plastic sealant xx 21'
...


score 1 · Accepted Answer

You can try this regex:

 *(\d+(\.\d+)*) (\p{L}+( \p{L}+)*)

\p{L} denotes the Unicode "letter" category. Also, you should keep the compiled Pattern in a constant so that the expression is not recompiled on every use. For example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChapterTitles {
    // Compiled once and reused, so the expression is not recompiled on every call.
    private static final Pattern REGEX_PATTERN =
            Pattern.compile(" *(\\d+(\\.\\d+)*) (\\p{L}+( \\p{L}+)*)");

    public static void main(String[] args) {
        String input = "    3.7.2 sealant durability \n     paragraph with text        // (.*)\n    3.7.3 funkční schopnost\n     paragraph with text...\n    3.1.13 plastic sealant xx 21    \n     paragraph with text";

        Matcher matcher = REGEX_PATTERN.matcher(input);
        while (matcher.find()) {
            System.out.println(matcher.group(1)); // Chapter number
            System.out.println(matcher.group(3)); // Title (letters and single spaces only)
        }
    }
}

Use matcher.find() instead of split().

Output:

3.7.2
sealant durability
3.7.3
funkční schopnost
3.1.13
plastic sealant xx
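
Note that the last title comes out as "plastic sealant xx" rather than "plastic sealant xx 21", because \p{L} matches letters only. If digits in the title should be kept, one possible variation is to anchor on line starts and take the rest of the heading line. The sketch below is an assumption, not part of the original answer: the class name ChapterHeadings, the helper extract, and the pattern itself are made up for illustration, and it assumes every heading fits on a single line that begins with a dotted number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChapterHeadings {
    // Hypothetical pattern (not from the answer above): optional indentation,
    // a dotted number such as 3.1.13, then the rest of the line as the title.
    // MULTILINE makes ^ and $ match at every line boundary.
    private static final Pattern HEADING = Pattern.compile(
            "^[ \\t]*(\\d+(?:\\.\\d+)+)[ \\t]+(\\S.*?)[ \\t]*$", Pattern.MULTILINE);

    // Returns {number, title} pairs in the order they appear in the text.
    static List<String[]> extract(String text) {
        List<String[]> headings = new ArrayList<>();
        Matcher m = HEADING.matcher(text);
        while (m.find()) {
            headings.add(new String[] { m.group(1), m.group(2) });
        }
        return headings;
    }

    public static void main(String[] args) {
        String input = "    3.7.3 funkční schopnost\n"
                + "     paragraph with text...\n"
                + "    3.1.13 plastic sealant xx 21    \n"
                + "     paragraph with text";
        for (String[] h : extract(input)) {
            System.out.println(h[0] + " -> " + h[1]);
        }
    }
}
```

With this pattern, the 3.1.13 line yields the title "plastic sealant xx 21" with the trailing spaces removed. It does not handle a title that wraps onto the next line (as in the 3.1.14 example in the question), and it would also pick up any paragraph line that happens to start with a dotted number.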

score 0 · Accepted Answer

As @EricStein points out, finding the first occurrence of a space is a good idea. You could also try something a little more flexible, like this:

String name = "3.7.2 sealant durability";
System.out.println(name.split("\\s+", 2)[1]);

sealant durability

More generally, to match your expected output:

String[] splitedValue = name.split("\\s+", 2);
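
Put together, the split-based approach might look like the sketch below. It assumes the heading line has already been isolated from the surrounding text; the class name TitleSplit and the helper splitHeading are hypothetical names used only for illustration. The trim() call is there so that leading indentation does not produce an empty first element.

```java
class TitleSplit {
    // Splits one heading line into {number, title} at the first run of whitespace.
    // The limit of 2 keeps any further spaces inside the title part.
    static String[] splitHeading(String line) {
        return line.trim().split("\\s+", 2);
    }

    public static void main(String[] args) {
        String[] splitedValue = splitHeading("   3.1.13 plastic sealant xx 21   ");
        System.out.println(splitedValue[0]); // 3.1.13
        System.out.println(splitedValue[1]); // plastic sealant xx 21
    }
}
```

If the line contains no whitespace at all, the returned array has only one element, so callers should check its length before reading index 1.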
