java - Parsing sentences with a regex in Java when the text contains bullets

Question

I am currently using the following regex to split a document into sentences:

Pattern.compile("(?<=\\w[\\w\\)\\]](?<!Mrs?|Dr|Rev|Mr|Ms|vs|abd|ABD|Abd|resp|St|wt)[\\.\\?\\!\\:\\@]\\s)");

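This almost works. For example, given the following string:

"Mary had a little lamb (i.e. lamby pie). Here are its properties: 1. It has four feet  2. It has fleece 3. It is a mammal. It had white fleese. Her father, Mr. Lamb, lives on Mulbery St. in a little white house."

I get the following sentences: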

Mary had a little lamb (i.e. lamby pie).
Here are its properties: 
1. It has four feet  2. It has fleece 3. It is a mammal. 
It had white fleese. 
Her father, Mr. Lamb, lives on Mulbery St. in a little white house.

However, what I want is:

Mary had a little lamb (i.e. lamby pie).
Here are its properties: 
1. It has four feet  
2. It has fleece 
3. It is a mammal. 
It had white fleese. 
Her father, Mr. Lamb, lives on Mulbery St. in a little white house.

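Is there a way to do this by modifying the existing regex?

At the moment I accomplish this in two passes: I do the initial split first and then check each piece for bullets. The following code works, but I am wondering whether there is a more elegant solution: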

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void doHomeMadeSentenceParser(String temp) {
    // First pass: split after sentence-ending punctuation, skipping known abbreviations.
    Pattern p = Pattern
            .compile("(?<=\\w[\\w\\)\\]](?<!Mrs?|Dr|Rev|Mr|Ms|vs|abd|ABD|Abd|resp|St|wt)[\\.\\?\\!\\:\\@]\\s)");
    String[] sentences = p.split(temp);
    List<String> psentences = new ArrayList<>();
    // Second pass: a numbered bullet such as "1. " or "2) " starts a new item.
    Pattern p1 = Pattern.compile("\\b\\d+[.)]\\s");
    for (String sentence : sentences) {
        Matcher matcher = p1.matcher(sentence);
        int bstart = 0;
        while (matcher.find()) {
            // Emit the text that precedes this bullet, if there is any.
            String bullet = sentence.substring(bstart, matcher.start());
            if (bullet.length() > 0) {
                psentences.add(bullet);
            }
            bstart = matcher.start();
        }
        // Adds the last bullet, or the whole sentence when no bullet was found (bstart == 0).
        psentences.add(sentence.substring(bstart));
    }
    for (String s : psentences) {
        System.out.println(s.trim());
    }
}

Thanks for your help.

Elliott

score 0 · Accepted Answer

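I assume you are using the regex to find the places at which to split the text. I don't know the exact regex for this, but could you add a lookahead for a number followed by a period (.)?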
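
A minimal sketch of that idea, not a tested drop-in replacement: it keeps the lookbehind from the question unchanged and adds a second alternative that matches the empty position just before a bullet. The bullet rule (whitespace, digits, then `.` or `)`, then whitespace) is an assumption borrowed from the question's second pattern, and the names `BulletAwareSplitter`, `SENTENCE_END`, `BULLET_START` and `split` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class BulletAwareSplitter {
    // Sentence-end lookbehind, copied unchanged from the question.
    private static final String SENTENCE_END =
            "(?<=\\w[\\w\\)\\]](?<!Mrs?|Dr|Rev|Mr|Ms|vs|abd|ABD|Abd|resp|St|wt)[\\.\\?\\!\\:\\@]\\s)";

    // Assumed bullet rule: preceded by whitespace, followed by digits,
    // then '.' or ')', then whitespace. Both parts are lookarounds,
    // so the split point is zero-width and no characters are lost.
    private static final String BULLET_START = "(?<=\\s)(?=\\d+[.)]\\s)";

    private static final Pattern SPLITTER =
            Pattern.compile(SENTENCE_END + "|" + BULLET_START);

    // Splits the text in a single pass, then trims each piece and drops blanks.
    static List<String> split(String text) {
        List<String> parts = new ArrayList<>();
        for (String piece : SPLITTER.split(text)) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String text = "Mary had a little lamb (i.e. lamby pie). "
                + "Here are its properties: 1. It has four feet  2. It has fleece "
                + "3. It is a mammal. It had white fleese. "
                + "Her father, Mr. Lamb, lives on Mulbery St. in a little white house.";
        for (String part : split(text)) {
            System.out.println(part);
        }
    }
}
```

One caveat with this sketch: the lookahead cannot tell a bullet from an ordinary number, so text such as "It was built in 1990. Then ..." would also be split before "1990". If that matters, the two-pass approach from the question may still be the safer choice.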
