java - Get the words around a position in a string

Question

I want to get the words around a particular position in a string, for example the two words before it and the two words after it.

For example, consider the following string:

String str = "Hello my name is John and I like to go fishing and hiking I have two sisters and one brother.";
String find = "I";

for (int index = str.indexOf("I"); index >= 0; index = str.indexOf("I", index + 1))
{
    System.out.println(index);
}

This prints the indices at which the word "I" occurs. What I want instead is the substring of words around each of those positions.

I want to be able to print "John and I like to" and "and hiking I have two".

It should not be limited to single words: searching for "John and" should return "name is John and I like".

Is there a neat, smart way to do this?
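Because the loop above already produces character indices with indexOf, one option is to expand outward from each index over whitespace instead of splitting the whole string. The following is a minimal sketch, not taken from any answer. The helper name wordsAround is hypothetical, words are assumed to be separated by whitespace, and an occurrence only counts when find is a whole word (so "I" inside "IT" is skipped).

```java
import java.util.ArrayList;
import java.util.List;

public class WordsAroundIndex {

    // For each whole-word occurrence of `find` (located with indexOf), return the
    // substring spanning up to `k` words before it and up to `k` words after it.
    static List<String> wordsAround(String str, String find, int k) {
        List<String> result = new ArrayList<>();
        for (int index = str.indexOf(find); index >= 0; index = str.indexOf(find, index + 1)) {
            int end = index + find.length();
            // Skip occurrences that are part of a longer word.
            boolean startsWord = index == 0 || Character.isWhitespace(str.charAt(index - 1));
            boolean endsWord = end == str.length() || Character.isWhitespace(str.charAt(end));
            if (!startsWord || !endsWord) {
                continue;
            }
            int from = index;
            for (int w = 0; w < k; w++) {
                int p = from;
                // step left over the whitespace before the current start
                while (p > 0 && Character.isWhitespace(str.charAt(p - 1))) p--;
                if (p == 0) break; // no more words on the left
                // step left over the previous word
                while (p > 0 && !Character.isWhitespace(str.charAt(p - 1))) p--;
                from = p;
            }
            int to = end;
            for (int w = 0; w < k; w++) {
                int p = to;
                // step right over whitespace, then over the next word
                while (p < str.length() && Character.isWhitespace(str.charAt(p))) p++;
                if (p == str.length()) break; // no more words on the right
                while (p < str.length() && !Character.isWhitespace(str.charAt(p))) p++;
                to = p;
            }
            result.add(str.substring(from, to));
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "Hello my name is John and I like to go fishing and hiking I have two sisters and one brother.";
        wordsAround(str, "I", 2).forEach(System.out::println);
        wordsAround(str, "John and", 2).forEach(System.out::println);
    }
}
```

This approach returns the original substring, so any whitespace between the selected words is preserved unchanged.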

score 11 · Accepted Answer

Single word:

You can do this with String's split() method. This solution is O(n).

public static void main(String[] args) {
    String str = "Hello my name is John and I like to go fishing and "+
                         "hiking I have two sisters and one brother.";
    String find = "I";

    String[] sp = str.split(" +"); // "+" for multiple spaces
    for (int i = 0; i < sp.length; i++) {
        if (sp[i].equals(find)) {
            // bounds checks prevent ArrayIndexOutOfBoundsException near either end
            String surr = (i-2 >= 0 ? sp[i-2]+" " : "") +
                          (i-1 >= 0 ? sp[i-1]+" " : "") +
                          sp[i] +
                          (i+1 < sp.length ? " "+sp[i+1] : "") +
                          (i+2 < sp.length ? " "+sp[i+2] : "");
            System.out.println(surr);
        }
    }
}

Output:

John and I like to
and hiking I have two

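Multiple words:

A regex is a nice, clean solution when find consists of multiple words. By its nature, however, it misses cases where the surrounding words also match find (see the example below).

The algorithm below handles all of these cases. Because of the nature of the problem, this solution is O(n*m) in the worst case, where n is the number of words in str and m is the number of words in find.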

public static void main(String[] args) {
    String str = "Hello my name is John and John and I like to go...";
    String find = "John and";

    String[] sp = str.split(" +"); // "+" for multiple spaces

    String[] spMulti = find.split(" +"); // "+" for multiple spaces
    for (int i = 0; i < sp.length; i++) {
        int j = 0;
        while (j < spMulti.length && i+j < sp.length
                                  && sp[i+j].equals(spMulti[j])) {
            j++;
        }
        if (j == spMulti.length) { // found spMulti entirely
            StringBuilder surr = new StringBuilder();
            if (i-2 >= 0){ surr.append(sp[i-2]); surr.append(" "); }
            if (i-1 >= 0){ surr.append(sp[i-1]); surr.append(" "); }
            for (int k = 0; k < spMulti.length; k++) {
                if (k > 0){ surr.append(" "); }
                surr.append(sp[i+k]);
            }
            if (i+spMulti.length < sp.length) {
                surr.append(" ");
                surr.append(sp[i+spMulti.length]);
            }
            if (i+spMulti.length+1 < sp.length) {
                surr.append(" ");
                surr.append(sp[i+spMulti.length+1]);
            }
            System.out.println(surr.toString());
        }
    }
}

Output:

name is John and John and
John and John and I like
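The multi-word algorithm above can be packaged as a method that returns the windows instead of printing them, with the number of surrounding words passed as a parameter. The following is a minimal sketch in which the class and method names are hypothetical. Like the answer, it compares whole words after splitting on runs of spaces, so both of the overlapping "John and" matches are reported.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PhraseWindows {

    // For every position where the words of `find` appear consecutively in `str`,
    // return up to `k` words before the phrase, the phrase itself, and up to `k` words after.
    static List<String> windowsAround(String str, String find, int k) {
        String[] words = str.trim().split(" +");
        String[] phrase = find.trim().split(" +");
        List<String> result = new ArrayList<>();
        // Worst case O(n * m): n words in str, m words in find.
        for (int i = 0; i + phrase.length <= words.length; i++) {
            int j = 0;
            while (j < phrase.length && words[i + j].equals(phrase[j])) {
                j++;
            }
            if (j == phrase.length) {
                // Clamp the window to the array bounds; copyOfRange's end is exclusive.
                int from = Math.max(0, i - k);
                int to = Math.min(words.length, i + phrase.length + k);
                result.add(String.join(" ", Arrays.copyOfRange(words, from, to)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "Hello my name is John and John and I like to go...";
        windowsAround(str, "John and", 2).forEach(System.out::println);
    }
}
```

Returning a List makes it easy to test the method or to post-process the windows instead of only printing them.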

score 2 · Accepted Answer

Here is another way I found, using a regex:

// requires java.util.regex.Pattern and java.util.regex.Matcher
String str = "Hello my name is John and I like to go fishing and hiking I have two    sisters and one brother.";

String find = "I";

// group 1: the two words before find; group 2: the two words after it (plus trailing whitespace)
Pattern pattern = Pattern.compile("(\\S+\\s+\\S+)\\s+" + Pattern.quote(find) + "\\s+(\\S+\\s+\\S+\\s+)");
Matcher matcher = pattern.matcher(str);

while (matcher.find())
{
    System.out.println(matcher.group(1));
    System.out.println(matcher.group(2));
}

Output:

John and
like to 
and hiking
have two
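The regex above only matches when there are exactly two words on each side of find. A possible variant is sketched below; it is not from the answer, and the class and method names are hypothetical. It quotes find with Pattern.quote, uses the lookarounds (?<!\S) and (?!\S) so that find must be a whole whitespace-delimited word, and accepts up to k words on each side with {0,k}. Like any Matcher.find loop, it consumes the text it matches, so windows that would overlap are not all reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexWindows {

    // Up to k words before and after `find`; the lookarounds keep matches on word boundaries.
    static List<String> windows(String str, String find, int k) {
        Pattern pattern = Pattern.compile(
                "(?<!\\S)(?:\\S+\\s+){0," + k + "}"                 // up to k words before
                + "(?<!\\S)" + Pattern.quote(find) + "(?!\\S)"      // find as a whole word
                + "(?:\\s+\\S+){0," + k + "}");                     // up to k words after
        Matcher matcher = pattern.matcher(str);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "Hello my name is John and I like to go fishing and hiking I have two    sisters and one brother.";
        windows(str, "I", 2).forEach(System.out::println);
    }
}
```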

score 1 · Accepted Answer

Use String.split() to split the text into words, then look for "I" and join the surrounding words back together.

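// requires java.util.Arrays
String[] parts = str.split(" ");

for (int i = 0; i < parts.length; i++) {
    if (parts[i].equals("I")) {
        // clamp i-2 .. i+2 to valid indices, then join the words back with spaces
        int from = Math.max(0, i - 2);
        int to = Math.min(parts.length, i + 3); // copyOfRange's end index is exclusive
        String out = String.join(" ", Arrays.copyOfRange(parts, from, to));
        System.out.println(out);
    }
}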

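Of course, you have to check that i-2 and i+2 are valid indices. If you have a lot of data, building the result with a StringBuffer (or the unsynchronized StringBuilder) instead of + concatenation will perform better.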
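As an illustration of the StringBuilder point, the sketch below reuses a single builder for every window and clamps the indices before reading them. The class and method names are hypothetical. Splitting on "\\s+" instead of " " is an additional assumption, made so that runs of spaces (such as "two    sisters") do not produce empty "words".

```java
import java.util.ArrayList;
import java.util.List;

public class BuilderWindows {

    // Collect up to two words before and after each occurrence of `find`.
    static List<String> windows(String str, String find) {
        String[] parts = str.trim().split("\\s+");
        List<String> result = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (!parts[i].equals(find)) {
                continue;
            }
            sb.setLength(0); // reuse the same builder for every window
            // Clamp the window so i-2 and i+2 never leave the array.
            int from = Math.max(0, i - 2);
            int to = Math.min(parts.length - 1, i + 2);
            for (int j = from; j <= to; j++) {
                if (j > from) {
                    sb.append(' ');
                }
                sb.append(parts[j]);
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "Hello my name is John and I like to go fishing and hiking I have two    sisters and one brother.";
        windows(str, "I").forEach(System.out::println);
    }
}
```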

score 1 · Accepted Answer

// Split the sentence into a (fixed-size) list of words
String[] stringArray = sentence.split(" ");
List<String> stringList = Arrays.asList(stringArray);

// Which word should be matched?
String toMatch = "I";

// How many words before and after do you want?
int before = 2;
int after = 2;

for (int i = 0; i < stringList.size(); ++i) {
    if (toMatch.equals(stringList.get(i))) {
        int index = i;
        // only use matches with a full window on both sides (index + after must be a valid index)
        if (0 <= index - before && index + after < stringList.size()) {
            StringBuilder sb = new StringBuilder();

            // use j here: i is already the outer loop variable
            for (int j = index - before; j <= index + after; ++j) {
                sb.append(stringList.get(j));
                sb.append(" ");
            }
            String result = sb.toString().trim();
            //Do something with result
        }
    }
}

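This extracts the two words before and after the match. It can be extended to output at most two words on each side instead of exactly two.

Edit: Damn.. too slow, and no fancy ternary operators :/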
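The "at most two words" extension mentioned above can be done by clamping the window to the list bounds and taking a subList, rather than skipping matches near either end. The following is a sketch under the same assumptions as the answer (words separated by single spaces, before and after as parameters). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UpToWindows {

    // Up to `before` words before and up to `after` words after each match of `toMatch`.
    static List<String> windows(String sentence, String toMatch, int before, int after) {
        List<String> words = Arrays.asList(sentence.split(" "));
        List<String> result = new ArrayList<>();
        for (int index = 0; index < words.size(); index++) {
            if (toMatch.equals(words.get(index))) {
                // Clamp instead of discarding matches that are close to either end.
                int from = Math.max(0, index - before);
                int to = Math.min(words.size(), index + after + 1); // subList end is exclusive
                result.add(String.join(" ", words.subList(from, to)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sentence = "I like to go fishing and hiking I have two sisters";
        windows(sentence, "I", 2, 2).forEach(System.out::println);
    }
}
```

With this sentence, the first "I" has no words before it, so its window is shorter ("I like to"). The original version would skip that match entirely.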

score 0 · Accepted Answer

// requires java.util.ArrayList and java.util.List
public static void main(String[] args) {
    String str = "Hello my name is John and I like to go fishing and hiking I have two    sisters and one brother.";
    String find = "I";
    int countWords = 3;
    List<String> strings = countWordsBeforeAndAfter(str, find, countWords);
    strings.forEach(System.out::println);
}

public static List<String> countWordsBeforeAndAfter(String paragraph, String search, int countWords) {
    List<String> searchList = new ArrayList<>();
    String[] sp = paragraph.split(" +"); // "+" for multiple spaces
    for (int i = 0; i < sp.length; i++) {
        if (sp[i].equals(search)) {
            // up to countWords words before the match (indices below 0 are skipped)
            String before = "";
            for (int j = countWords; j > 0; j--) {
                if (i - j >= 0) before += sp[i - j] + " ";
            }

            // up to countWords words after the match (indices past the end are skipped)
            String after = "";
            for (int j = 1; j <= countWords; j++) {
                if (i + j < sp.length) after += " " + sp[i + j];
            }
            String searchResult = before + search + after;
            searchList.add(searchResult);
        }
    }
    return searchList;
}
