java - Get the text between <pre> and </pre> into an ArrayList

Question

I need to pull specific pieces of text out of a String and put them into an ArrayList, but I don't know where to start. The String looks like this:

String exampleString = "some text I don't know <pre>the text I want to get</pre><pre>Some more text I want to get</pre> some text I don't know";

The problem is that I don't know how many <pre> text </pre> sections there will be. There might even be none at all.

So could someone tell me how to get the text between <pre> and </pre> and put it into an ArrayList?

Thanks a lot!

Update: what I do know about the text I called "some text I don't know" is that it does not contain <pre> or </pre>.

score 2 · Accepted Answer

Assuming there are no nested tags, you could do something like this:

private List<String> getText(String text){

    List<String> result = new ArrayList<String>();

    // Every piece after the first begins right after a "<pre>"
    String[] sections = text.split("<pre>");
    int i = 0;
    for (String s : sections) {
        i = s.indexOf("</pre>");
        if (i >= 0)
           // Keep only the part before the closing tag
           result.add(s.substring(0, i));
    }
    return result;
}

Here is a walkthrough of what the code does.

Say:

text = "test text here <pre> item one </pre> and then another item <pre> item 2 </pre> and then some stuff."

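So the first line to explain is:

String[] sections = text.split("<pre>");

This defines a new array of Strings and assigns it the result of calling the String split function on "text".

This function breaks the String into sections delimited by "<pre>", so you get: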

sections[0] = "test text here "
sections[1] = " item one </pre> and then another item "
sections[2] = " item 2 </pre> and then some stuff."

As you can see, all we need to do now is remove everything from "</pre>" onwards, which is where the next bit comes in:

for (String s : sections)

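This is the start of a "for each" loop, which assigns each element of the sections array to the String s in turn.

So for each of the three strings above we do:

 i = s.indexOf("</pre>");
    if (i >= 0)
       result.add(s.substring(0, i));

So if the string contains "</pre>", we take the substring from the start up to the "</pre>" and add it to the result. sections[1] and sections[2] contain it, so they end up in the result.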
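Putting the walkthrough together, a self-contained version might look like the sketch below. The class name PreSplitDemo and the main method are only there for illustration, and the code still assumes that <pre> tags are never nested and that the surrounding text contains no stray </pre>.

```java
import java.util.ArrayList;
import java.util.List;

public class PreSplitDemo {

    // Split-based extraction, assuming <pre> tags are never nested.
    public static List<String> getText(String text) {
        List<String> result = new ArrayList<>();
        // Every piece after the first begins right after a "<pre>".
        for (String s : text.split("<pre>")) {
            int i = s.indexOf("</pre>");
            if (i >= 0) {
                // Keep only the part before the closing tag.
                result.add(s.substring(0, i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "test text here <pre> item one </pre> and then another item "
                + "<pre> item 2 </pre> and then some stuff.";
        // Brackets make the surrounding spaces visible.
        for (String item : getText(text)) {
            System.out.println("[" + item + "]");
        }
    }
}
```

Note that the spaces inside the tags are kept, so call trim() on each item if you do not want them.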

Hope this helps.

Here's how I would implement JavaJugglers solution to avoid using while (true):

private List<String> getText(String text){
    List<String> result = new ArrayList<String>();

    int indexStart = text.indexOf("<pre>");
    int indexEnd = text.indexOf("</pre>");
    while (indexStart >= 0 && indexEnd > indexStart) {
        result.add(text.substring(indexStart + 5, indexEnd));
        text = text.substring(indexEnd + 6);
        indexStart = text.indexOf("<pre>");
        indexEnd = text.indexOf("</pre>");
    }

    return result;
}

score 1 · Accepted Answer

try {
    Pattern pattern = Pattern.compile("<pre>(.+?)</pre>");
    Matcher matcher = pattern.matcher(yourText);

    while (matcher.find()) {
        // matcher.group() is the whole match from the last find(), tags included;
        // matcher.group(1) is only the text between the tags
    }
}
catch(Exception ex){}

Edit: fixed the regex syntax
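To actually collect the matches into a list, something along these lines should work. It is only a sketch: the class name PreRegexDemo is made up, and two details differ from the pattern above. Pattern.DOTALL is added so that '.' also matches line breaks, and (.*?) is used instead of (.+?) so that an empty <pre></pre> gives an empty string rather than a match that runs on into the next section. Like the pattern above, it does not cope with nested tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PreRegexDemo {

    // Reluctant quantifier: each <pre>...</pre> pair is matched separately.
    // DOTALL (an addition in this sketch) lets '.' match line terminators too.
    private static final Pattern PRE =
            Pattern.compile("<pre>(.*?)</pre>", Pattern.DOTALL);

    public static List<String> getText(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = PRE.matcher(text);
        while (matcher.find()) {
            // group(1) is the capture group, i.e. the text without the tags.
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "some text I don't know <pre>the text I want to get</pre>"
                + "<pre>Some more text I want to get</pre> some text I don't know";
        System.out.println(getText(text));
    }
}
```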

score 0 · Accepted Answer

If you know for certain that the HTML is well formed, you can start with simple String methods like this:

String foo = "some text I don't know <pre>the text I want to get</pre><pre>Some more text I want to get</pre> some text I don't know";
int preStart = foo.indexOf("<pre>");
int preEnd = foo.indexOf("</pre>", preStart);

if (preStart > -1 && preEnd > preStart)
{
    String inBetweenTags = foo.substring(preStart + 5, preEnd);
    System.out.println(inBetweenTags);
}

http://ideone.com/OkE9B

Otherwise, use an HTML parser.
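The String snippet above only prints the first section. To collect every section, one option is to keep calling indexOf with a start position, as in this sketch (the class name PreIndexOfDemo is made up, and well-formed, non-nested tags are still assumed):

```java
import java.util.ArrayList;
import java.util.List;

public class PreIndexOfDemo {

    public static List<String> getText(String foo) {
        List<String> result = new ArrayList<>();
        int preStart = foo.indexOf("<pre>");
        while (preStart > -1) {
            // Look for the closing tag only after the opening tag just found.
            int preEnd = foo.indexOf("</pre>", preStart + 5);
            if (preEnd < 0) {
                break; // unclosed <pre>: stop rather than guess
            }
            // 5 is the length of "<pre>"
            result.add(foo.substring(preStart + 5, preEnd));
            // Resume after this closing tag; 6 is the length of "</pre>".
            preStart = foo.indexOf("<pre>", preEnd + 6);
        }
        return result;
    }

    public static void main(String[] args) {
        String foo = "some text I don't know <pre>the text I want to get</pre>"
                + "<pre>Some more text I want to get</pre> some text I don't know";
        System.out.println(getText(foo));
    }
}
```

Because the search position moves forward instead of the String being cut down, no intermediate substrings of the remaining text are created.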

score 0 · Accepted Answer

Here is a simple solution:

private List<String> getText(String text){

    List<String> result = new ArrayList<String>();

    while(true){
        int indexStart = text.indexOf("<pre>");
        int indexEnd = text.indexOf("</pre>");
        if(indexStart >= 0 && indexEnd >= 0 && indexEnd > indexStart){
            result.add(text.substring(indexStart + 5, indexEnd));
            text = text.substring(indexEnd + 6);
        }
        else{
            break;
        }

    }
    return result;
}

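Note that you could make this function more generic, for example by passing the strings to search for as parameters and computing the substring offsets dynamically. I would not recommend using a regex, because you could have a string like this: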

<pre>text<pre>more text</pre>some more text</pre>

with nested "pre" tags.
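A rough sketch of the more generic variant mentioned above could look like this. The class name TagTextDemo and the parameter names open and close are made up for illustration. The offsets come from open.length() and close.length() instead of the hard-coded 5 and 6, and the closing tag is searched for only after the opening tag.

```java
import java.util.ArrayList;
import java.util.List;

public class TagTextDemo {

    // Collects the text between each open/close pair, scanning left to right.
    public static List<String> getText(String text, String open, String close) {
        if (open.isEmpty() || close.isEmpty()) {
            throw new IllegalArgumentException("delimiters must not be empty");
        }
        List<String> result = new ArrayList<>();
        int indexStart = text.indexOf(open);
        while (indexStart >= 0) {
            int contentStart = indexStart + open.length();
            int indexEnd = text.indexOf(close, contentStart);
            if (indexEnd < 0) {
                break; // no closing delimiter left
            }
            result.add(text.substring(contentStart, indexEnd));
            // Drop everything up to and including the closing delimiter.
            text = text.substring(indexEnd + close.length());
            indexStart = text.indexOf(open);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(getText("a <pre>one</pre> b <pre>two</pre>", "<pre>", "</pre>"));
        // Nested input: the first closing tag ends the section.
        System.out.println(getText(
                "<pre>text<pre>more text</pre>some more text</pre>", "<pre>", "</pre>"));
    }
}
```

Be aware that this plain scan does not understand nesting either: for the nested string above it returns the single item "text<pre>more text". If nested tags can really occur, an HTML parser is the safer choice.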
