java - Trying to extract substrings between specific tags from a BufferedReader

Question

I am using a BufferedReader to read five web pages, each separated by a space. I want to use substring to extract the url, html, source, and date of each page, but I need some guidance on how to use substring properly to achieve this. Cheers.

public static List<WebPage> readRawTextFile(Context ctx, int resId) {   

    InputStream inputStream = ctx.getResources().openRawResource(
            R.raw.pages);

    InputStreamReader inputreader = new InputStreamReader(inputStream);
    BufferedReader buffreader = new BufferedReader(inputreader);
    String line;
    StringBuilder text = new StringBuilder();

    try {
        while ((line = buffreader.readLine()) != null) {


            if (line.length() == 0) {
                // ignore for now
                // Will be used when blank line is encountered
            }

            if (line.length() != 0) {
                // here I want the substring to pull out the correct strings
                int sURL = line.indexOf("<!--");
                int eURL = line.indexOf("-->");
                line.substring(sURL, eURL);
                // Problem is here
            }
        }
    } catch (IOException e) {
        return null;

    }
    return null;
}

score 1 · Accepted Answer

I think what you want is something like this:

public class Test {
   public static void main(String args[]) {
    String text = "<!--Address:google.co.uk.html-->";
    String converted1 = text.replaceAll("\\<!--", "");
    String converted2 = converted1.replaceAll("\\-->", "");
    System.out.println(converted2);
   }

}

Output: Address:google.co.uk.html
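Since the markers here are fixed strings rather than patterns, the same stripping can probably be done with String.replace, which treats its arguments literally and so needs no escaping. The following is only a minimal sketch; the class name LiteralReplaceDemo and the helper stripMarkers are made up for illustration and are not part of the original code.

```java
class LiteralReplaceDemo {
    // Hypothetical helper: removes the literal comment markers from the text.
    static String stripMarkers(String text) {
        // String.replace matches plain character sequences, not regular expressions.
        return text.replace("<!--", "").replace("-->", "");
    }

    public static void main(String[] args) {
        String text = "<!--Address:google.co.uk.html-->";
        System.out.println(stripMarkers(text)); // prints Address:google.co.uk.html
    }
}
```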

score 0 · Accepted Answer

Don't just return null in the catch block; call printStackTrace() there instead. It will help you find out whether something went wrong.
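As a rough sketch of that advice applied to a read loop, the example below reports the exception and returns whatever was collected instead of null. It assumes the input arrives as a plain java.io.Reader rather than an Android raw resource, and the names CommentReader and readComments, as well as the sample input in main, are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class CommentReader {
    // Hypothetical helper: collects the text between <!-- and --> on each line.
    static List<String> readComments(Reader source) {
        List<String> found = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int start = line.indexOf("<!--");
                if (start < 0) {
                    continue; // no opening marker on this line (also skips blank lines)
                }
                int end = line.indexOf("-->", start + 4);
                if (end < 0) {
                    continue; // opening marker without a closing one
                }
                // Skip the 4 characters of "<!--" so only the content is kept.
                found.add(line.substring(start + 4, end));
            }
        } catch (IOException e) {
            // Report the failure instead of silently returning null.
            e.printStackTrace();
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up sample input: one tagged line, one blank line, one untagged line.
        String sample = "<!--Address:google.co.uk.html-->\n\nplain text\n";
        System.out.println(readComments(new StringReader(sample)));
        // prints [Address:google.co.uk.html]
    }
}
```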

        String str1 = "<!--Address:google.co.uk.html-->";
        // Approach 1
        int st = str1.indexOf("<!--"); // gives index which starts from <
        int en = str1.indexOf("-->");  // gives index which starts from -
        str1 = str1.substring(st + 4, en);
        System.out.println(str1);

        // Approach 2
        String str2 = "<!--Address:google.co.uk.html-->";
        str2 = str2.replaceAll("[<>!-]", "");
        System.out.println( str2);

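Note: be careful when using a regular expression with replaceAll. It replaces every part of the string that matches the regex parameter, so Approach 2 would also strip any <, >, ! or - that appears inside the address itself.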
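To make that caveat concrete, the sketch below runs the character-class replacement on an address that contains a hyphen, and compares it with a capture-group pattern that only looks at the markers. The address my-site.co.uk.html and the names RegexPitfallDemo, stripChars, and betweenMarkers are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexPitfallDemo {
    // Non-greedy capture of whatever sits between the comment markers.
    private static final Pattern COMMENT = Pattern.compile("<!--(.*?)-->");

    // Same idea as Approach 2: deletes every <, >, ! and - anywhere in the string.
    static String stripChars(String text) {
        return text.replaceAll("[<>!-]", "");
    }

    // Hypothetical alternative: returns the first marker-delimited content, or null.
    static String betweenMarkers(String text) {
        Matcher m = COMMENT.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "<!--Address:my-site.co.uk.html-->";
        System.out.println(stripChars(text));     // prints Address:mysite.co.uk.html
        System.out.println(betweenMarkers(text)); // prints Address:my-site.co.uk.html
    }
}
```

The first result has lost the hyphen from the address, while the second keeps the content intact because only the markers themselves are matched.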
