c# - I am trying to extract text from a file using IndexOf and Substring, but the variable index is always -1. What is wrong?

Question

For example, I have an html file that contains some strings like these:

"http://www.niederschlagsradar.de/images.aspx?jaar=-6&type=europa.precip&datum=201309150000&cultuur=en-GB&continent=europa","http://www.niederschlagsradar.de/images.aspx?jaar=-6&type=europa.precip&datum=201309150300&cultuur=en-GB&continent=europa","http://www.niederschlagsradar.de/images.aspx?jaar=-6&type=europa.precip&datum=201309150600&cultuur=en-GB&continent=europa"

I want to extract each one: http://www.niederschlagsradar.de/images.aspx?jaar=-6&type=europa.precip&datum=201309150000&cultuur=en-GB&continent=europa

Then the next one: http://www.niederschlagsradar.de/images.aspx?jaar=-6&type=europa.precip&datum=201309150300&cultuur=en-GB&continent=europa

This is the code I am using:

In the constructor I did:

f = File.ReadAllText(localFilename + "test.html");
retrivingText1();


private void retrivingText1()
        {
            string startTag = "http://www.niederschlagsradar.de/images.aspx";//"<Translation>";
            string endTag = "continent=europa";//"</Translation>";
            int startTagWidth = startTag.Length;
            int endTagWidth = endTag.Length;
            index = 0;
            w = new StreamWriter(@"d:\retrivedText1.txt");
            while (true)
            {
                index = f.IndexOf(startTag, index);
                if (index == -1)
                {
                    break;
                }
                // else more to do - index now is positioned at first character of startTag 
                int start = index + startTagWidth;
                index = f.LastIndexOf(endTag, start + 1);
                if (index == -1)
                {
                    break;
                }
                // found the endTag 
                string g = f.Substring(start, index - start + endTagWidth).Trim(); //Trim the founded text so the start and ending spaces are removed.
                w.WriteLine(g);
                //break so you dont have an endless loop
                break;
            }
            w.Close();
        }

I know that for extracting from an html file it is better to use HtmlAgilityPack or regex. But this time I wanted to try IndexOf and Substring.

When I put a breakpoint on the line:

int start = index + startTagWidth;

start = 2950

and on the next line, index = -1

score 1 · Accepted Answer

The page you are referring to does not contain the line of text you are looking for...

As you suspected yourself, I think a regular expression would be much better here:

http:\/\/www\.niederschlagsradar\.de\/images\.aspx\?jaar=-6&type=europa\.precip&datum=\d{12}&cultuur=en-GB&continent=europa

That gives you all the references you need for further processing.
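As a rough sketch of how that pattern could be applied in C#, assuming the file contents are already in a string: the class name RadarUrlRegex and the method Extract are made up for illustration. Inside a verbatim string the forward slashes do not need escaping, so the pattern is written without the `\/` sequences.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answer.
public static class RadarUrlRegex
{
    // Same pattern as above; "datum" is matched as exactly 12 digits.
    private const string Pattern =
        @"http://www\.niederschlagsradar\.de/images\.aspx\?jaar=-6&type=europa\.precip&datum=\d{12}&cultuur=en-GB&continent=europa";

    // Returns every URL in the text that matches the pattern, in order of appearance.
    public static List<string> Extract(string text)
    {
        var urls = new List<string>();
        foreach (Match m in Regex.Matches(text, Pattern))
        {
            urls.Add(m.Value);
        }
        return urls;
    }
}
```

With the question's setup this would be called as RadarUrlRegex.Extract(f), and each returned string could then be written with w.WriteLine.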

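Edit

If you do want to use IndexOf and Substring: you are using LastIndexOf the wrong way. LastIndexOf searches backward through the string, from the given start position toward the beginning of the string.

Documentation

Try using IndexOf instead.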
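To make the point concrete: f.LastIndexOf(endTag, start + 1) only looks at the part of the string before start + 1, so if no "continent=europa" occurs before the first start tag it returns -1. Below is a minimal sketch of the loop using IndexOf for both searches. The class name RadarUrlScanner and the method Extract are hypothetical. It also differs from the question's code in two ways that are my assumptions about the intent: it keeps the start tag in the result, so the whole URL is returned, and it keeps looping instead of breaking after the first hit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original question or answer.
public static class RadarUrlScanner
{
    public static List<string> Extract(string text)
    {
        const string startTag = "http://www.niederschlagsradar.de/images.aspx";
        const string endTag = "continent=europa";
        var results = new List<string>();
        int index = 0;

        while (true)
        {
            // Find the next start tag at or after the current position.
            int start = text.IndexOf(startTag, index, StringComparison.Ordinal);
            if (start == -1) break;

            // Search FORWARD for the end tag, beginning just past the start tag.
            int end = text.IndexOf(endTag, start + startTag.Length, StringComparison.Ordinal);
            if (end == -1) break;

            end += endTag.Length;                          // include the end tag itself
            results.Add(text.Substring(start, end - start));
            index = end;                                   // continue after this match
        }
        return results;
    }
}
```

Advancing index past each match is what prevents an endless loop, so the unconditional break at the bottom of the original loop is no longer needed.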

score 0 · Accepted Answer

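Given your sample file, I would prefer:

String[] sa = f.Split(',');
foreach (String s in sa)
{
    // Trim the current element s (not the whole file contents f) to drop the surrounding quotes.
    String strToWrite = s.Trim('\"');
    //write your string
}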
