regex - Web scraper: regex Match.Value returns the length of the string rather than the string itself

Question

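I'm having trouble building a web scraper for a project I'm currently working on.

I'm trying to retrieve a set of links from a page so that I can evaluate which ones to process. Here is my code: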

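using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using System.Windows.Forms;

public partial class Form1 : Form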
{
    private byte[] aRequestHTML;
    private string sourceString = null;
    string[] a;
    WebClient objWebClient = new WebClient();
    LinkScraper linkScraper = new LinkScraper();

    public Form1()
    {
        InitializeComponent();
    }

    private void button1_Click(object sender, EventArgs e)
    {
        ScrapeLinks(textBox1.Text);
    }


    public void ScrapeLinks(string sourceLink)
    {
        // Download the raw HTML bytes from the URL typed in the text box
        aRequestHTML = objWebClient.DownloadData(sourceLink);
        // Create a UTF-8 encoding object
        UTF8Encoding utf8 = new UTF8Encoding();
        // Decode the downloaded bytes into a string
        sourceString = utf8.GetString(aRequestHTML);
        // Regular expression that matches <a href=...>...</a> anchors
        Regex r = new Regex("\\<a\\shref\\=(.*)\\>(.*)\\<\\/a\\>");
        // Collect every match of the expression in the page source
        MatchCollection mcl = r.Matches(sourceString);

        a = new string[mcl.Count];
        int i = 0;
        foreach (Match ml in mcl)
        {
            // Add the extracted URL to the array
            a[i] = ml.ToString();
            Console.WriteLine(a[i]);
            i++;
        }

        // Bind the array to the grid (this is where the lengths show up)
        dataGridView1.DataSource = a;

        // Write the extracted URLs to a file named test.txt
        StreamWriter sw = new StreamWriter("test.txt");
        foreach (string aElement in a)
        {
            sw.Write(aElement + "\n");
        }
        sw.Close();
    }
}

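My problem arises when I set the data grid's data source. Instead of being populated with the list of strings, the data grid is populated with the length of each string. As you can see, I also write out a test.txt file to check whether I'm doing something silly, but the text file contains each string, exactly as I would expect to see them in the data grid.

I have trawled forums for 12 hours looking for a solution, with no joy.

Could someone kindly tell me why .Value is not returning the strings into the string array 'a' for binding to the data grid?

Any help is always greatly appreciated.

Regards, Barry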

score 0 · Accepted Answer

Just found the solution, folks.

The DataGridView displays the first property it can find on each bound item, and for a string that is its Length property. The workaround is to use a DataTable.

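    DataTable links = new DataTable();
    links.Columns.Add("Link URL");

    foreach (Match ml in mcl)
    {
        // Add each extracted URL to the table as a new row
        links.Rows.Add(new object[] { ml.Value });
    }

    // Bind the table, rather than the string array, to the grid
    dataGridView1.DataSource = links;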
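
For comparison, here is a minimal sketch of another way around the same behaviour. It is not part of the original answer. It assumes the grid auto-generates its columns, and the names LinkRow, Url, LinkGridBinder and Bind are hypothetical. Each string is wrapped in a small type with a public string property, so the grid has something other than Length to display.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Windows.Forms;

// Hypothetical wrapper type (not in the original post): it gives the grid
// a named string property to bind to instead of string.Length.
public class LinkRow
{
    public string Url { get; set; }
}

public static class LinkGridBinder
{
    // Hypothetical helper: wraps each regex match in a LinkRow and binds the list.
    public static void Bind(DataGridView grid, MatchCollection matches)
    {
        List<LinkRow> rows = matches
            .Cast<Match>()                              // make the collection usable with LINQ
            .Select(m => new LinkRow { Url = m.Value }) // one row object per match
            .ToList();

        // With auto-generated columns the grid should show a single "Url" column.
        grid.DataSource = rows;
    }
}
```

In the question's code this would be called as LinkGridBinder.Bind(dataGridView1, mcl) in place of the line that assigns the string array to DataSource.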

score 0 · Accepted Answer

You can do this easily by converting the page to XML and then using XPath and JavaScript's E4X.

Check out ScriptScraper, where I did exactly that.

Thanks, Martin
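
As a rough illustration of the XPath half of that idea (not of ScriptScraper itself, and not of E4X), the sketch below runs an XPath query over markup that is assumed to be well-formed XML already. Real pages usually are not, so a conversion step would have to come first. The class name XPathLinkDemo and the sample markup are made up for the example.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathLinkDemo
{
    public static void Main()
    {
        // Hypothetical, already well-formed markup standing in for a converted page.
        string xhtml =
            "<html><body>" +
            "<a href=\"http://example.com/a\">First</a>" +
            "<p><a href=\"http://example.com/b\">Second</a></p>" +
            "</body></html>";

        XDocument doc = XDocument.Parse(xhtml);

        // Select every <a> element that carries an href attribute.
        foreach (XElement link in doc.XPathSelectElements("//a[@href]"))
        {
            // Print the link target followed by the link text.
            Console.WriteLine(link.Attribute("href").Value + " -> " + link.Value);
        }
    }
}
```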
