c# - How can I download all the images from a List that contains many image links?

Question

private List<string> retrieveImages(string address)
{

    System.Net.WebClient wc = new System.Net.WebClient();
    List<string> imgList = new List<string>();
    doc.Load(wc.OpenRead(address)); //or whatever HTML file you have 
    HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
    if (imgs == null) return new List<string>();

    foreach (HtmlNode img in imgs)
    {
        if (img.Attributes["src"] == null)
            continue;
        HtmlAttribute src = img.Attributes["src"];

        imgList.Add(src.Value);
        //Image imgDownload = GetImage(src.Value);
        //imgDownload.Save(@"d:\myImages");
    }
    return imgList;
}

In some cases the List imgList contains 33 items, and they look like this:

First, at index [0] I see: /images/experiments/nav_logo78.png. As an image link it does not start with http or www, only with /images.

Then at [1] I see: //maps.gstatic.com/mapfiles/transparent.png

Then, a few items later, at [10] I see: http://mt1.google.com/vt/lyrs=m@186000000&hl=iw&src=app&x=75&y=51&z=7&s=Gali

I don't know what Gali is. I don't see a .bmp, .gif, or .png extension on the image, only Gali.

What I want is to download all of these images from their links and save them to my hard disk. So I have this function for downloading:

private Image GetImage(string url)
{
    System.Net.WebRequest request = System.Net.WebRequest.Create(url);

    System.Net.WebResponse response = request.GetResponse();
    System.IO.Stream responseStream = response.GetResponseStream();

    Bitmap bmp = new Bitmap(responseStream);

    responseStream.Dispose();

    return bmp;
}

When I use this GetImage function inside the retrieveImages() function, the program does nothing; that is, the List imgList is empty. If I comment out these two lines, as they are now:

//Image imgDownload = GetImage(src.Value);
//imgDownload.Save(@"d:\myImages");

If I comment them out with // everything works, but if I use them nothing works and nothing is saved to the hard disk.

What should I do?

Edit:

I changed the retrieveImages function to this:

private List<string> retrieveImages(string address)
        {

            System.Net.WebClient wc = new System.Net.WebClient();
            List<string> imgList = new List<string>();
            doc.Load(wc.OpenRead(address)); 
            HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
            if (imgs == null) return new List<string>();

            foreach (HtmlNode img in imgs)
            {
                if (img.Attributes["src"] == null)
                    continue;
                HtmlAttribute src = img.Attributes["src"];

                imgList.Add(src.Value);
                wc.DownloadFile(src.Value ,  @"d:\MyImages\my.gif");
            }
            return imgList;
        }

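I put a breakpoint on the wc.DownloadFile line, and an exception was thrown: WebException was caught

Could not find a part of the path 'D:\textinputassistant\tia.png'.

In this case src.Value contains: /textinputassistant/tia.png. I was told to skip links that do not start with http, https, or www, and I will fix that. The question is whether the exception occurs because this value starts with / and has no http/s/www.

Full exception: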

System.Net.WebException was caught
  Message=Could not find a part of the path 'D:\textinputassistant\tia.png'.
  Source=System
  StackTrace:
       at System.Net.WebClient.DownloadFile(Uri address, String fileName)
       at System.Net.WebClient.DownloadFile(String address, String fileName)
       at GatherLinks.Form1.retrieveImages(String address) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 328
       at GatherLinks.Form1.webCrawler(String url, Int32 levels, DoWorkEventArgs eve) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 97
  InnerException: System.Net.WebException
       Message=Could not find a part of the path 'D:\textinputassistant\tia.png'.
       Source=System
       StackTrace:
            at System.Net.FileWebResponse..ctor(FileWebRequest request, Uri uri, FileAccess access, Boolean asyncHint)
            at System.Net.FileWebRequest.GetResponseCallback(Object state)
       InnerException: System.IO.DirectoryNotFoundException
            Message=Could not find a part of the path 'D:\textinputassistant\tia.png'.
            Source=mscorlib
            StackTrace:
                 at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
                 at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath)
                 at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, String msgPath, Boolean bFromProxy)
                 at System.Net.FileWebStream..ctor(FileWebRequest request, String path, FileMode mode, FileAccess access, FileShare sharing, Int32 length, Boolean async)
                 at System.Net.FileWebResponse..ctor(FileWebRequest request, Uri uri, FileAccess access, Boolean asyncHint)

InnerException:

I just added a filter so that only links starting with http are saved.

private List<string> retrieveImages(string address)
        {

            System.Net.WebClient wc = new System.Net.WebClient();
            List<string> imgList = new List<string>();
            doc.Load(wc.OpenRead(address));
            HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
            if (imgs == null) return new List<string>();

            foreach (HtmlNode img in imgs)
            {
                if (img.Attributes["src"] == null)
                    continue;
                HtmlAttribute src = img.Attributes["src"];

                imgList.Add(src.Value);
                if (src.Value.Contains("http"))
                {
                    wc.DownloadFile(src.Value, @"d:\MyImages\my.gif");
                }
            }
            return imgList;
        }

Now src.Value contains: http://mt1.google.com/vt/lyrs=m@186000000&hl=iw&src=app&x=75&y=51&z=7&s=Gali

Then, after trying to download it, I got an exception: WebException was caught

The remote server returned an error: (403) Forbidden.

System.Net.WebException was caught
  Message=The remote server returned an error: (403) Forbidden.
  Source=System
  StackTrace:
       at System.Net.WebClient.DownloadFile(Uri address, String fileName)
       at System.Net.WebClient.DownloadFile(String address, String fileName)
       at GatherLinks.Form1.retrieveImages(String address) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 330
       at GatherLinks.Form1.webCrawler(String url, Int32 levels, DoWorkEventArgs eve) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 97
  InnerException:

The question is whether the exception is thrown because Google is blocking the download in this case, or because the link ends with Gali and I don't know what type of file that is.

score 1 · Accepted Answer

First, ignore any image that does not have a valid link, i.e. one without http://.
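One possible way to make this check, rather than a plain string comparison, is System.Uri. The sketch below is only an illustration: ImageUrlHelper and ToAbsoluteHttpUrl are hypothetical names, and it goes one step beyond the advice above by resolving relative values such as /textinputassistant/tia.png against the page address instead of discarding them. Values that still do not end up as http or https links are rejected.

```csharp
using System;

public static class ImageUrlHelper
{
    // Hypothetical helper: returns an absolute http/https URL for an img src value,
    // or null when the value cannot be turned into one.
    public static string ToAbsoluteHttpUrl(string pageAddress, string src)
    {
        if (string.IsNullOrWhiteSpace(src)) return null;

        // The page address itself must be an absolute URL to resolve against.
        Uri baseUri;
        if (!Uri.TryCreate(pageAddress, UriKind.Absolute, out baseUri)) return null;

        // Resolves relative values against the page; absolute values pass through.
        Uri resolved;
        if (!Uri.TryCreate(baseUri, src, out resolved)) return null;

        // Keep only links that WebClient would fetch over the web.
        if (resolved.Scheme != Uri.UriSchemeHttp && resolved.Scheme != Uri.UriSchemeHttps)
            return null;

        return resolved.AbsoluteUri;
    }
}
```

Inside the foreach loop of retrieveImages, the result of ToAbsoluteHttpUrl(address, src.Value) could be passed to DownloadFile in place of src.Value, skipping the item when it is null. This keeps a bare /path value from being treated as a local file path, which is what the FileWebRequest frames in the question's stack trace show.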

To save a file directly to disk, download its binary content and save it like this:

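// WebClient lives in the System.Net namespace (fully qualified here).
string URL = "http://www.yourdomain.com/file1.zip";
// Verbatim string: in a regular literal, "\f" would be read as a form-feed escape.
string DestinationPath = @"C:\file1.jpg";
System.Net.WebClient Client = new System.Net.WebClient();
// Downloads the raw bytes and writes them straight to the destination file.
Client.DownloadFile(URL, DestinationPath);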

You don't need to convert the image to a .NET Image in order to save it. I have similar code in some import apps I wrote recently.
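Applied to a whole list, the same DownloadFile call might look like the sketch below. It is a minimal illustration under stated assumptions, not the code from those import apps: ImageDownloader, BuildFileName, and DownloadAll are hypothetical names, the ".img" placeholder extension is an arbitrary choice for URLs that carry no recognizable extension (such as the one ending in Gali), and the list is assumed to already hold absolute URLs. Each image gets its own file name, since writing every download to d:\MyImages\my.gif as in the question would overwrite the previous one, and a WebException such as the 403 shown in the question skips that item instead of stopping the loop.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

public static class ImageDownloader
{
    private static readonly string[] KnownExtensions = { ".png", ".jpg", ".jpeg", ".gif", ".bmp" };

    // Hypothetical helper: builds a distinct local file name for the index-th image.
    public static string BuildFileName(Uri imageUri, int index)
    {
        string ext = Path.GetExtension(imageUri.AbsolutePath).ToLowerInvariant();
        // Assumption: fall back to a placeholder when the URL has no known image extension.
        if (Array.IndexOf(KnownExtensions, ext) < 0) ext = ".img";
        return "image_" + index.ToString("D4") + ext;
    }

    // Downloads every absolute http/https URL into targetFolder; returns how many were saved.
    public static int DownloadAll(IEnumerable<string> absoluteUrls, string targetFolder)
    {
        Directory.CreateDirectory(targetFolder); // no-op if the folder already exists
        int saved = 0;
        int index = 0;
        using (WebClient client = new WebClient())
        {
            foreach (string url in absoluteUrls)
            {
                index++;
                Uri uri;
                if (!Uri.TryCreate(url, UriKind.Absolute, out uri)) continue;
                if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps) continue;

                string path = Path.Combine(targetFolder, BuildFileName(uri, index));
                try
                {
                    client.DownloadFile(uri, path);
                    saved++;
                }
                catch (WebException ex)
                {
                    // e.g. "(403) Forbidden": report it and move on to the next image.
                    Console.WriteLine("Skipped " + url + ": " + ex.Message);
                }
            }
        }
        return saved;
    }
}
```

A call such as ImageDownloader.DownloadAll(imgList, @"d:\MyImages") would then save whatever the servers allow. Whether a particular server answers with 403 is outside the client's control, so the sketch only makes sure one refused image does not abort the rest.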
