c# - Get only part of a string

Question

This time I need help with C#.

I have this HTML:

<ul class="ui_sug_list"></ul></div></div></div></form>
</div></div><div class="cnt_listas"><ol id="listagem1" 
class="cols_2"><li><a href="/laura-pausini/73280/">16/5/74
</a></li><li><a href="/laura-pausini/73280/traducao.html">
16/5/74 (tradução)</a></li><li><a href="/laura-pausini/1566533/">16/5/74
(Spanish Version)</a></li><li><a href="/laura-pausini/1566533/traducao.html">
16/5/74 (Spanish Version) (tradução)</a></li><li><a href="/laura-pausini/1991556/">
A Simple Vista</a></li><li><a href="/laura-pausini/1991556/traducao.html">
A Simple Vista (tradução)</a></li>

I download HTML like this from the web. I need to print only the song names and the links to the songs, and I don't know how to extract just this information from the file.

This is how I download the file:

        // Download the file
        WebClient webClient = new WebClient();
        webClient.DownloadFile(
            "http://letras.mus.br/" + termo_busca + "/", @"C:\Temp\letras.html");

Can you help me?

score 2 · Accepted Answer

You should definitely use HtmlAgilityPack.

You can get each link and its text like this:

    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(Html); // Html is the downloaded page as a string
    // Note: SelectNodes returns null when no node matches the XPath
    foreach (HtmlAgilityPack.HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
    {
        var value = link.Attributes["href"].Value; // the link target
        var text = link.InnerText;                 // the text of the link
    }
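
To make the idea above concrete for the HTML in the question, here is a minimal sketch that returns each song title together with its link. It is only an illustration under some assumptions: the class name `SongListParser` and the method `ExtractSongs` are hypothetical, the XPath assumes the songs always sit inside the `<ol id="listagem1">` element shown in the question, and the base URL is assumed to be the one used for the download.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class SongListParser
{
    // Returns (title, absolute URL) pairs for every link inside <ol id="listagem1">.
    public static List<KeyValuePair<string, string>> ExtractSongs(string html, string baseUrl)
    {
        var songs = new List<KeyValuePair<string, string>>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: the song list is the <ol> with id "listagem1".
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var links = doc.DocumentNode.SelectNodes("//ol[@id='listagem1']//a[@href]");
        if (links == null) return songs;

        var root = new Uri(baseUrl);
        foreach (HtmlNode link in links)
        {
            string href = link.GetAttributeValue("href", string.Empty);

            // Decode HTML entities, then collapse line breaks and repeated spaces.
            string title = HtmlEntity.DeEntitize(link.InnerText);
            title = string.Join(" ",
                title.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));

            // Turn the site-relative href into an absolute URL.
            songs.Add(new KeyValuePair<string, string>(title, new Uri(root, href).ToString()));
        }
        return songs;
    }
}
```

With the file from the question, this could be called as `SongListParser.ExtractSongs(File.ReadAllText(@"C:\Temp\letras.html"), "http://letras.mus.br/")`, printing `Key` (the title) and `Value` (the link) of each entry. If the translation pages are not wanted, skipping hrefs that end in `traducao.html` would be one way to filter them.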

You can also use this class, which likewise relies on the HTML Agility Pack:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Threading.Tasks;
using HtmlAgilityPack;

namespace Foo.Client
{
    public class Website
    {
        public string Html { get; private set; }

        private Website(string html)
        {
            Html = html;
        }

        public static Website Load(Uri uri)
        {
            validate(uri);
            return new Website(getPageContentFor(uri));
        }

        public List<string> GetHyperLinks()
        {
            var doc = new HtmlAgilityPack.HtmlDocument();
            doc.LoadHtml(Html);
            return extractLinksFrom(doc.DocumentNode.SelectNodes("//a[@href]"));
        }

        private static string getPageContentFor(Uri uri)
        {
            try
            {
                var request = (HttpWebRequest)WebRequest.Create(uri);
                // Dispose the response and the reader once the body has been read
                using (var response = (HttpWebResponse)request.GetResponse())
                using (var reader = new StreamReader(response.GetResponseStream()))
                    return reader.ReadToEnd();
            }
            catch (WebException)
            {
                return String.Empty;
            }
        }

        private List<string> extractLinksFrom(HtmlNodeCollection nodes)
        {
            var result = new List<string>();
            if (nodes == null) return result;
            foreach (var link in nodes)
                result.Add(link.Attributes["href"].Value);
            return result;
        }

        private static void validate(Uri uri)
        {
            if (!uri.IsAbsoluteUri)
                throw new ArgumentException("invalid uri format");
        }
    }
}
