.net - Extract all values between double quotes from a web page's source code

Question

I have saved the source code of a web page (an option available in every browser). I want to capture everything between double quotes that starts with http://. How can I do this?

score 1 · Accepted Answer

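using System.Linq;
using HtmlAgilityPack;

string path = ...
var doc = new HtmlDocument();
doc.Load(path);
// Collect every attribute value, on any node, that starts with http://
var links =
    from e in doc.DocumentNode.Descendants()
    from a in e.Attributes
    where a.Value.StartsWith("http://")
    select a.Value;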

(Note that this returns only links found in HTML attributes, not links that appear in plain text.)

score 0 · Accepted Answer

Use a regular expression:

Dim mc As MatchCollection = Regex.Matches(html, """(http://.+?)""", RegexOptions.IgnoreCase)

For Each m As Match In mc
    Console.WriteLine(m.Groups(1).Value)
Next

Sample output when html holds the source code of this page:

http://cdn.sstatic.net/stackoverflow/img/favicon.ico
http://cdn.sstatic.net/stackoverflow/img/apple-touch-icon.png
http://cdn.sstatic.net/js/stub.js?v=181da36f6419
http://cdn.sstatic.net/stackoverflow/all.css?v=0f0c93534e2b
http://stackoverflow.com/questions/16264292/extract-all-values-between-double-quotes-from-a-webpages-source-code
http://www.gravatar.com/avatar/91d33760d2823fa7cf5c95b41a16fada?s=32&d=identicon&r=PG\
http://stackoverflow.com/users/2264365/ajakblackgoat
http://stackexchange.com
http://chat.stackoverflow.com
... etc
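
For readers working in C#, the same regular-expression idea can be sketched as follows. This is a minimal illustration, not the answerer's code: the class name QuotedUrlExtractor, the Extract helper, and the sample markup are all hypothetical, and the pattern uses [^"]* in place of .+? so that a match always stops at the next double quote, even when the value spans a line break.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the name is not from the original answers.
static class QuotedUrlExtractor
{
    // A double quote, then a captured value starting with http://
    // and running up to (not including) the next double quote.
    private static readonly Regex QuotedHttp =
        new Regex("\"(http://[^\"]*)\"", RegexOptions.IgnoreCase);

    public static List<string> Extract(string html)
    {
        var results = new List<string>();
        foreach (Match m in QuotedHttp.Matches(html))
        {
            // Group 1 is the text between the quotes.
            results.Add(m.Groups[1].Value);
        }
        return results;
    }

    static void Main()
    {
        // Made-up sample markup: one URL in an attribute, one in plain text,
        // and one quoted value that does not start with http://.
        string html =
            "<a href=\"http://example.com/a\">x</a> " +
            "<img src=\"/local.png\"> " +
            "<p>\"http://example.com/b\"</p>";

        foreach (string url in Extract(html))
        {
            Console.WriteLine(url);
        }
    }
}
```

Unlike the HTML-parser approach, this also picks up quoted URLs in plain text, but it knows nothing about HTML structure, so it will equally match quoted strings inside scripts or comments. It also ignores https:// and single-quoted values, as the question only asked about double-quoted http:// values.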
