erlang - Extracting a URL from XML using XPath

Question

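I'm trying to extract the second link under the description tag. I wrote the following code, but it looks really messy with the fread and substring calls, which are there just to make it work. Is there a cleaner way to achieve this?

XML extract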

magic(Url) ->
    Tag = ".xml",
    inets:start(),
    {ok, {Status, Headers, Body}} = httpc:request(Url ++ Tag),
    {Xml, Rest} = xmerl_scan:string(Body),
    {xmlObj, string, A} = xmerl_xpath:string("substring-after(substring-after(substring-before(//channel/item/description[1], '\">[link]'), 'br'), 'href=')", Xml),
    {ok, _, B} = io_lib:fread("~6s", A),
    string:sub_string(B, 1, string:len(B) - 1).

score 2 · Accepted Answer

Not a complete solution, but you can use XPath expressions such as //channel/item/description[1]/text()[16] and //channel/item/description[1]/text()[24].

Each extracted string consists of a leading double quote followed by the URL, so you can drop the quote with list-matching syntax: [_|Url] = ...

So use [{_,_,_,_,[_|U1],_}] = xmerl_xpath:string("//channel/item/description[1]/text()[16]", Xml). to bind U1 to the first URL.
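The same match can also be written against the #xmlText{} record instead of a positional tuple, which avoids counting underscores. The following is only a minimal sketch: the module name url_text and the helper strip_quote/1 are hypothetical and not part of xmerl, and it assumes the XPath query returned exactly one text node whose value starts with a double quote.

```erlang
-module(url_text).
-export([strip_quote/1]).

%% Record definitions for xmerl nodes, including #xmlText{}.
-include_lib("xmerl/include/xmerl.hrl").

%% strip_quote/1 is a hypothetical helper, not an xmerl function.
%% It expects the one-element list returned by xmerl_xpath:string/2
%% for a text()[N] query and drops the leading double quote.
strip_quote([#xmlText{value = [$\" | Url]}]) ->
    {ok, Url};
%% Anything else (no node, several nodes, no leading quote) is rejected.
strip_quote(_Other) ->
    error.
```

It could be called as url_text:strip_quote(xmerl_xpath:string("//channel/item/description[1]/text()[16]", Xml)), which returns {ok, Url} or error instead of raising a badmatch.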

Tested in the shell:

11> [{_,_,_,_,[_|U1],_}] = xmerl_xpath:string("//channel/item/description[1]/text()[16]", Xml). 
[{xmlText,[{description,5},{item,5},{channel,1},{rss,1}],
          16,[],"\"http://www.reddit.com/user/escaped_reddit",text}]
12> 
12> U1.
"http://www.reddit.com/user/escaped_reddit"
13> 
13> 
13> [{_,_,_,_,[_|U2],_}] = xmerl_xpath:string("//channel/item/description[1]/text()[24]", Xml). 
[{xmlText,[{description,5},{item,5},{channel,1},{rss,1}],
          24,[],
          "\"http://www.reddit.com/r/erlang/comments/y62wf/how_to_use_ranch/",
          text}]
14> 
14> U2.
"http://www.reddit.com/r/erlang/comments/y62wf/how_to_use_ranch/"
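The text()[16] and text()[24] indices depend on the exact layout of the description, so they may shift if the feed changes. One alternative, which is not what the answer above does, is to join all text nodes of the description and pull the href values out with a regular expression. The sketch below assumes the description holds HTML with double-quoted href attributes; the module name url_href and the functions hrefs/1 and second_link/1 are hypothetical.

```erlang
-module(url_href).
-export([hrefs/1, second_link/1]).

%% Record definitions for xmerl nodes, including #xmlText{}.
-include_lib("xmerl/include/xmerl.hrl").

%% hrefs/1 (hypothetical): return every href="..." value in an HTML string,
%% in the order in which they appear.
hrefs(Html) ->
    case re:run(Html, "href=\"([^\"]+)\"", [global, {capture, [1], list}]) of
        {match, Matches} -> [Url || [Url] <- Matches];
        nomatch -> []
    end.

%% second_link/1 (hypothetical): join the text nodes selected by the XPath
%% query and return the second href found, in document order.
second_link(Xml) ->
    Nodes = xmerl_xpath:string("//channel/item/description[1]/text()", Xml),
    Html = lists:append([V || #xmlText{value = V} <- Nodes]),
    case hrefs(Html) of
        [_First, Second | _] -> {ok, Second};
        _ -> error
    end.
```

Here second_link(Xml) takes the document returned by xmerl_scan:string/1 and yields {ok, Url} when at least two links are present, otherwise error.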
