html - Regular expression in a bash script to remove HTML garbage from a string

Translated from: https://stackoverflow.com/questions/18417132 2013-08-24T09:40:16.223

Viewed 737 times

For various reasons, the output I get from a script that crawls HTML pages contains extra HTML code that I don't need.

Here is what I have:

... MY DATA IN A SINGLE ROW FOLLOWED BY ...> <script>function fbs_click() {u=location.href;t=document.title;window.open('http://www.facebook.com/sharer.php?u='+encodeURIComponent(u)+'&t='+encodeURIComponent(t),'sharer','toolbar=0,status=0,width=626,height=436');return false;}</script><style> html .fb_share_button { display: -moz-inline-block; display:inline; padding:1px 11px 0 5px; height:15px; border:1px solid #d8dfea; background:url(http://static.ak.facebook.com/images/share/facebook_share_icon.gif?6:26981) no-repeat top right; } html .fb_share_button:hover { color:#fff; border-color:#295582; background:#3b5998 url(http://static.ak.facebook.com/images/share/facebook_share_icon.gif?6:26981) no-repeat top right; text-decoration:none; } </style> <a rel="nofollow" href="http://www.facebook.com/share.php?u=/dizionario/recensi ne.asp?id=11334" class="fb_share_button" onclick="return fbs_click()" target="_blank" style="text-decoration:none;"></a>

Presumably this extra code can be stripped with a REGEXP that deletes the tags, together with all the content around them, from the string.
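A minimal sketch of that regexp approach with sed, assuming that everything worth keeping comes before the first `<script` on each line (as in the sample above) and that each record sits on a single line. The function names `strip_after_script` and `strip_tags` are hypothetical. The first simply cuts the line at `<script`; the second removes the `<script>` and `<style>` blocks and then any leftover tags.

```shell
#!/usr/bin/env bash

# strip_after_script: read lines on stdin and drop everything from the first
# "<script" to the end of each line, then trim trailing whitespace.
# Assumption: the wanted data always precedes the first "<script" tag.
# Only lowercase "<script" is matched; POSIX sed has no case-insensitive flag.
strip_after_script() {
  sed -e 's/<script.*$//' -e 's/[[:space:]]*$//'
}

# strip_tags: remove <script>...</script> and <style>...</style> blocks,
# then any remaining tags such as <a ...></a>, then trailing whitespace.
# sed's ".*" is greedy, so this assumes at most one script block and one
# style block per line; otherwise text between two blocks would be lost.
strip_tags() {
  sed -e 's#<script[^>]*>.*</script>##g' \
      -e 's#<style[^>]*>.*</style>##g' \
      -e 's#<[^>]*>##g' \
      -e 's/[[:space:]]*$//'
}

# Usage (hypothetical command name): your_crawler | strip_after_script
```

The first function is the more robust choice for this fixed trailer, since it does not try to match the contents of the script or style blocks at all. Regexes over arbitrary HTML break easily, so the second function is only suitable while the garbage keeps exactly this shape.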
