<a href="http://newday.com/song.mp3">First Link</a>
<div id="right_song"> 
        <div style="font-size:15px;"><b>Pitbull ft. Chris Brown - Pitbull feat. Chris Brown - International Love mp3</b></div> 
        <div style="clear:both;"></div> 
<div style="float:left;"> 
    <div style="float:left; height:27px; font-size:13px; padding-top:2px;"> 
        <div style="float:left;"> 
    <a href="http://secondurl.com/thisoneshouldonlyoutput" rel="nofollow" target="_blank" style="color:green;">Second Link</a></div>'; 

I want to get the second link from this HTML using preg_match_all. My current regular expression looks like this:

preg_match_all("/\<a.+?href=(\"|')(?!javascript:|#)(.+?)\.mp3(\"|')/i", $html, $urlMatches);

This works and outputs both links, but I want to output only the second link, without the .mp3 extension. Please help.


1 Answer


Description

This regular expression will:

  • match the first anchor tag after <div id="right_song"> that has an href attribute whose value ends with .mp3
  • avoid many of the edge cases that make matching HTML text with a regular expression very difficult

<div\sid="right_song">.*?<a(?=\s|>)(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\shref=(['"]?)(.*?\.mp3)\1(?:\s|\/>|>))(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>.*?<\/a>
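To make the pattern above easier to follow, here is a sketch of the same expression spread over several lines using the x (extended) flag, with a comment on each part. The `~` delimiter, the variable names, and the shortened sample HTML are choices made for this illustration, not part of the original answer.

```php
<?php
// The pattern from above, split across lines. With the x flag, unescaped
// whitespace and "#" comments inside the pattern are ignored.
// "~" is used as the delimiter for this sketch.
$pattern = <<<'REGEX'
~
<div\sid="right_song">      # the container div, matched literally
.*?                         # lazily skip ahead to a candidate anchor
<a(?=\s|>)                  # an anchor tag name, then whitespace or tag end
(?=                         # lookahead: the tag must hold a real href attribute
  (?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?   # step over whole attributes
  \shref=(['"]?)            # group 1: optional opening quote
  (.*?\.mp3)                # group 2: the href value, ending in .mp3
  \1(?:\s|\/>|>)            # matching closing quote, then space or tag end
)
(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>     # consume the opening tag
.*?<\/a>                    # link text and closing tag
~imsx
REGEX;

// Shortened version of the sample HTML, keeping the tricky anchor.
$html = <<<'HTML'
<a href="http://newday.com/song.mp3">First Link</a>
<div id="right_song">
<div style="float:left;">
<a onmouseover=' href="bad.mp3" ; if ( 6 > x ) {funRotate(href); } ; ' href="http://secondurl.com/thisoneshouldonlyoutput.mp3">First Link</a>
</div>
HTML;

if (preg_match($pattern, $html, $matches)) {
    echo $matches[2], PHP_EOL; // the href value of the anchor inside the div
}
```

The attribute-skipping alternation is what keeps the decoy `href="bad.mp3"` from matching: a quoted attribute value is consumed as one unit, so the lookahead can only find `href` where a real attribute starts.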


Example

Sample Text

Note the difficult edge cases in the second anchor tag: the string href="bad.mp3" is nested inside an attribute value, and there is a JavaScript greater-than sign > inside a value. The pattern also accepts an href value written without quotes.

<a href="http://newday.com/song.mp3">First Link</a>
<div id="right_song"> 
        <div style="font-size:15px;"><b>Pitbull ft. Chris Brown - Pitbull feat. Chris Brown - International Love mp3</b></div> 
        <div style="clear:both;"></div> 
<div style="float:left;"> 
    <div style="float:left; height:27px; font-size:13px; padding-top:2px;"> 
        <div style="float:left;"> 
<a onmouseover=' href="bad.mp3" ; if ( 6 > x ) {funRotate(href); } ; ' href="http://secondurl.com/thisoneshouldonlyoutput.mp3">First Link</a>
</div>
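As a quick check of why these edge cases matter, the sketch below runs the pattern from the question against a shortened copy of this sample text. The variable names are illustrative, and the backslash before `<` in the question's pattern is dropped here because it is redundant.

```php
<?php
// Shortened copy of the sample text, keeping both anchors.
$html = <<<'HTML'
<a href="http://newday.com/song.mp3">First Link</a>
<div id="right_song">
<a onmouseover=' href="bad.mp3" ; if ( 6 > x ) {funRotate(href); } ; ' href="http://secondurl.com/thisoneshouldonlyoutput.mp3">First Link</a>
</div>
HTML;

// The question's pattern: it has no notion of attribute boundaries.
$naive = '/<a.+?href=("|\')(?!javascript:|#)(.+?)\.mp3("|\')/i';
preg_match_all($naive, $html, $urlMatches);

print_r($urlMatches[2]);
// Group 2 holds "http://newday.com/song" and then "bad": the decoy inside
// onmouseover is picked up instead of the real href of the second anchor.
```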

Code

<?php
$sourcestring="your source string";
preg_match('/<div\sid="right_song">.*?<a(?=\s|>)(?=(?:[^>=]|=\'[^\']*\'|="[^"]*"|=[^\'"][^\s>]*)*?\shref=([\'"]?)(.*?\.mp3)\1(?:\s|\/>|>))(?:[^>=]|=\'[^\']*\'|="[^"]*"|=[^\'"][^\s>]*)*>.*?<\/a>
/imsx',$sourcestring,$matches);
echo "<pre>".print_r($matches,true);
?>

Match

Group 0 gets the text from the <div through to and including the full matching anchor tag
Group 1 gets the opening quote around the href value which is back referenced later
Group 2 gets the href value

[0] => <div id="right_song"> 
        <div style="font-size:15px;"><b>Pitbull ft. Chris Brown - Pitbull feat. Chris Brown - International Love mp3</b></div> 
        <div style="clear:both;"></div> 
<div style="float:left;"> 
    <div style="float:left; height:27px; font-size:13px; padding-top:2px;"> 
        <div style="float:left;"> 
<a onmouseover=' href="bad.mp3" ; if ( 6 > x ) {funRotate(href); } ; ' href="http://secondurl.com/thisoneshouldonlyoutput.mp3">First Link</a>
[1] => "
[2] => http://secondurl.com/thisoneshouldonlyoutput.mp3
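The question asked for the link without the .mp3 extension, while group 2 above still includes it. One possible way to drop it, sketched here with a hard-coded `$href` standing in for `$matches[2]`, is to strip the suffix after matching:

```php
<?php
// $href is a stand-in for $matches[2] from the code above.
$href = 'http://secondurl.com/thisoneshouldonlyoutput.mp3';

// Remove a trailing ".mp3" (case-insensitive) and leave the rest untouched.
$withoutExtension = preg_replace('/\.mp3$/i', '', $href);

echo $withoutExtension, PHP_EOL; // http://secondurl.com/thisoneshouldonlyoutput
```

An alternative that should give the same result is to move the extension outside the capture group in the pattern itself, that is, to write `(.*?)\.mp3` instead of `(.*?\.mp3)`.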
Answered 2013-07-22T03:42:28.370