php - Regex isn't quite right

Question

I have a site crawler that displays a list of URLs, but the problem is that I can't get the final regex quite right. All of the URLs end up listed like this:

http://www.website.org/page1.html&--EFTTIUGJ4ITCyh0Frzb_LFXe_eHw
http://website.net/page2/&--EyqBLeFeCkSfmvA7p0cLrsy1Zm1g
http://foobar.website.com/page3.php&--E5WRBxuTOQikDIyBczaVXveOdRFg

The URLs can all be different, and the only thing that appears to be constant is the & symbol. How can I remove the & symbol and everything to the right of it?

Here is what I tried, which produces the results above:

function getresults($sterm) {
    // file_get_html() comes from the Simple HTML DOM parser
    $html = file_get_html($sterm);
    $result = "";
    // find all h3 tags with class="r"
    foreach ($html->find('h3[class="r"]') as $ef) {
        $result .= $ef->outertext . '<br>';
    }
    return $result;
}

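function geturl($url) {
    $var = $url;
    $result = "";

    // capture the href value (group 1) and the link text (group 2) of each anchor
    preg_match_all("/a[\s]+[^>]*?href[\s]?=[\s\"\/url?q=\']+" .
                   "(.*?)[\"\']+.*?>" . "([^<]+|.*?)?<\/a>/",
                   $var, $matches);

    // keep only the captured URLs
    $matches = $matches[1];

    foreach ($matches as $var) {
        $result .= $var . "<br>";
    }

    // replace everything from "sa=U" through "AFQjCN" with "--"
    echo preg_replace('/sa=U.*?usg=.*?AFQjCN/', "--", $result);
}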

score 1 · Accepted Answer

If the URLs always have the same format, use explode:

<?php
$tmp = explode("&", "http://foobar.website.com/page3.php&--E5WRBxuTOQikDIyBczaVXveOdRFg");
?>

$tmp[0] should contain "http://foobar.website.com/page3.php" and $tmp[1] should contain "--E5WRBxuTOQikDIyBczaVXveOdRFg".
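
To apply the same idea to the whole list, something like the following might work. This is only a sketch: the helper name stripSuffix and the $urls array are made up for illustration and are not part of the question's code. The limit argument of 2 simply stops explode from splitting on any later & characters.

```php
<?php
// Hypothetical helper: keep only the part of the URL before the first "&".
function stripSuffix(string $url): string
{
    // When "&" is absent, explode() returns the whole string as element 0,
    // so the URL comes back unchanged.
    return explode('&', $url, 2)[0];
}

// Sample URLs copied from the question.
$urls = [
    'http://www.website.org/page1.html&--EFTTIUGJ4ITCyh0Frzb_LFXe_eHw',
    'http://website.net/page2/&--EyqBLeFeCkSfmvA7p0cLrsy1Zm1g',
    'http://foobar.website.com/page3.php&--E5WRBxuTOQikDIyBczaVXveOdRFg',
];

// Clean every URL in the list and print one per line.
$clean = array_map('stripSuffix', $urls);
echo implode("\n", $clean), "\n";
```

In geturl() from the question, the same helper could presumably be called on each $var inside the foreach loop before it is appended to $result.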

score 0 · Accepted Answer

A simple way to remove the & character and everything after it:

$result = substr($result, 0, strpos($result, '&'));
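
One caveat, as far as I can tell: strpos() returns false when the string contains no &, and substr() then yields an empty string instead of the original URL. A slightly more defensive sketch is below, together with a regex variant since the question asked about regex. Both function names are hypothetical, and both are meant to be called on a single URL rather than on the <br>-joined $result string.

```php
<?php
// Hypothetical helper: cut the URL at the first "&", if there is one.
function cutAtAmpersand(string $url): string
{
    $pos = strpos($url, '&');
    // strpos() returns false when "&" is missing; return the URL unchanged then.
    return $pos === false ? $url : substr($url, 0, $pos);
}

// Hypothetical regex variant: drop the first "&" and everything after it.
function cutAtAmpersandRegex(string $url): string
{
    // preg_replace() returns null on error; fall back to the input in that case.
    return preg_replace('/&.*$/s', '', $url) ?? $url;
}

echo cutAtAmpersand('http://website.net/page2/&--EyqBLeFeCkSfmvA7p0cLrsy1Zm1g'), "\n";
echo cutAtAmpersandRegex('http://foobar.website.com/page3.php&--E5WRBxuTOQikDIyBczaVXveOdRFg'), "\n";
```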
