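First of all, I know that regular expressions should not be used to parse HTML, since they are unreliable and not 100% safe for that purpose. This is, however, a regex learning exercise more than anything else.
For my example I am using the BBC website, http://www.bbc.co.uk/sport/football/premier-league/table.
The project parses the tbody of the first table. I am trying to search it so that only the elements matching a search value are returned. For example, given the search "manc", I want the tr tags for Manchester City and Manchester United (matched on the URL).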
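So far, though, what I have is <tr\b[^>]*>(.*?)manc(.*?)</tr>
This matches from the very first tr all the way through to the Manchester tr, while the result for Man Utd comes back as expected. Can someone point out where I am going wrong with this regex?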
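The behaviour described above can be reproduced with a minimal Python sketch. Python, the `re.DOTALL` and `re.IGNORECASE` flags, and the three-row `html` sample are assumptions made for illustration; the question does not say which regex engine or flags are in use. The second pattern is one possible variant in which a negative lookahead stops each wildcard from running past a `</tr>`, and it is not presented as the only fix.

```python
import re

# Hypothetical, heavily trimmed stand-in for the table rows in the source below.
html = """
<tr id="team-1" class="team first">
<td class="team-name"><a href='http://www.bbc.co.uk/sport/football/teams/arsenal'>Arsenal</a></td>
</tr>
<tr id="team-2" class="team">
<td class="team-name"><a href='http://www.bbc.co.uk/sport/football/teams/manchester-city'>Man City</a></td>
</tr>
<tr id="team-3" class="team">
<td class="team-name"><a href='http://www.bbc.co.uk/sport/football/teams/manchester-united'>Man Utd</a></td>
</tr>
"""

flags = re.DOTALL | re.IGNORECASE

# The pattern from the question: (.*?) is lazy, but it may still cross
# </tr> boundaries, so a match that starts at the first <tr> keeps
# expanding until it finds "manc" in a later row.
original = re.compile(r"<tr\b[^>]*>(.*?)manc(.*?)</tr>", flags)
original_matches = [m.group(0) for m in original.finditer(html)]

# Variant: (?:(?!</tr>).)*? consumes one character at a time, but only
# while the text ahead is not "</tr>", so a match cannot leave its own row.
row_bound = re.compile(
    r"<tr\b[^>]*>(?:(?!</tr>).)*?manc(?:(?!</tr>).)*?</tr>", flags
)
row_bound_matches = [m.group(0) for m in row_bound.finditer(html)]

# The first original match swallows the Arsenal row as well.
print("arsenal" in original_matches[0])
# Each row-bound match is a single Manchester row.
print(["arsenal" in m for m in row_bound_matches])
```

With this sample, the original pattern yields two matches: one spanning the Arsenal and Man City rows, and one containing only the Man Utd row, which is consistent with the Man Utd result looking correct.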
Edit: source (trimmed)
<tbody id="trc-20-118996114-3">
<tr id="team-138824012" class="team first">
<td class="statistics"></td>
<td class='position'>
<span class='moving-up'>Moving up</span>
<span class='position-number'>1</span>
</td>
<td class="team-name">
<a href='http://www.bbc.co.uk/sport/football/teams/arsenal'>Arsenal</a>
</td>
<td class="played">0</td>
<td class="home-won">
<span>0</span>
</td>
<td class="home-drawn">0</td>
<td class="home-lost">0</td>
<td class="home-for">0</td>
<td class="home-against">0</td>
<td class="away-won">
<span>0</span>
</td>
<td class="away-drawn">0</td>
<td class="away-lost">0</td>
<td class="away-for">0</td>
<td class="away-against">0</td>
<td class="goal-difference">0</td>
<td class="points">0</td>
<td class="last-10-games">
<ol>
<li class="win" title="Win">
<span>Win</span>
</li>
<li class="draw" title="Draw">
<span>Draw</span>
</li>
<li class="draw" title="Draw">
<span>Draw</span>
</li>
<li class="draw" title="Draw">
<span>Draw</span>
</li>
<li class="loss" title="Loss">
<span>Loss</span>
</li>
<li class="win" title="Win">
<span>Win</span>
</li>
<li class="win" title="Win">
<span>Win</span>
</li>
<li class="loss" title="Loss">
<span>Loss</span>
</li>
<li class="win" title="Win">
<span>Win</span>
</li>
<li class="win last" title="Win">
<span>Win</span>
</li>
</ol>
</td>
<td class="status">
<a class="report" href="http://www.bbc.co.uk/sport/0/football/17973141">Report</a>
</td>
</tr>
<tr id="team-137316633" class="team">
<td class="statistics"></td>
<td class='position'>
<span class='moving-up'>Moving up</span>
<span class='position-number'>2</span>
</td>
<td class="team-name">
<a href='http://www.bbc.co.uk/sport/football/teams/aston-villa'>Aston Villa</a>
</td>
<td class="played">0</td>
<td class="home-won">
<span>0</span>
</td>
<td class="home-drawn">0</td>
<td class="home-lost">0</td>
<td class="home-for">0</td>
<td class="home-against">0</td>
<td class="away-won">
<span>0</span>
</td>
<td class="away-drawn">0</td>
<td class="away-lost">0</td>
<td class="away-for">0</td>
<td class="away-against">0</td>
<td class="goal-difference">0</td>
<td class="points">0</td>
<td class="last-10-games">
<ol>
<li class="loss" title="Loss">
<span>Loss</span>
</li>
<li class="draw" title="Draw">
<span>Draw</span>
</li>
<li class="draw" title="Draw">
<span>Draw</span>
</li>
<li class="loss" title="Loss">
<span>Loss</span>
</li>
<li class="draw" title="Draw">
<span>Draw</span>
</li>
<li class="loss" title="Loss">
<span>Loss</span>
</li>
<li class="draw" title="Draw">
<span>Draw</span>
</li>
<li class="draw" title="Draw">
<span>Draw</span>
</li>
<li class="loss" title="Loss">
<span>Loss</span>
</li>
<li class="loss last" title="Loss">
<span>Loss</span>
</li>
</ol>
</td>
<td class="status">
<a class="report" href="http://www.bbc.co.uk/sport/0/football/17973120">Report</a>
</td>
</tr>
<tr id="team-137318151" class="team">
<td class="statistics"></td>
<td class='position'>
<span class='moving-down'>Moving down</span>
<span class='position-number'>7</span>
</td>
<td class="team-name">
<a href='http://www.bbc.co.uk/sport/football/teams/manchester-city'>Man City</a>
</td>
<td class="played">0</td>
<td class="home-won">
<span>0</span>
</td>
<td class="home-drawn">0</td>
<td class="home-lost">0</td>
<td class="home-for">0</td>
<td class="home-against">0</td>
<td class="away-won">
<span>0</span>
</td>
<td class="away-drawn">0</td>
<td class="away-lost">0</td>
<td class="away-for">0</td>
<td class="away-against">0</td>
<td class="goal-difference">0</td>
<td class="points">0</td>
<td class="last-10-games">
<ol>
<li class="win" title="Win">
<span>Win</span>
</li>
<li class="win" title="Win">
<span>Win</span>
</li>
<li class="win" title="Win">
<span>Win</span>
</li>
<li class="win" title="Win">
<span>Win</span>
</li>
<li class="win" title="Win">
<span>Win</span>
</li>
<li class="win" title="Win">
<span>Win</span>
</li>
<li class="loss" title="Loss">
<span>Loss</span>
</li>
<li class="draw" title="Draw">
<span>Draw</span>
</li>
<li class="draw" title="Draw">
<span>Draw</span>
</li>
<li class="win last" title="Win">
<span>Win</span>
</li>
</ol>
</td>
<td class="status">
<a class="report" href="http://www.bbc.co.uk/sport/0/football/17973148">Report</a>
</td>
</tr>
<tr id="team-137318152" class="team">
<td class="statistics"></td>
<td class='position'>
<span class='moving-down'>Moving down</span>
<span class='position-number'>8</span>
</td>
<td class="team-name">
<a href='http://www.bbc.co.uk/sport/football/teams/manchester-united'>Man Utd</a>
</td>
<td class="played">0</td>
<td class="home-won">
<span>0</span>
</td>
<td class="home-drawn">0</td>
<td class="home-lost">0</td>
<td class="home-for">0</td>
<td class="home-against">0</td>
<td class="away-won">
<span>0</span>
</td>
<td class="away-drawn">0</td>
<td class="away-lost">0</td>
<td class="away-for">0</td>
<td class="away-against">0</td>
<td class="goal-difference">0</td>
<td class="points">0</td>
<td class="last-10-games">
<ol>
<li class="win" title="Win">
<span>Win</span>
</li>
<li class="win" title="Win">
<span>Win</span>
</li>
<li class="loss" title="Loss">
<span>Loss</span>
</li>
<li class="draw" title="Draw">
<span>Draw</span>
</li>
<li class="win" title="Win">
<span>Win</span>
</li>
<li class="loss" title="Loss">
<span>Loss</span>
</li>
<li class="win" title="Win">
<span>Win</span>
</li>
<li class="win" title="Win">
<span>Win</span>
</li>
<li class="win" title="Win">
<span>Win</span>
</li>
<li class="win last" title="Win">
<span>Win</span>
</li>
</ol>
</td>
<td class="status">
<a class="report" href="http://www.bbc.co.uk/sport/0/football/17973162">Report</a>
</td>
</tr>
</tbody>