php - PHP: advanced regex help needed

Question

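I have a lot of large text paragraphs to parse. The end goal is to split each paragraph into smaller posts so that I can insert them into MySQL.

Here is a very short example of one of the paragraphs, held in a string: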

<?php
$longstring = '

(<b>John Smith</b>) at <b class="datetimeGMT">2011-01-10 22:13:01 GMT</b><hr>
Lots of text entered here under the first line.<br>And most of it is html, since it is for displaying in a web browser.<br></br></br>

(<b>Alan Slappy</b>) at <b class="datetimeGMT">2011-01-11 13:12:00 GMT</b><hr>
Forgot to put one more thing in the notes.........<br>blah blah blah
(<b>Joe Mama</b>) at <b class="datetimeGMT">2011-01-13 10:15:00 GMT</b><hr>
Groceries list:<br>Watermelons<br>Floss<br><br>email doctor
';

?>

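Yes, I have the odd job of parsing these strings entry by entry, and yes, I agree it is not pretty. The original developer allowed new text to be appended to the existing text. That is not a bad idea in some cases, but it is for me.

I need help working out how to regex this beast and put it in a foreach loop so I can start cleaning it up.

This is how far I have got: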

<?php

if(preg_match_all('/\(<b>.*?<hr>/', $longstring, $matches)){
print_r($matches);
}
/* output: 
Array 
( 
    [0] => Array 
        ( 
         [0] => (<b>John Smith</b>) at <b class="datetimeGMT">2011-01-10 22:13:01 GMT</b><hr>
         [1] => (<b>Alan Slappy</b>) at <b class="datetimeGMT">2011-01-11 13:12:00 GMT</b><hr> 
         [2] => (<b>Joe Mama</b>) at <b class="datetimeGMT">2011-01-13 10:15:00 GMT</b><hr> 
        ) 
) 
*/ 
?>

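So I am actually doing fairly well at looping over the top line of each entry, and I am a little proud of having figured that out. (Regex is my nemesis.)

Where I am stuck is working out how to include the actual text underneath each header in every iteration.

Does anyone have an idea how to adjust preg_match_all so that it also captures the text under each "header"?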

score 1 · Accepted Answer

Try using preg_split instead.

// -1 means "no limit"; passing null for the limit is deprecated as of PHP 8.1.
$matches = preg_split("/\s*(\(<b>.*?<hr>)\s*/s", trim($longstring), -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

print_r($matches);

Note: trim is applied to the string to cut off leading and trailing whitespace.

The result looks like this:

Array
(
    [0] => (<b>John Smith</b>) at <b class="datetimeGMT">2011-01-10 22:13:01 GMT</b><hr>
    [1] => Lots of text entered here under the first line.<br>And most of it is html, since it is for displaying in a web browser.<br></br></br>
    [2] => (<b>Alan Slappy</b>) at <b class="datetimeGMT">2011-01-11 13:12:00 GMT</b><hr>
    [3] => Forgot to put one more thing in the notes.........<br>blah blah blah
    [4] => (<b>Joe Mama</b>) at <b class="datetimeGMT">2011-01-13 10:15:00 GMT</b><hr>
    [5] => Groceries list:<br>Watermelons<br>Floss<br><br>email doctor
)
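
The flat array above alternates between headers and bodies, so it still has to be grouped into posts before anything can be inserted into MySQL. The following is a minimal sketch of one way to do that, not part of the original answer. The sample text is shortened, and the field names author, posted_at and body are made up for illustration. It assumes every header has exactly the shape shown in the question.

```php
<?php
// Shortened sample in the same shape as the question's data.
$longstring = '
(<b>John Smith</b>) at <b class="datetimeGMT">2011-01-10 22:13:01 GMT</b><hr>
Lots of text.<br>More text.

(<b>Alan Slappy</b>) at <b class="datetimeGMT">2011-01-11 13:12:00 GMT</b><hr>
Forgot one thing.<br>blah
(<b>Joe Mama</b>) at <b class="datetimeGMT">2011-01-13 10:15:00 GMT</b><hr>
Groceries list:<br>Floss
';

// Same split as in the answer: headers are kept because they are captured.
$parts = preg_split(
    '/\s*(\(<b>.*?<hr>)\s*/s',
    trim($longstring),
    -1,
    PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
);

// Assumed header layout: (<b>NAME</b>) at <b class="datetimeGMT">TIME GMT</b><hr>
$headerRe = '/^\(<b>(.*?)<\/b>\) at <b class="datetimeGMT">(.*?) GMT<\/b><hr>$/';

$posts = [];
foreach ($parts as $part) {
    if (preg_match($headerRe, $part, $m)) {
        // A header starts a new post; the body is filled in by the next part.
        $posts[] = ['author' => $m[1], 'posted_at' => $m[2], 'body' => ''];
    } elseif ($posts) {
        // Anything else belongs to the most recent header.
        $posts[count($posts) - 1]['body'] .= $part;
    }
}

print_r($posts);
```

Checking each part against the header pattern, rather than pairing elements by position, keeps the grouping correct even when a header has no body text (PREG_SPLIT_NO_EMPTY would drop the empty body and shift a positional pairing). Each element of $posts could then be bound to a prepared INSERT statement.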

score 0 · Accepted Answer

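Unless you can guarantee the format of the HTML, this will be easier if you parse the HTML instead of only running regular expressions over it.

I would suggest looking at a robust, mature HTML parser for PHP.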
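
The answer does not name a specific parser. As one possible illustration, the sketch below uses PHP's bundled DOM extension (DOMDocument and DOMXPath), which is an assumption on my part rather than the answerer's recommendation. It only pulls out the author and timestamp of each header. Collecting the body text would additionally require walking the nodes between one header and the next.

```php
<?php
// Shortened sample in the same shape as the question's data.
$html = '(<b>John Smith</b>) at <b class="datetimeGMT">2011-01-10 22:13:01 GMT</b><hr>'
      . 'Lots of text.<br>More text.'
      . '(<b>Alan Slappy</b>) at <b class="datetimeGMT">2011-01-11 13:12:00 GMT</b><hr>'
      . 'Forgot one thing.';

// Collect parser warnings internally instead of printing them;
// real-world fragments like this are rarely valid HTML.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$headers = [];
foreach ($xpath->query('//b[@class="datetimeGMT"]') as $stamp) {
    // Assumption: the author is the nearest <b> sibling before the timestamp.
    $author = $xpath->query('preceding-sibling::b[1]', $stamp)->item(0);
    $headers[] = [
        'author'    => $author ? trim($author->textContent) : null,
        'posted_at' => trim($stamp->textContent),
    ];
}

print_r($headers);
```

The timestamp is located by its class attribute rather than by its position in the string, so extra whitespace or attribute reordering in the markup would not break the lookup the way it could break a literal regex.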

score 0 · Accepted Answer

Try this:

if(preg_match_all('/\(<b>(?:(?!\(<b>).)*/s', $longstring, $matches)){
  print_r($matches);
}
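
The pattern above matches a header start, \(<b>, and then consumes one character at a time for as long as the next header start is not directly ahead, so each match is one whole entry. As a hedged extension of the same idea (not from the original answer), the sketch below adds named groups so that author, timestamp and body come out separately. The group names and the shortened sample are illustrative, and the header layout is assumed to be exactly as in the question.

```php
<?php
// Shortened sample in the same shape as the question's data.
$longstring = '
(<b>John Smith</b>) at <b class="datetimeGMT">2011-01-10 22:13:01 GMT</b><hr>
Lots of text.<br>More text.

(<b>Alan Slappy</b>) at <b class="datetimeGMT">2011-01-11 13:12:00 GMT</b><hr>
Forgot one thing.<br>blah
(<b>Joe Mama</b>) at <b class="datetimeGMT">2011-01-13 10:15:00 GMT</b><hr>
Groceries list:<br>Floss
';

// "~" is used as the delimiter so that "/" in closing tags needs no escaping.
// The body group repeats "any character not at the start of '(<b>'".
$pattern = '~\(<b>(?<author>.*?)</b>\) at <b class="datetimeGMT">(?<posted_at>.*?)</b><hr>'
         . '(?<body>(?:(?!\(<b>).)*)~s';

$posts = [];
if (preg_match_all($pattern, $longstring, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $posts[] = [
            'author'    => $m['author'],
            'posted_at' => $m['posted_at'],
            'body'      => trim($m['body']), // drop the newlines around the body
        ];
    }
}

print_r($posts);
```

PREG_SET_ORDER groups the results per entry instead of per capture group, which makes the foreach loop read naturally. A body that itself contains the literal text (<b> would be cut short, since that sequence is what marks the next header.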
