regex - Regular expression - greedy, but stop before a string match

Question

I have some data that I want to convert into a table format.

Here is the input data:

1- This is the 1st line with a 
newline character
2- This is the 2nd line

Each row can contain multiple newline characters.

Output:

<td>1- This is the 1st line with a 
newline character</td>
<td>2- This is the 2nd line</td>

I tried the following:

^(\d{1,3}-)[^\d]*

But it seems to match only up to the digit 1 in "1st".

I would like the match to stop when it finds another \d{1,3}\- in the string. Any suggestions?

EDIT: I am using EditPad Lite.

score 2 · Accepted Answer

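You didn't specify a language (there are many regex implementations), but in general, what you are looking for is called a "positive lookahead". It lets you add a pattern that affects the match without becoming part of the match.

Search for lookahead in the documentation of whatever language you are using.

Edit: the following sample seems to work in vim.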
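As a rough illustration of that idea, here is a minimal Python sketch (Python is an assumption chosen for the example, the question itself targets EditPad Lite). The lookahead `(?=...)` lets a lazy match stop right before the next line number without consuming it. The sample text and variable names are made up for the demo.

```python
import re

# Sample input modelled on the question (assumed, for illustration only).
text = "1- This is the 1st line with a \nnewline character\n2- This is the 2nd line"

# Match from a line number at the start of a line, lazily, up to (but not
# including) a newline that is followed by the next line number, or up to
# the end of the string. The lookahead is checked but never consumed.
entry = re.compile(r"^\d{1,3}-.*?(?=\r?\n\d{1,3}-|\Z)", re.MULTILINE | re.DOTALL)

# Wrap every matched entry in a <td> element.
result = entry.sub(lambda m: "<td>" + m.group(0) + "</td>", text)
print(result)
```

Because the lookahead is zero-width, the newline that separates two entries stays outside the `<td>` element.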

:%s#\v(^\d+-\_.{-})\ze(\n\d+-|%$)#<td>\1</td>

Annotated below:

%      - for all lines
s#     - substitute the following (you can use any delimiter, and slash is most
         common, but as that will require that we escape slashes in the command
         I chose to use the number sign)
\v     - very magic mode, lets us use fewer backslashes
(      - start group for back referencing
^      - start of line
\d+    - one or more digits (as many as possible)
-      - a literal dash!
\_.    - any character, including a newline
{-}    - zero or more of these (as few as possible)
)      - end group
\ze    - end match (anything beyond this point will not be included in the match)
(      - start a new group
\n     - newline (thanks Alan)
\d+    - one or more digits
-      - a dash
|      - or
%$     - end of file
)      - end group
#      - start substitute string
<td>\1</td> - a TD tag around the first matched group

score 2 · Accepted Answer

This is for vim, and uses a zero-width positive lookahead:

/^\d\{1,3\}-\_.*[\r\n]\(\d\{1,3\}-\)\@=

Step by step:

/^\d\{1,3\}-              1 to 3 digits followed by -
\_.*                      any number of characters including newlines/linefeeds
[\r\n]\(\d\{1,3\}-\)\@=   followed by a newline/linefeed ONLY if it is followed 
                          by 1 to 3 digits followed by - (the first condition)

Edit: this is how to do it in pcre/ruby:

/(\d{1,3}-.*?[\r\n])(?=(?:\d{1,3}-)|\Z)/m

Note that the string needs to end with a newline for the last entry to match.
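The trailing-newline caveat can be checked with a small Python sketch. It assumes `re.DOTALL` as the counterpart of the `/m` flag used above (so that `.` also matches newlines), and the sample text is made up for the demo.

```python
import re

# Same pattern as above, with DOTALL so that "." can cross line breaks.
pattern = re.compile(r"(\d{1,3}-.*?[\r\n])(?=(?:\d{1,3}-)|\Z)", re.DOTALL)

body = "1- This is the 1st line with a \nnewline character\n2- This is the 2nd line"

# Without a final newline the last entry has no [\r\n] to end on.
without_newline = pattern.findall(body)

# With a final newline both entries are found.
with_newline = pattern.findall(body + "\n")

print(len(without_newline), len(with_newline))
```

Appending a newline to the input before matching is one simple way to make sure the last entry is picked up.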

score 2 · Accepted Answer

SEARCH:   ^\d+-.*(?:[\r\n]++(?!\d+-).*)*

REPLACE:  <td>$0</td>

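[\r\n]++ matches one or more carriage returns or linefeeds, so you don't have to worry about whether the file uses Unix (\n), DOS (\r\n), or old Mac (\r) line separators.

(?!\d+-) asserts that the first thing after the line separator is not another line number.

I used the possessive + in [\r\n]++ to make sure it matches the whole separator. Otherwise, if the separator is \r\n, [\r\n]+ could match only the \r and (?!\d+-) could then succeed at the \n.

Tested in EditPad Pro, but it should work in Lite as well.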
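The effect of the possessive quantifier can be reproduced in a Python sketch. Two things here are assumptions of the demo rather than part of the answer: `[^\r\n]*` stands in for `.*` (Python's `.` would also match `\r`), and the possessive `[\r\n]++` is emulated with `(?=([\r\n]+))\1` so the sketch does not depend on a Python version that supports `++` directly.

```python
import re

# Hypothetical two-entry sample with DOS (\r\n) line endings.
text = "1- a\r\n2- b"

# Plain greedy [\r\n]+ : it can back off to just "\r", after which the
# negative lookahead sees "\n" (not a digit) and wrongly succeeds.
loose = re.compile(r"^\d+-[^\r\n]*(?:[\r\n]+(?!\d+-)[^\r\n]*)*", re.MULTILINE)

# Emulated possessive: the lookahead captures the whole separator and \1
# must consume exactly that, so the separator cannot be split.
strict = re.compile(
    r"^\d+-[^\r\n]*(?:(?=([\r\n]+))\1(?!\d+-)[^\r\n]*)*", re.MULTILINE
)

loose_matches = [m.group(0) for m in loose.finditer(text)]
strict_matches = [m.group(0) for m in strict.finditer(text)]

print(loose_matches)   # the first entry keeps a stray "\r"
print(strict_matches)  # the first entry ends cleanly before the separator
```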

score 1 · Accepted Answer

Would it be OK for you to do it in 3 steps?

(these are perl regexes):

Replace the first one:

$input =~ s/^(\d{1,3})/<td>$1/;

Replace the rest:

$input =~ s/\n(\d{1,3})/<\/td>\n<td>$1/gm;

Add the last one:

$input .= '</td>';
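For comparison, the same three steps could be sketched in Python roughly as follows. This is a loose translation under assumptions, not the answer's own code: the sample text and the variable names are made up for the demo.

```python
import re

# Sample input modelled on the question (assumed, for illustration only).
text = "1- This is the 1st line with a \nnewline character\n2- This is the 2nd line"

# Step 1: open the first cell in front of the leading line number.
out = re.sub(r"^(\d{1,3})", r"<td>\1", text, count=1)

# Step 2: at every newline followed by a line number, close the previous
# cell and open the next one.
out = re.sub(r"\n(\d{1,3})", r"</td>\n<td>\1", out)

# Step 3: close the last cell.
out += "</td>"

print(out)
```

Note that, like the Perl version, step 2 fires on any line that begins with a digit, so a continuation line starting with a number would be treated as a new entry.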

score 1 · Accepted Answer

(\d+-.+(\r|$)((?!^\d-).+(\r|$))?)

answered 2012-05-27T12:46:49.207

score 1 · Accepted Answer

You can match only the separators and split on them. For example, in C# you could do it like this:

string s = "1- This is the 1st line with a \r\nnewline character\r\n2- This is the 2nd line";
string ss = "<td>" + string.Join("</td>\r\n<td>", Regex.Split(s.Substring(3), "\r\n\\d{1,3}- ")) + "</td>";
MessageBox.Show(ss);
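A rough Python sketch of the same split-on-separator idea is shown below. The first variant mirrors the C# code and, like it, drops the line numbers. The second variant is an assumption added for the demo: it splits on a newline followed by a lookahead so the numbers are kept.

```python
import re

s = "1- This is the 1st line with a \r\nnewline character\r\n2- This is the 2nd line"

# Variant 1: skip the first "1- " and split on the remaining numbered
# separators, which removes the line numbers from the cells.
parts = re.split(r"\r\n\d{1,3}- ", s[3:])
without_numbers = "<td>" + "</td>\r\n<td>".join(parts) + "</td>"

# Variant 2: split only on the line break, using a lookahead to require
# that a line number follows, so the numbers stay in the cells.
numbered_parts = re.split(r"\r\n(?=\d{1,3}- )", s)
with_numbers = "<td>" + "</td>\r\n<td>".join(numbered_parts) + "</td>"

print(without_numbers)
print(with_numbers)
```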
