ruby - Regex problem in Ruby

Question

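I'm having trouble pulling data out of a text file using Ruby. I open and read the file and replace all newlines with '%' (because the newlines seemed to be causing problems), but when I call scan on the string it doesn't parse the way I want. I know this regex is uglier than it needs to be, but here is what it's doing: http://rubular.com/r/JNgleGA5bd

The file contains a numbered list, and the formatting is consistent, so I wanted the regex to grab each item in the list. In the snippet I've included, it should grab everything before "2.(tab)If 'Other' Boat Make".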

Here is a sample of the string:

"1. What make is your boat?%% [- Select One -]%%Var. 1: Code = A2_asdfw, Name = A2_WhatMakeIsYourBoat%%Type = Category%%Template = Standard Category%%Cat. 1: Code = 339 , Name = NONE%%Cat. 2: Code = 3, Name = asdfg%%2. If 'Other' Boat Make, please write it in here:% _ __ _ __ _ ___ %% Var. 1: Code = A154_asdf, Name = A36_asdfg%%Type = Literal%%Template = Standard Literal%%Max Length = 20 characters%%"

Here is my regex:

([0-9]+\.\t[\/0-9a-zA-Z\s,"()'-]+[%\t?:].*?)[0-9]+\.\t[\/0-9a-zA-Z\s,"()'-]+[%\t?:]

score 2 · Accepted Answer

Assuming each entry starts with the pattern "digit-period-tab", you can use the following regex:

[0-9][.]\t(?:(?![0-9][.]\t).)*

Working demo.

Here is some explanation:
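
As a rough illustration of how this pattern might be used from Ruby, the sketch below passes it to String#scan. The sample string is a shortened, hypothetical version of the data in the question, with the tab after each item number written as \t; the variable names are made up for the example.

```ruby
# Hypothetical sample: two list items, newlines already replaced by '%',
# and a real tab character after each "N." prefix.
text = "1.\tWhat make is your boat?%%Type = Category%%" \
       "2.\tIf 'Other' Boat Make, please write it in here:%%Type = Literal%%"

# The pattern from the answer: an item start, followed by any characters
# that do not themselves begin the next item.
item_re = /[0-9][.]\t(?:(?![0-9][.]\t).)*/

items = text.scan(item_re)
items.each { |item| puts item }
```

Because the pattern contains no capturing groups, scan returns an array of the full matches, one string per list item.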

[0-9]          # match a digit
[.]            # match a period - same as "\.", but more readable IMHO
\t             # match a tab
(?:            # open non-capturing group. this group will match/consume a single
               # character that is not the beginning of the next item
  (?!          # negative lookahead - this does not consume anything, but ensures
               # its contents canNOT be matched at the current position
    [0-9][.]\t # check that there is no new item starting
  )            # end of negative lookahead ... if we get here, the next character
               # still belongs to the current item; note that the engine's
               # "cursor" has not moved
  .            # consume an arbitrary character
)              # end of group
*              # repeat 0 or more times (as often as possible)
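
The breakdown above can be kept almost verbatim in Ruby source by writing the regex with the /x (extended) flag, which ignores whitespace in the pattern and treats # as the start of a comment. This is only a sketch: the variable names and the short test string are invented for illustration.

```ruby
# Same pattern as above, written in extended mode so the comments live
# inside the regex literal. Whitespace in the pattern is ignored under /x.
item_re = /
  [0-9] [.] \t          # item prefix: digit, period, tab
  (?:                   # one character of the item body...
    (?! [0-9] [.] \t )  # ...allowed only if no new item starts here
    .                   # consume that character
  )*                    # repeat as often as possible
/x

# Hypothetical two-item string with real tabs after the item numbers.
sample = "1.\tfirst%%2.\tsecond%%"
parts = sample.scan(item_re)
p parts
```

One thing to watch in extended mode: a literal space or # in the pattern has to be escaped or placed in a character class, otherwise it is silently dropped or read as a comment.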

More information on lookarounds.

If the item numbers go beyond a single digit (i.e. they have multiple digits), just add a + after both [0-9].
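
To make the difference concrete, here is a small sketch comparing the single-digit pattern with the multi-digit variant on a made-up string that crosses from item 9 to item 10. The data and variable names are hypothetical.

```ruby
# Hypothetical sample with a two-digit item number; tabs follow "9." and "10.".
text = "9.\tNinth item%%Code = 1%%10.\tTenth item%%Code = 2%%"

single = /[0-9][.]\t(?:(?![0-9][.]\t).)*/    # original pattern
multi  = /[0-9]+[.]\t(?:(?![0-9]+[.]\t).)*/  # "+" added after both [0-9]

single_items = text.scan(single)
multi_items  = text.scan(multi)

p single_items  # splits "10." between the digits, leaving a stray "1" on item 9
p multi_items   # keeps "10." together at the start of the second item
```

With the single-digit pattern, the lookahead only recognises "0.(tab)" as the start of a new item, so the leading "1" of "10." is consumed as part of the previous item.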
