regex - Multiline regex group match

Question

I'm trying to parse a template format using a regular expression.

Here is a sample:

Type of Change:                 Modify
Metavance:                      None
AutoSys :                       None
Informatica Migration:          None
FTP Details:                    None
Device/Server:                  DWEIHPRD
DB Objects:                     Delete
                                 ARC_MEDICAL_CLAIM_DETAIL_FK1
DB Name:                        DWEIHPRD
Schema-Table(s):            UTIL
Interface(s):                     IF0515
Reports (RAPS):              None
Ancillary Systems:            None

Basically, everything has the form

field : data (the data may span multiple lines, as in the DB Objects example above)

^(.+?):(.*)

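This comes pretty close to doing what I want, except that it only gets the first line of DB Objects. If I turn dotall on, it matches everything greedily and everything ends up in the result for the first field.

Ideally, extra whitespace in both the field and the data would be stripped, but it's no big deal if that doesn't happen as part of the regex.

To make matters worse, I have to do this in Access 97 VBScript, so some of the nicer modern regex features may not be available :(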

score 0 · Accepted Answer

Note: this is an ugly solution, but maybe it will help you. As @anubhava suggested, there may be a non-regex solution. I just don't know VBA well enough to say what it might be.
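For illustration only, here is a minimal sketch of the shape a non-regex approach could take. It is written in Python rather than VBA, so it shows the logic, not a drop-in solution. It assumes that a field line starts in the first column and that a continuation line starts with whitespace, which is how the sample in the question looks. The names `parse_template` and `TEMPLATE` are hypothetical.

```python
# Hypothetical sample: three fields taken from the question's template.
TEMPLATE = "\n".join([
    "AutoSys :                       None",
    "DB Objects:                     Delete",
    "                                 ARC_MEDICAL_CLAIM_DETAIL_FK1",
    "DB Name:                        DWEIHPRD",
])


def parse_template(text):
    """Return a list of (field, [data lines]) pairs, whitespace-trimmed.

    Assumption: an indented line continues the data of the previous field.
    """
    fields = []
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        if raw[0].isspace() and fields:
            # Indented line: append to the previous field's data.
            fields[-1][1].append(raw.strip())
        else:
            # Split at the first colon into field name and first data line.
            name, _, value = raw.partition(":")
            value = value.strip()
            fields.append((name.strip(), [value] if value else []))
    return fields


for name, values in parse_template(TEMPLATE):
    print(name, "->", values)
```

Unlike a single capture group, the list of data lines can hold any number of continuation lines, and the trimming of field and data happens as part of the loop.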

According to this article, VBScript for Microsoft Office supports lookaheads, lookbehinds, and non-capturing groups (the article is dated 2009), but I would be quite surprised if that support goes back as far as Access 97, though I could be wrong.

Normally I would use lookaheads and non-capturing groups for this, but I avoided them because they are unlikely to be supported in Office 97. So you will just have to ignore capture group 3 (it is only there to test for the optional end-of-line characters on multi-line matches). Note that this only handles data that spans at most two lines.

^(.+):\s+(.+)(\r\n\s+(.+))*
Note that this has four capture groups, but you will ignore \3. Use \1, \2, and \4 (\4 will be empty for single-line matches).

Explained:

^         # beginning of line
(.+):     # capture one or more characters up to a colon
\s+(.+)   # skip past whitespace, then capture characters up to end of line
(         # open a capturing group (to be thrown away; see explanation above)
  \r\n\s+ # match EOL characters followed by whitespace (a continuation line)
  (.+)    # if we got this far, capture whatever characters come after the whitespace
)*        # repeat zero or more times, which makes this group optional (you will ignore it anyway)
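As a rough check of how the groups behave, the pattern can be tried out in Python's `re` module. This is only a stand-in for VBScript's RegExp object, and the two engines differ in places. One adaptation is made and labeled in the code: `\r?\n` replaces `\r\n`, because the sample string here uses `\n` line endings. `re.MULTILINE` is assumed to play the role of a multiline setting so that `^` matches at the start of every line. The names `parse_fields` and `SAMPLE` are hypothetical.

```python
import re

# Adapted from the pattern above: "\r?\n" instead of "\r\n" (assumption:
# the sample text below uses "\n" line endings). re.MULTILINE makes "^"
# match at the start of each line, not only at the start of the text.
PATTERN = re.compile(r"^(.+):\s+(.+)(\r?\n\s+(.+))*", re.MULTILINE)

# A few lines from the question's sample.
SAMPLE = "\n".join([
    "Device/Server:                  DWEIHPRD",
    "DB Objects:                     Delete",
    "                                 ARC_MEDICAL_CLAIM_DETAIL_FK1",
    "DB Name:                        DWEIHPRD",
    "Schema-Table(s):            UTIL",
])


def parse_fields(text):
    """Return (field, first_line, continuation) tuples, whitespace-trimmed."""
    rows = []
    for match in PATTERN.finditer(text):
        field = match.group(1).strip()
        first_line = match.group(2).strip()
        # Group 3 is ignored; group 4 is None for single-line matches.
        continuation = (match.group(4) or "").strip()
        rows.append((field, first_line, continuation))
    return rows


for field, first_line, continuation in parse_fields(SAMPLE):
    print(field, "|", first_line, "|", continuation)
```

For the DB Objects entry this yields `Delete` in group 2 and `ARC_MEDICAL_CLAIM_DETAIL_FK1` in group 4; for the single-line fields group 4 is empty. The trimming is done outside the regex with `strip()`.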
