php - Split a string into an array with a regular expression to get key-value pairs

Question

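I'm parsing some text, but I can't get the pieces when a space is missing (a missing space is allowed).
Edit: added colons to the free text.
Edit: well, this is a free-form text format in which key-value pairs can be written. If element [0] is discarded, the remaining elements of the array form a sequence of keys and values. Values can also span multiple lines.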
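A minimal sketch of the pairing step described in the edit above. It assumes the split has already produced the desired array shown further down, typed in here by hand. The helper name pairsFromSplit is hypothetical. It drops element [0] and reads the rest two at a time; a trailing key without a value gets an empty string, and a repeated key overwrites the earlier one.

```php
<?php
// Hypothetical helper: turn a flat split result into key => value pairs.
function pairsFromSplit(array $parts): array
{
    $rest = array_slice($parts, 1);          // discard element [0]
    $pairs = [];
    for ($i = 0; $i < count($rest); $i += 2) {
        // A key with no following value maps to ''; a repeated key
        // overwrites the earlier value.
        $pairs[$rest[$i]] = $rest[$i + 1] ?? '';
    }
    return $pairs;
}

// The desired array from the question, typed in by hand.
$parts = [
    '',
    'part1', ' only one \s removed:OK',
    'part2', ":text :with\nnew lines\non it",
    'noSpaceAfterThis', '',
    'thisShoudBeAStandAlongText', 'but: here there are more text',
    'part4', ':even more text',
];

print_r(pairsFromSplit($parts));
```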

This is the test-case text:

:part1  only one \s removed:OK
:part2 :text :with
new lines
on it
:noSpaceAfterThis
:thisShoudBeAStandAlongText but: here there are more text
:part4 :even more text

This is what I want:

Array
(
    [0] => 
    [1] => part1
    [2] =>  only one \s removed:OK
    [3] => part2
    [4] => :text :with
new lines
on it
    [5] => noSpaceAfterThis
    [6] => 
    [7] => thisShoudBeAStandAlongText
    [8] => but: here there are more text
    [9] => part4
    [10] => :even more text
)

This is what I get:

Array
(
    [0] => 
    [1] => part1
    [2] =>  only one \s removed:OK
    [3] => part2
    [4] => :text :with
new lines
on it
    [5] => noSpaceAfterThis
    [6] => :thisShoudBeAStandAlongText but: here there are more text
    [7] => part4
    [8] => :even more text
)

And this is my test code:

<?php
$text = '
:part1  only one \s removed:OK
:part2 :text :with
new lines
on it
:noSpaceAfterThis
:thisShoudBeAStandAlongText but: here there are more text
:part4 :even more text';

echo '<pre>';
// my effort so far:
$ret = preg_split('|\r?\n:([\w\d]+)(?:\r?\s)?|i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($ret);

// nor this one:
$ret = preg_split('|\r?\n:([\w\d]+)\r?\s?|i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($ret);

// for debugging, an extra capturing group
$ret = preg_split('|\r?\n:([\w\d]+)(\r?\s)?|i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
var_dump($ret);
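A likely explanation, added here as an annotation rather than taken from the question: \s also matches a newline. After :noSpaceAfterThis, which is directly followed by a line break, the optional (?:\r?\s)? consumes that \n, so the next split point \r?\n: can no longer match and :thisShoudBeAStandAlongText stays glued to its value. The sketch below reproduces this with the first attempt and, as one possible fix, allows only a literal space after the key. It assumes LF line endings in $text.

```php
<?php
// The question's test text, built with explicit LF line endings (assumption).
$text = implode("\n", [
    '',
    ':part1  only one \s removed:OK',
    ':part2 :text :with',
    'new lines',
    'on it',
    ':noSpaceAfterThis',
    ':thisShoudBeAStandAlongText but: here there are more text',
    ':part4 :even more text',
]);

// First attempt from the question: (?:\r?\s)? may swallow a newline.
$before = preg_split('|\r?\n:([\w\d]+)(?:\r?\s)?|i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

// Possible fix: after the key, remove at most one literal space.
$after = preg_split('|\r?\n:([\w\d]+)(?: )?|i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($before); // 9 elements; [6] still holds the key together with its value
print_r($after);  // 11 elements, the same as the desired output
```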

score 3 · Accepted Answer

Another approach, using preg_match_all:

$pattern = '~(?<=^:|\n:)\S++|(?<=\s)(?:[^:]+?|(?<!\n):)+?(?= *+(?>\n:|$))~';
preg_match_all($pattern, $text, $matches);
echo '<pre>' . print_r($matches[0], true);

Pattern details:

# match the first word of a line, when the line starts with a colon #
(?<=^:|\n:)       # lookbehind: preceded by the beginning of the string
                  # and a colon, or by a newline and a colon
\S++              # one or more non-whitespace characters (possessive)

# match all the content up to the next line that starts with a colon #
(?<=\s)           # lookbehind: preceded by a whitespace character
(?:               # open a non-capturing group
   [^:]+?         # one or more characters that are not a colon (lazy)
  |               # OR
   (?<!\n):       # negative lookbehind: a colon not preceded by a newline
)+?               # close the non-capturing group,
                  # repeat one or more times (lazy)
(?= *+(?>\n:|$))  # lookahead: followed by zero or more spaces, then either a
                  # newline with a colon at the first position, or the end of the string

The advantage here is that you avoid empty results.
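As a sketch (not part of the original answer), the same pattern can be written with the x (extended) modifier so the comments from the pattern details live inside the regex. In x mode unescaped whitespace is ignored, so the space in the lookahead becomes [ ]; the comments also must not contain the ~ delimiter or a single quote, since the pattern sits in a single-quoted PHP string. The test text is assumed to use LF line endings. Note that the result is a flat list in which a key without a value (noSpaceAfterThis) has no value item, so keys and values cannot be paired simply by position.

```php
<?php
// Test text from the question (assumption: LF line endings).
$text = implode("\n", [
    '',
    ':part1  only one \s removed:OK',
    ':part2 :text :with',
    'new lines',
    'on it',
    ':noSpaceAfterThis',
    ':thisShoudBeAStandAlongText but: here there are more text',
    ':part4 :even more text',
]);

// The preg_match_all pattern above, rewritten in extended (x) mode.
$pattern = '~
    (?<=^:|\n:) \S++            # key: first word of a line that starts with a colon
  |
    (?<=\s)                     # value: starts after a whitespace character
    (?: [^:]+? | (?<!\n): )+?   # non-colon characters, or a colon not starting a line
    (?= [ ]*+ (?>\n:|$) )       # stop before a new colon line or the end
~x';

preg_match_all($pattern, $text, $matches);
print_r($matches[0]);
```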

Or, using preg_split:

$res = preg_split('~(?:\s*\n|^):(\S++)(?: )?~', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

Explanation:

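The goal is to split the text at two kinds of positions:

at the first character of a line, when that character is a colon;
at the first space of a line, when the line starts with a colon.

So the two split points surround a :word at the beginning of a line. The colon and the space after the word must be removed, but the word itself must be kept. That is why PREG_SPLIT_DELIM_CAPTURE is used: it returns the word as its own array element.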
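A minimal illustration of what PREG_SPLIT_DELIM_CAPTURE changes, on a made-up two-key string (the sample text and pattern here are hypothetical, not from the answer): the parenthesized group inside the delimiter is returned as an extra array element, while the rest of the delimiter is still dropped.

```php
<?php
$s = "intro\n:name Alice\n:city Paris";
$pattern = '~\n:(\S+) ~';   // the captured group is the word after the colon

$plain    = preg_split($pattern, $s);
$withKeys = preg_split($pattern, $s, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($plain);      // intro, Alice, Paris
print_r($withKeys);   // intro, name, Alice, city, Paris
```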

Pattern details:

(?:           # non-capturing group (everything it matches is removed)
   \s*\n      # trailing whitespace of the preceding line, plus the newline
  |           # OR
   ^          # the beginning of the string
)             # end of the non-capturing group
:             # remove the first character when it is a colon
(\S++)        # keep the first word (returned thanks to PREG_SPLIT_DELIM_CAPTURE)
(?: )?        # remove the first space, if present
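The same split pattern, sketched in x (extended) mode with the comments above inlined. The optional literal space becomes [ ]? because bare spaces are ignored in this mode (written as (?: )? it would turn into an empty group). On the question's test text, assumed to use LF line endings, it yields exactly the desired array, including the empty elements [0] and [6]. Because \s* also consumes \r, the split points work with CRLF input too, although line breaks inside multi-line values are kept as they are.

```php
<?php
// Test text from the question (assumption: LF line endings).
$text = implode("\n", [
    '',
    ':part1  only one \s removed:OK',
    ':part2 :text :with',
    'new lines',
    'on it',
    ':noSpaceAfterThis',
    ':thisShoudBeAStandAlongText but: here there are more text',
    ':part4 :even more text',
]);

$pattern = '~
    (?: \s*\n | ^ )   # trailing whitespace of the previous line plus the newline, or the start
    :                 # the leading colon, removed
    (\S++)            # the key, kept as its own element by PREG_SPLIT_DELIM_CAPTURE
    [ ]?              # at most one space after the key, removed
~x';

$res = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($res);
```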
