regex - Parsing attributes with regular expressions in Perl

Question

Here's a problem I ran into recently. I have attribute strings of the form

"x=1 and y=abc and z=c4g and ..."

Some attributes have numeric values, some have alphabetic values, some have mixed values, some have dates, and so on.

Every string is *supposed* to start with "x=someval and y=anotherval", but some don't. There are three things I need to do:

1. Validate the string to make sure it has x and y.
2. Actually parse out the values of x and y.
3. Get the rest of the string.

Given the example at the top, this would produce the following variables:

$x = 1;
$y = "abc";
$remainder = "z=c4g and ...";

My question is: is there a (reasonably) simple way to parse and validate these with a single regular expression? That is:

if ($str =~ /someexpression/)
{
    $x = $1;
    $y = $2;
    $remainder = $3;
}

Note that the string may consist of only the x and y attributes; that is still a valid string.

I'll post my own solution as an answer, but it doesn't satisfy my preference for a single regex.

score 3 · Accepted Answer

Assuming you also want to do something with the other name=value pairs, this is how I would do it (using Perl 5.10):

use 5.10.0;
use strict;
use warnings;

my $string = "x=1 and y=abc and z=c4g";   # example input from the question
my %hash;
while(
    $string =~ m{
       (?: ^ | \G )    # start of string or previous match
       \s*

       (?<key>   \w+ ) # word characters
       =
       (?<value> \S+ ) # non spaces

       \s*             # get to the start of the next match
       (?: and )?
    }xgi
){
    $hash{$+{key}} = $+{value};
}

# to make sure that x & y exist
die unless exists $hash{x} and exists $hash{y};

For older Perls (at least Perl 5.6):

use strict;
use warnings;

my $string = "x=1 and y=abc and z=c4g";   # example input from the question
my %hash;
while(
    $string =~ m{
       (?: ^ | \G )   # start of string or previous match
       \s*

       ( \w+ ) = ( \S+ )

       \s*            # get to the start of the next match
       (?: and )?
    }xgi
){
    $hash{$1} = $2;
}

# to make sure that x & y exist
die unless exists $hash{x} and exists $hash{y};

These have the added benefit that they keep working even if you later need to process more of the data.
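The loops above collect every pair into %hash and check for x and y, but they do not hand back the $remainder string the question asks for. A minimal sketch, assuming it is acceptable to rebuild the remainder from the non-x/y pairs in their original order, rejoined with " and " (the sub name parse_attrs is hypothetical). Because the loop stops at the first token it cannot parse, anything after such a token is silently dropped, so this is not a strict validator.

```perl
use 5.10.0;
use strict;
use warnings;

# Hypothetical helper built on the 5.10 loop: returns ($x, $y, $remainder),
# or an empty list when x or y is missing.
sub parse_attrs {
    my ($string) = @_;
    my (%hash, @order);

    while (
        $string =~ m{
           (?: ^ | \G )    # start of string or previous match
           \s*
           (?<key>   \w+ ) # word characters
           =
           (?<value> \S+ ) # non spaces
           \s*             # get to the start of the next match
           (?: and )?
        }xgi
    ) {
        # Remember first-seen order so the remainder can be rebuilt in sequence.
        push @order, $+{key} unless exists $hash{ $+{key} };
        $hash{ $+{key} } = $+{value};
    }

    return unless exists $hash{x} and exists $hash{y};

    # Every pair except x and y, rejoined with " and ".
    my $remainder = join ' and ',
        map  { "$_=$hash{$_}" }
        grep { $_ ne 'x' and $_ ne 'y' } @order;

    return ($hash{x}, $hash{y}, $remainder);
}

my ($x, $y, $remainder) = parse_attrs("x=1 and y=abc and z=c4g and w=v4l");
say "x: $x; y: $y; remainder: $remainder";
```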

score 1 · Accepted Answer

I'm not great with regular expressions, but this seems fairly close to what you're looking for:

/x=(.+) and y=([^ ]+)( and (.*))?/

except that you'd use $1, $2, and $4. In use:

my @strs = ("x=1 and y=abc and z=c4g and w=v4l",
            "x=yes and y=no",
            "z=nox and w=noy");

foreach (@strs) {
    if ($_ =~ /x=(.+) and y=([^ ]+)( and (.*))?/) {
        $x = $1;
        $y = $2;
        $remainder = $4;
        print "x: $x; y: $y; remainder: $remainder\n";
    } else {
        print "Failed.\n";
    }
}

Output:

x: 1; y: abc; remainder: z=c4g and w=v4l
x: yes; y: no; remainder: 
Failed.

Of course, this leaves out a lot of error checking, and I don't know everything about your input, but it seems to work.

score 1 · Accepted Answer

As a fairly simple modification to Rudd's version,

/^x=(.+) and y=([^ ]+)(?: and (.*))?/

lets you use $1, $2, and $3 (the ?: makes the group non-capturing), and ensures that the string starts with "x=" rather than allowing "not_x=" to match.
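To see why Rudd's pattern needs $4 while this one needs $3, you can compare the capture lists the two patterns return for the same string. A small sketch (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $str = "x=1 and y=abc and z=c4g and w=v4l";

# Rudd's pattern: the outer ( and (.*) ) group captures, so it becomes $3
# and the actual remainder lands in $4.
my @rudd = $str =~ /x=(.+) and y=([^ ]+)( and (.*))?/;

# Cebjyre's pattern: (?: ... ) groups without capturing, so the remainder is $3.
my @cebjyre = $str =~ /^x=(.+) and y=([^ ]+)(?: and (.*))?/;

print scalar(@rudd),    " captures from Rudd's pattern\n";
print scalar(@cebjyre), " captures from Cebjyre's pattern\n";
print "remainder: {$cebjyre[2]}\n";
```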

If you know more about what the x and y values will look like, you should use that to tighten the regex further:

my @strs = ("x=1 and y=abc and z=c4g and w=v4l",
        "x=yes and y=no",
        "z=nox and w=noy",
        "not-x=nox and y=present",
        "x=yes and w='there is no and y=something arg here'");

foreach (@strs) {
    if ($_ =~ /^x=(.+) and y=([^ ]+)(?: and (.*))?/) {
        $x = $1;
        $y = $2;
        $remainder = $3;
        print "x: {$x}; y: {$y}; remainder: {$remainder}\n";
    } else {
        print "$_ Failed.\n";
    }
}

Output:

x: {1}; y: {abc}; remainder: {z=c4g and w=v4l}
x: {yes}; y: {no}; remainder: {}
z=nox and w=noy Failed.
not-x=nox and y=present Failed.
x: {yes and w='there is no}; y: {something}; remainder: {}

Note that the missing parts of the last test are due to the current version of the y test not allowing spaces; if the x test had the same restriction, that string would have failed.
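The claim about the last test string can be checked directly. This sketch runs Cebjyre's pattern and a variant that restricts x to non-space characters, the same way y already is; the restricted variant should reject the string instead of returning a misleading x:

```perl
use strict;
use warnings;

my $tricky = "x=yes and w='there is no and y=something arg here'";

# Cebjyre's pattern: x=(.+) can swallow " and w='there is no" to reach "y=".
my $loose  = qr/^x=(.+) and y=([^ ]+)(?: and (.*))?/;

# Same pattern with x restricted to non-space characters, like y.
my $strict = qr/^x=([^ ]+) and y=([^ ]+)(?: and (.*))?/;

for my $re ($loose, $strict) {
    if (my ($x, $y) = $tricky =~ $re) {
        print "matched: x={$x}; y={$y}\n";
    } else {
        print "no match\n";
    }
}
```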

score 1 · Accepted Answer

Rudd and Cebjyre have gotten you most of the way there, but both versions have specific problems.

Rudd suggested:

/x=(.+) and y=([^ ]+)( and (.*))?/

Cebjyre modified it to:

/^x=(.+) and y=([^ ]+)(?: and (.*))?/

The second version is better because it won't confuse "not_x=foo" with "x=foo", but it will accept things like "x=foo z=bar y=baz" and set $1 = "foo z=bar", which is undesirable.

This is probably what you're looking for:

/^x=(\w+) and y=(\w+)(?: and (.*))?/

This disallows anything between the x= and y= options, and allows an optional " and ..." that will end up in $3.
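A sketch that runs this pattern over the test strings from the earlier answers, plus "x=foo z=bar y=baz", using the question's desired if (...) shape via list assignment. The helper name parse_xy is hypothetical, and the date-valued string is an illustrative addition: since \w matches only letters, digits, and underscore, a value like 2008-09-01 is rejected, so date attributes would need a wider character class.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper wrapping the pattern from this answer.
# Returns ($x, $y, $remainder) on success, or an empty list on failure.
sub parse_xy {
    my ($str) = @_;
    my ($x, $y, $remainder) = $str =~ /^x=(\w+) and y=(\w+)(?: and (.*))?/
        or return;
    return ($x, $y, $remainder // '');   # the optional group leaves $3 undef
}

my @strs = (
    "x=1 and y=abc and z=c4g and w=v4l",
    "x=yes and y=no",
    "z=nox and w=noy",
    "not-x=nox and y=present",
    "x=foo z=bar y=baz",
    "x=2008-09-01 and y=abc",    # illustrative date value; \w does not include '-'
);

for my $str (@strs) {
    if (my ($x, $y, $remainder) = parse_xy($str)) {
        print "x: {$x}; y: {$y}; remainder: {$remainder}\n";
    } else {
        print "$str Failed.\n";
    }
}
```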

score 0 · Accepted Answer

Here's basically what I did to solve this:

($x_str, $y_str, $remainder) = split(/ and /, $str, 3);

if ($x_str !~ /x=(.*)/)
{
    # error
}

$x = $1;

if ($y_str !~ /y=(.*)/)
{
    # error
}

$y = $1;

I've left out additional validation and error handling. This technique works, but it isn't as concise or clean as I'd like. I'm hoping someone can suggest something better.
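One way to make the split approach a bit tidier while keeping it: anchor each piece so that "not_x=1" is rejected and the whole value is captured, and signal errors by returning an empty list. A minimal sketch; the name split_attrs and the empty-list error convention are assumptions, not part of the original:

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper: same split idea, but each piece must be exactly
# "x=..." / "y=...", so "not_x=..." is rejected. An empty list means "error".
sub split_attrs {
    my ($str) = @_;
    my ($x_str, $y_str, $remainder) = split / and /, $str, 3;

    my ($x) = ($x_str // '') =~ /^x=(.*)$/ or return;
    my ($y) = ($y_str // '') =~ /^y=(.*)$/ or return;

    return ($x, $y, $remainder // '');
}

for my $str ("x=1 and y=abc and z=c4g and ...", "x=yes and y=no", "not_x=1 and y=2") {
    if (my ($x, $y, $remainder) = split_attrs($str)) {
        print "x: $x; y: $y; remainder: $remainder\n";
    } else {
        print "$str Failed.\n";
    }
}
```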
