regex - Can't find a suitable regular expression

Question

I have the following file (it follows this scheme, but is much longer):

LSE           ZTX                       
    SWX         ZURN                    
LSE           ZYT
NYSE                            CGI

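Each line contains two words (e.g. LSE ZTX), with optional spaces and/or tabs at the start and end of the line, and always some between the words. Could someone help me match these two words with a regex? Following the example, I want LSE in $1 and ZTX in $2 for the first line, and SWX in $1 and ZURN in $2 for the second line.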

$line =~ /(\t|\s)*?(.*?)(\t|\s)*?(.*?)/msgi;
$line =~ /[\t*\s*]?(.*?)[\t*\s*]?(.*?)/msgi;

I don't know how to say that either spaces or tabs may be present (or a mix of both, e.g. \t\s\t).

score 3 · Accepted Answer

There are always two words, and you don't need to match the whole line, so the simplest regex would be:

/(\w+)\s+(\w+)/
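As a minimal sketch of how this pattern could be applied, the snippet below runs it over a few sample lines. The helper name first_two_words and the sample strings are assumptions made for illustration; they are not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first two words of a line,
# or an empty list when the line does not contain two words.
sub first_two_words {
    my ($line) = @_;
    # \s matches spaces and tabs alike, so \s+ covers any mix of the two.
    # The pattern is unanchored, so leading whitespace is simply skipped.
    return $line =~ /(\w+)\s+(\w+)/;
}

for my $line ("LSE           ZTX   ", "    SWX\t \tZURN  ", "NYSE\t\t\tCGI") {
    my ($first, $second) = first_two_words($line);
    print "($first, $second)\n";
}
```

In list context a successful match returns the captured groups, which is why the two words can be assigned directly without going through $1 and $2.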

score 1 · Accepted Answer

\s also matches tabs, so the regex would be:

$line =~ /^\s*([A-Z]+)\s+([A-Z]+)/;

The first word is in the first group ($1), and the second word is in $2.

You can change [A-Z] to something more suitable if needed.

Here is the explanation from YAPE::Regex::Explain:

The regular expression:

(?-imsx:^\s*([A-Z]+)\s+([A-Z]+))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [A-Z]+                   any character of: 'A' to 'Z' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    [A-Z]+                   any character of: 'A' to 'Z' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

score 1 · Accepted Answer

I think this is what you want:

^\s*([A-Z]+)\s+([A-Z]+)

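See it here on Regexr. The first code on the line is in group 1 and the second is in group 2. \s is a whitespace character class; it includes spaces, tabs, newline characters, and so on.

In Perl it would look like this: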

($code1, $code2) = $line =~ /^\s*([A-Z]+)\s+([A-Z]+)/i;

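I assume you are reading the text file line by line, so you don't need the s and m modifiers, nor g.

If your codes are not only ASCII letters, replace [A-Z] with \p{L}. \p{L} is a Unicode property that matches any letter in any language.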
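A rough sketch of the \p{L} variant follows. The helper name letter_codes and the non-ASCII sample codes are invented for illustration and do not come from the question's data.

```perl
use strict;
use warnings;
use utf8;                              # this source contains non-ASCII string literals
binmode STDOUT, ':encoding(UTF-8)';    # print the decoded characters as UTF-8

# Hypothetical helper: same shape as the [A-Z] pattern, but any Unicode letter counts.
sub letter_codes {
    my ($line) = @_;
    return $line =~ /^\s*(\p{L}+)\s+(\p{L}+)/;
}

# Made-up sample line with letters outside A-Z.
my ($code1, $code2) = letter_codes("  BÖRSE\t ZÜRN  ");
print "($code1, $code2)\n";
```

If the lines come from a file rather than from a literal, the file handle would also need an encoding layer such as :encoding(UTF-8) so that the input is decoded into characters before matching.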

score 1 · Accepted Answer

With the "Multiline" option, the regex looks like this:

^\s*(?<word1>\S+)\s+(?<word2>\S+)\s*$

You will get N matches, each containing two groups named word1 and word2.
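In Perl, the "Multiline" option corresponds to the /m modifier, and named groups are read from the %+ hash (Perl 5.10 or later). A small sketch, assuming the whole file has been read into one string; the variable names and sample text are illustrative only:

```perl
use strict;
use warnings;

# Assumed sample input: several lines held in a single string.
my $text = "LSE           ZTX   \n    SWX\tZURN  \nNYSE\t\t\tCGI\n";

my @pairs;
# /m lets ^ and $ match at every line boundary; /g steps through all matches.
while ($text =~ /^\s*(?<word1>\S+)\s+(?<word2>\S+)\s*$/mg) {
    push @pairs, [ $+{word1}, $+{word2} ];
}

print "($_->[0], $_->[1])\n" for @pairs;
```

One caveat with this approach: \s also matches newlines, so a line holding a single word could be paired with the first word of the following line. Using \h (horizontal whitespace) between the words is one way to avoid that.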

score 1 · Accepted Answer

^\s*([A-Z]{3,4})\s+([A-Z]{3,4})\s*$

What this does:

^             // Matches the beginning of a string
\s*           // Matches a space/tab character zero or more times
([A-Z]{3,4})  // Matches 3 or 4 letters A-Z and captures them to $1
\s+           // Then matches at least one tab or space
([A-Z]{3,4})  // Matches 3 or 4 letters A-Z and captures them to $2
\s*           // Allows optional trailing spaces/tabs, as in the sample lines
$             // Matches the end of a string

score 0 · Accepted Answer

You can use split here:

use strict;
use warnings;

while (<DATA>) {
    my ( $word1, $word2 ) = split;
    print "($word1, $word2)\n";
}

__DATA__
LSE         ZTX                       
    SWX         ZURN                    
LSE         ZYT
NYSE                            CGI

Output:

(LSE, ZTX)
(SWX, ZURN)
(LSE, ZYT)
(NYSE, CGI)

score -1 · Accepted Answer

Assuming the leading whitespace is what you use to identify the codes you want, try this.

Split the string on newlines, then try the following regex:

^\s+(\w+\s+){2}$

This only matches lines that start with whitespace, followed by (word, whitespace, word), and end with whitespace.

# ^           --> String start
# \s+         --> One or more whitespace characters
# (\w+\s+){2} --> A (word followed by some space)x2
# $           --> String end.
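A short sketch of how this first pattern behaves may help here. The sample strings and variable names are assumptions for illustration. Note that a repeated group such as (\w+\s+){2} keeps only its final iteration, so the pattern filters lines rather than extracting both words.

```perl
use strict;
use warnings;

my $re = qr/^\s+(\w+\s+){2}$/;

# Indented line with trailing spaces: matches, but the group holds only
# the last repetition (the second word plus its trailing whitespace).
my ($last_word) = "    SWX         ZURN    " =~ $re;
print "captured: '$last_word'\n";

# A line without leading whitespace does not match at all.
my $flush_left = "LSE   ZTX" =~ $re ? 1 : 0;
print "flush-left line matched: $flush_left\n";
```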

However, if you only want to capture the codes, try this:

$line =~ /^\s*(\w+)\s+(\w+)/;

# \s*   --> Zero or more whitespace,
# (\w+) --> Followed by a word (group #1),
# \s+   --> Followed by some whitespace,
# (\w+) --> Followed by a word (group #2),

score -2 · Accepted Answer

This matches all the codes:

/[A-Z]+/
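On its own this pattern finds only the first run of capital letters in a line. One possible way to collect every code, sketched here with an assumed sample line, is to add the /g modifier and match in list context:

```perl
use strict;
use warnings;

# Assumed sample line with mixed spaces and tabs.
my $line = "    SWX\t \tZURN  ";

# With /g in list context, the match returns every non-overlapping hit.
my @codes = $line =~ /[A-Z]+/g;
my ($first, $second) = @codes;

print "($first, $second)\n";
```

This does not check that a line holds exactly two codes; the length of @codes can be tested if that matters.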

answered 2013-01-09T08:13:38.780
