regex - Regular expression: find a string that does not contain a substring

Question

I have a large piece of text:

"Big piece of text. This sentence includes 'regexp' word. And this
sentence doesn't include that word"

I need to find a substring that starts with "this" and ends with "word" but does not contain the word "regexp".

In this case, the string "this sentence doesn't include that word" is exactly what I want to get.

How can I do this with a regular expression?

score 45 · Accepted Answer

With the case-insensitive option enabled, the following should work:

\bthis\b(?:(?!\bregexp\b).)*?\bword\b

Example: http://www.rubular.com/r/g6tYcOy8IT

Explanation:

\bthis\b           # match the word 'this', \b is for word boundaries
(?:                # start group, repeated zero or more times, as few as possible
   (?!\bregexp\b)    # fail if 'regexp' can be matched (negative lookahead)
   .                 # match any single character
)*?                # end group
\bword\b           # match 'word'

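The \b on each side of the words ensures that you are not matching on substrings, such as the "this" in "thistle" or the "word" in "wordy".

This works by checking, at each character between the start word and the end word, that the excluded word does not begin there.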
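
As a rough illustration of the pattern above, here is a minimal Python sketch. Python's `re` module, the helper name `find_span`, and the single-line sample string are assumptions made for this example; the answer itself only links to a Rubular demo.

```python
import re

# Pattern taken from the answer; re.IGNORECASE stands in for the
# case-insensitive option it mentions.
PATTERN = re.compile(r"\bthis\b(?:(?!\bregexp\b).)*?\bword\b", re.IGNORECASE)


def find_span(text):
    """Return the first 'this ... word' span that avoids 'regexp', or None."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


# The question's text, written on one line for this sketch.
text = ("Big piece of text. This sentence includes 'regexp' word. "
        "And this sentence doesn't include that word")

print(find_span(text))                        # this sentence doesn't include that word
print(find_span("thistle is a wordy plant"))  # None
```

In Python, `.` does not match a newline by default, so if the wanted span can cross a line break, `re.DOTALL` would probably need to be added to the flags.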

score 10 · Accepted Answer

Use a lookahead assertion.

If you want to check that a string does not contain some other substring, you can write:

/^(?!.*substring)/

You also need to check the beginning and end of the line for "this" and "word":

/^this(?!.*substring).*word$/
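
The anchored form can be tried out with a small Python sketch. Here "regexp" is substituted for the placeholder `substring`, and the helper name `is_clean` is hypothetical; this is an illustration under those assumptions, not part of the original answer.

```python
import re

# Anchored form from the answer, with "regexp" as the excluded substring.
ANCHORED = re.compile(r"^this(?!.*regexp).*word$")


def is_clean(sentence):
    """True if sentence starts with 'this', ends with 'word', and lacks 'regexp'."""
    return ANCHORED.search(sentence) is not None


print(is_clean("this sentence doesn't include that word"))  # True
print(is_clean("this sentence includes 'regexp' word"))     # False
```

The negative lookahead sits right after "this", so it scans the rest of the line once and rejects the match if "regexp" appears anywhere in it.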

Another problem here is that you need to search sentences rather than whole strings (if I understand your task correctly).

So the solution would look like this:

perl -e '
  local $/;
  $_=<>;
  while($_ =~ /(.*?[.])/g) { 
    $s=$1;
    print $s if $s =~ /^this(?!.*substring).*word[.]$/
  };'

Usage example:

$ cat 1.pl
local $/;
$_=<>;
while($_ =~ /(.*?[.])/g) {
    $s=$1;
    print $s if $s =~ /^\s*this(?!.*regexp).*word[.]/i;
};

$ cat 1.txt
This sentence has the "regexp" word. This sentence doesn't have the word. This sentence does have the "regexp" word again.

$ cat 1.txt | perl 1.pl 
 This sentence doesn't have the word.
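
For readers who do not use Perl, the same idea can be sketched in Python: split the text into sentences, then keep the ones that pass the lookahead check. The function name `sentences_without_regexp` is hypothetical, and the sketch assumes sentences end with a period, as the Perl script does.

```python
import re

# Lazily match up to and including the next period, like /(.*?[.])/g in Perl.
SENTENCE = re.compile(r".*?[.]")
# Same test as in 1.pl: optional leading whitespace, "this", no "regexp"
# later in the sentence, then "word" followed by a period.
WANTED = re.compile(r"^\s*this(?!.*regexp).*word[.]", re.IGNORECASE)


def sentences_without_regexp(text):
    """Return the sentences that match WANTED, in order of appearance."""
    return [s for s in SENTENCE.findall(text) if WANTED.search(s)]


sample = ('This sentence has the "regexp" word. '
          "This sentence doesn't have the word. "
          'This sentence does have the "regexp" word again.')

for sentence in sentences_without_regexp(sample):
    print(sentence)
```

With the contents of 1.txt as input, this should print the same single sentence as the Perl script, including its leading space.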
