5

I'm looking for a SimpleGrepSedPerlOrPythonOneLiner that prints every quotation in a text.


Example 1:

echo “HAL,” noted Frank, “said that everything was going extremely well.” | SimpleGrepSedPerlOrPythonOneLiner

stdout:

"HAL,"
"said that everything was going extremely well."

Example 2:

cat MicrosoftWindowsXPEula.txt | SimpleGrepSedPerlOrPythonOneLiner

stdout:

"EULA"
"Software"
"Workstation Computer"
"Device"
"DRM"

(Link to the corresponding text.)

4 Answers

7

I like this one:

perl -ne 'print "$_\n" foreach /"((?>[^"\\]|\\+[^"]|\\(?:\\\\)*")*)"/g;'

It's a bit verbose, but it handles escaped quotes and backtracking much better than the simplest implementation. What it says is:

my $re = qr{
   "               # Begin it with literal quote
   ( 
     (?>           # prevent backtracking once the alternation has been
                   # satisfied. It either agrees or it does not. This expression
                   # only needs one direction, or we fail out of the branch

         [^"\\]    # a character that is not a dquote or a backslash
     |   \\+       # OR if a backslash, then any number of backslashes followed by 
         [^"]      # something that is not a quote
     |   \\        # OR again a backslash
         (?>\\\\)* # followed by any number of *pairs* of backslashes (as units)
         "         # and a quote
     )*            # any number of *set* qualifying phrases
  )                # all batched up together
  "                # Ended by a literal quote
}x;
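
Since the question also allows Python, here is a rough port of the same pattern, as a sketch only. Python's re module accepts the atomic group (?>...) only from version 3.11 on, so this version swaps it for a plain non-capturing group (?:...). That gives the same matches on the sample below but gives up the backtracking protection. The helper name and the sample string are made up for illustration.

```python
import re

# Port of the Perl pattern above, with (?>...) replaced by (?:...).
# Assumption: losing the atomic group is acceptable for short inputs.
ESCAPE_AWARE = re.compile(r'"((?:[^"\\]|\\+[^"]|\\(?:\\\\)*")*)"')

def extract_quotes(text):
    """Return the contents of each double-quoted span (hypothetical helper)."""
    return ESCAPE_AWARE.findall(text)

sample = r'He said "a \"quoted\" word" and "plain"'
for quote in extract_quotes(sample):
    print(quote)
```

On this sample the first quotation keeps its escaped inner quotes instead of being cut off at the first backslash-quote pair.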

If you don't need that much power (say, the text is likely to be dialogue rather than structured quotes), then

/"([^"]*)"/ 

probably works about as well as anything else.
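
To make the trade-off concrete, here is a small Python sketch of the simple pattern (Python because the question allows it; the variable names and the second sample string are assumptions for illustration). It handles ordinary dialogue fine, but an escaped quote ends the match early.

```python
import re

# Simple pattern from above: quote, run of non-quote characters, quote.
SIMPLE = re.compile(r'"([^"]*)"')

dialogue = '"HAL," noted Frank, "said that everything was going extremely well."'
print(SIMPLE.findall(dialogue))

# An escaped quote is treated as a closing quote, so the span is split.
escaped = r'"a \"b\" c"'
print(SIMPLE.findall(escaped))
```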

answered 2008-12-05T13:32:44.607
5

Regex solutions won't work if you have nested quotes, but this works well for the examples:

$ echo \"HAL,\" noted Frank, \"said that everything was going extremely well\" \
  | perl -n -e 'while (m/(".*?")/g) { print $1."\n"; }'
"HAL,"
"said that everything was going extremely well"

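The ? in .*? is what stops each match at the nearest closing quote. A minimal Python sketch of the difference between the lazy and the greedy form (Python because the question allows it; the variable names are only for illustration):

```python
import re

line = '"HAL," noted Frank, "said that everything was going extremely well"'

# Lazy: each match ends at the first closing quote it can reach.
lazy = re.findall(r'".*?"', line)

# Greedy: one match runs from the first quote to the last quote on the line.
greedy = re.findall(r'".*"', line)

print(lazy)
print(greedy)
```

The lazy form yields two separate quotations, while the greedy form returns the whole line as a single match.
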
$ cat eula.txt| perl -n -e 'while (m/(".*?")/g) { print $1."\n"; }'
"EULA"
"online"
"Software"
"Workstation Computer"
"Device"
"multiplexing"
"DRM"
"Secure Content"
"DRM Software"
"Secure Content Owners"
"DRM Upgrades"
"WMFSDK"
"Not For Resale"
"NFR,"
"Academic Edition"
"AE,"
"Qualified Educational User."
"Exclusion of Incidental, Consequential and Certain Other Damages"
"Restricted Rights"
"Exclusion des dommages accessoires, indirects et de certains autres dommages"
"Consumer rights"
answered 2008-12-05T11:26:19.493
4
grep -o "\"[^\"]*\""

This greps for " + anything except a quote, any number of times + ".

-o makes grep print only the matched text, not the whole line.
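
For comparison, roughly the same behaviour can be sketched in Python. The helper name grep_o and the sample lines are invented for illustration. Like grep, it works line by line, so a quotation that spans two lines is not matched.

```python
import re

# Same idea as the grep pattern: quote, non-quote characters, quote.
QUOTED = re.compile(r'"[^"]*"')

def grep_o(lines, pattern=QUOTED):
    """Yield each match on its own, line by line (hypothetical helper)."""
    for line in lines:
        for match in pattern.finditer(line):
            yield match.group(0)

# Made-up sample input; any iterable of lines (e.g. an open file) works.
sample = [
    'This agreement ("EULA") covers the "Software".\n',
    'No quotes on this line.\n',
]
for hit in grep_o(sample):
    print(hit)
```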

answered 2008-12-05T11:23:56.457
0
grep -o '"[^"]*"' file

The '-o' option prints only the part of the line that matches the pattern.

answered 2010-03-31T11:19:53.100