sed: replace spaces within quotes with underscores

Question

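I have input (for example, from ifconfig run0 scan on OpenBSD) that has several fields separated by spaces, but some of the fields themselves contain spaces (luckily, fields that contain spaces are always enclosed in quotes).

I need to distinguish between the spaces within the quotes and the separator spaces. The idea is to replace the spaces within quotes with underscores.

Sample data: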

%cat /tmp/ifconfig_scan | fgrep nwid | cut -f3
nwid Websense chan 6 bssid 00:22:7f:xx:xx:xx 59dB 54M short_preamble,short_slottime
nwid ZyXEL chan 8 bssid cc:5d:4e:xx:xx:xx 5dB 54M privacy,short_slottime
nwid "myTouch 4G Hotspot" chan 11 bssid d8:b3:77:xx:xx:xx 49dB 54M privacy,short_slottime

This does not get processed the way I want, because the spaces within the quotes have not yet been replaced with underscores:

%cat /tmp/ifconfig_scan | fgrep nwid | cut -f3 |\
    cut -s -d ' ' -f 2,4,6,7,8 | sort -n -k4
"myTouch Hotspot" 11 bssid d8:b3:77:xx:xx:xx
ZyXEL 8 cc:5d:4e:xx:xx:xx 5dB 54M
Websense 6 00:22:7f:xx:xx:xx 59dB 54M

score 4 · Accepted Answer

For a sed-only solution (which I don't necessarily recommend), try:

echo 'a b "c d e" f g "h i"' |\
sed ':a;s/^\(\([^"]*"[^"]*"[^"]*\)*[^"]*"[^"]*\) /\1_/;ta'
a b "c_d_e" f g "h_i"

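Translation:

Start at the beginning of the line.
Look for the pattern junk"junk", repeated zero or more times, where junk contains no quote, followed by junk"junk space.
Replace the final space with _.
If the substitution succeeded, jump back to the beginning.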
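To make the loop concrete, here is a minimal sketch that wraps the same substitution in a shell function and applies it to one sample line from the question. The function name quote_spaces_to_underscores is made up for illustration, and splitting the script into separate -e expressions is an assumption made for portability (some sed implementations do not accept a semicolon after a label); the regular expression itself is unchanged.

```shell
# Replace spaces inside double-quoted fields with underscores.
# Each pass of the :a ... ta loop replaces exactly one space.
# (Hypothetical helper name; the regex is the one from the answer.)
quote_spaces_to_underscores() {
    sed -e ':a' \
        -e 's/^\(\([^"]*"[^"]*"[^"]*\)*[^"]*"[^"]*\) /\1_/' \
        -e 'ta'
}

# Sample line taken from the question.
line='nwid "myTouch 4G Hotspot" chan 11 bssid d8:b3:77:xx:xx:xx 49dB 54M privacy,short_slottime'

# Once the quoted name is a single field, cut selects the intended columns.
printf '%s\n' "$line" | quote_spaces_to_underscores | cut -s -d ' ' -f 2,4,6,7,8
# prints: "myTouch_4G_Hotspot" 11 d8:b3:77:xx:xx:xx 49dB 54M
```

Because every pass rescans the line from the start, the cost grows with both line length and the number of quoted spaces, which is one reason not to prefer this form on large inputs.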

score 4 · Accepted Answer

Try this:

awk -F'"' '{for(i=2;i<=NF;i++)if(i%2==0)gsub(" ","_",$i);}1' OFS="\"" file

It works for multiple quoted parts in a line:

echo '"first part" foo "2nd part" bar "the 3rd part comes" baz'| awk -F'"' '{for(i=2;i<=NF;i++)if(i%2==0)gsub(" ","_",$i);}1' OFS="\"" 
"first_part" foo "2nd_part" bar "the_3rd_part_comes" baz

EDIT: an alternative form:

awk 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub(" ","_",$i)} 1' file
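Both forms rely on the same idea: with the double quote as the field separator, the even-numbered fields are the text between quotes. The sketch below shows that split and one behavioural difference between the two loops. The function names show_fields, form_le_nf and form_lt_nf are hypothetical, chosen only for this illustration.

```shell
# Print each field produced by splitting on double quotes, one per line.
show_fields() {
    awk -F'"' '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }'
}

# First form from the answer: visits every even field up to and including NF.
form_le_nf() {
    awk -F'"' '{for(i=2;i<=NF;i++)if(i%2==0)gsub(" ","_",$i);}1' OFS='"'
}

# Alternative form: stops before the last field (i < NF).
form_lt_nf() {
    awk 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub(" ","_",$i)} 1'
}

printf '%s\n' 'a b "c d e" f g "h i"' | show_fields
# 1:[a b ]
# 2:[c d e]
# 3:[ f g ]
# 4:[h i]
# 5:[]

# With balanced quotes both forms give the same result.
printf '%s\n' 'a b "c d e" f g "h i"' | form_lt_nf   # a b "c_d_e" f g "h_i"

# With an unterminated quote they differ: the text after the stray quote
# is the last field, which only the i<=NF loop modifies.
printf '%s\n' 'a "b c' | form_le_nf   # a "b_c
printf '%s\n' 'a "b c' | form_lt_nf   # a "b c
```

Which behaviour is preferable for unterminated quotes depends on the data; the question states that fields with spaces are always quoted, so for that input the two forms should be interchangeable.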

score 2 · Accepted Answer

You'd be better off with perl. The code is much more readable and maintainable:

perl -pe 's:"[^"]*":($x=$&)=~s/ /_/g;$x:ge'

With your input, the results are:

a b "c_d_e" f g "h_i"

Explanation:

-p            # enable printing
-e            # the following expression...

s             # begin a substitution

:             # the first substitution delimiter

"[^"]*"      # match a double quote followed by anything not a double quote any
              # number of times followed by a double quote

:             # the second substitution delimiter

($x=$&)=~s/ /_/g;      # copy the pattern match ($&) into a variable ($x), then
                       # replace every space in $x with an underscore. The
                       # variable $x is needed because capture groups and
                       # patterns are read only variables.

$x            # return $x as the replacement.

:             # the last delimiter

g             # perform the outer substitution globally (every quoted string on the line)
e             # make sure that the replacement is handled as an expression

Some tests:

for i in {1..500000}; do echo 'a b "c d e" f g "h i" j k l "m n o "p q r" s t" u v "w x" y z' >> test; done

time perl -pe 's:"[^"]*":($x=$&)=~s/ /_/g;$x:ge' test >/dev/null

real    0m8.301s
user    0m8.273s
sys     0m0.020s

time awk 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub(" ","_",$i)} 1' test >/dev/null

real    0m4.967s
user    0m4.924s
sys     0m0.036s

time awk '!(NR%2){gsub(FS,"_")}1' RS=\" ORS=\" test >/dev/null

real    0m4.336s
user    0m4.244s
sys     0m0.056s

time sed ':a;s/^\(\([^"]*"[^"]*"[^"]*\)*[^"]*"[^"]*\) /\1_/;ta' test >/dev/null

real    2m26.101s
user    2m25.925s
sys     0m0.100s

score 1 · Accepted Answer

Not an answer, just posting the awk equivalent of @steve's perl code in case anyone is interested (and to help me remember this in the future):

@steve posted:

perl -pe 's:"[^\"]*":($x=$&)=~s/ /_/g;$x:ge'

From reading @steve's explanation, the briefest awk equivalent to that perl code (NOT the preferred awk solution, see @Kent's answer for that) would be this GNU awk script:

gawk '{
   head = ""
   while ( match($0,"\"[^\"]*\"") ) {
      head = head substr($0,1,RSTART-1) gensub(/ /,"_","g",substr($0,RSTART,RLENGTH))
      $0 = substr($0,RSTART+RLENGTH)
   }
   print head $0
}'

You get there by starting with a POSIX awk solution that uses more variables:

awk '{
   head = ""
   tail = $0
   while ( match(tail,"\"[^\"]*\"") ) {
      x = substr(tail,RSTART,RLENGTH)
      gsub(/ /,"_",x)
      head = head substr(tail,1,RSTART-1) x
      tail = substr(tail,RSTART+RLENGTH)
   }
   print head tail
}'

then saving a line with GNU awk's gensub():

gawk '{
   head = ""
   tail = $0
   while ( match(tail,"\"[^\"]*\"") ) {
      x = gensub(/ /,"_","g",substr(tail,RSTART,RLENGTH))
      head = head substr(tail,1,RSTART-1) x
      tail = substr(tail,RSTART+RLENGTH)
   }
   print head tail
}'

then getting rid of the variable x:

gawk '{
   head = ""
   tail = $0
   while ( match(tail,"\"[^\"]*\"") ) {
      head = head substr(tail,1,RSTART-1) gensub(/ /,"_","g",substr(tail,RSTART,RLENGTH))
      tail = substr(tail,RSTART+RLENGTH)
   }
   print head tail
}'

and finally getting rid of the variable "tail", if you don't need $0, NF, etc. left intact after the loop:

gawk '{
   head = ""
   while ( match($0,"\"[^\"]*\"") ) {
      head = head substr($0,1,RSTART-1) gensub(/ /,"_","g",substr($0,RSTART,RLENGTH))
      $0 = substr($0,RSTART+RLENGTH)
   }
   print head $0
}'
