bash - Replace a structure with another structure, keeping some of its values

Question

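I'd like to transform the input below into the output below in bash. I tried sed, but it didn't work; I'm probably doing it wrong. This is what I have so far (just to test whether I can extract the id), but it doesn't work: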

sed 's;id="([a-zA-Z:]+)";\\1;p' input
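For reference (this note is not part of the original question), the attempt fails for three reasons:

- In sed's default basic regular expressions, `(`, `)` and `+` are literal characters, so the pattern never matches the input.
- Even with a match, only the `id="..."` part of the line would be replaced, not the whole line.
- Without `-n`, the `p` flag prints each matching line twice.

Below is a minimal sketch of a working version. It assumes a sed that supports `-E` (GNU sed and BSD/macOS sed both do) and uses the file name `input` from the question.

```shell
# Recreate the sample from the question as a file named "input".
cat > input <<'EOF'
<mediaobject>
    <imageobject id="fig:deployment">
        <caption>Application deployment</caption>
        <imagedata fileref="images/deployment.png" width="90%" />
    </imageobject>
</mediaobject>
EOF

# -E : extended regular expressions, so ( ) is a capture group and + means "one or more"
# -n : do not print lines automatically; only the p flag prints
# .* on both sides of the match makes the whole line get replaced by the captured id
sed -nE 's/.*id="([a-zA-Z:]+)".*/\1/p' input
```

For the sample input this prints `fig:deployment`.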

Input

<mediaobject>
    <imageobject id="fig:deployment">
        <caption>Application deployment</caption>
        <imagedata fileref="images/deployment.png" width="90%" />
    </imageobject>
</mediaobject>

Output

<img src="images/deployment.png" width="90%" id="fig:deployment" title="Application deployment" />

score 3 · Accepted Answer

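awk is available almost everywhere bash is installed. It also avoids some of the pitfalls you can run into with sed, for example when the XML elements inside a block do not always appear in the same order.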

awk '
    ## set a variable to mark that we are in a mediaobject block
    $1=="<mediaobject>" { object=1 }

    ## mark that we have exited the object block
    $1=="</mediaobject>" { object=0 }

    ## if we are in a mediaobject block and we find an imageobject
    $1=="<imageobject" && object==1 { 
        iobject=1                          ## record that we are in an imageobject
        id = substr($2, 5, length($2) - 6) ## the id, without quotes and the trailing ">"
    }

    ## if we have a line with image data
    $1~/<imagedata/ && iobject==1 {
        fileref=substr($2,9,length($2)-8)  ## the path, including the quotations
        width=$3                           ## the whole width="..." attribute
    }

    ## if we have a caption line
    $1~/<caption>/ && iobject==1 {
        gsub("(</?caption>|^[ \t]*|[ \t]*$)", "")  ## remove xml and leading/trailing whitespace
        caption=$0                                 ## record the modified line as the caption
    }

    ## when we arrive at the end of an imageobject
    $1=="</imageobject>" && object==1 {
        iobject=0                                                                          ## record it
        printf("<img src=%s %s id=\"%s\" title=\"%s\" />\n", fileref, width, id, caption)  ## print record
    }

' input

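As mentioned, this code works no matter what order the elements appear in. However, it reads attribute values by field position ($2, $3), so it will break if the attributes within a line change order (which is less likely). If that happens, you can do something like this: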

## use match() to find where the fileref="..." attribute begins (match() also sets RLENGTH)
## use a nested substr() to pull only the value of fileref (with quotations)
fileref = substr(substr($0, match($0, /fileref="[^"]*"/), RLENGTH), 9)
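If several attributes can move around, the same `match()` idea can be wrapped in a small function. This is a sketch that assumes a POSIX awk such as gawk or mawk:

- `attr()` is a hypothetical helper name, not an awk built-in.
- `img_prog` is just a shell variable holding the program.
- It still assumes one element per line, as in the sample input.

```shell
# Recreate the sample from the question as a file named "input".
cat > input <<'EOF'
<mediaobject>
    <imageobject id="fig:deployment">
        <caption>Application deployment</caption>
        <imagedata fileref="images/deployment.png" width="90%" />
    </imageobject>
</mediaobject>
EOF

img_prog='
# attr(line, name): hypothetical helper returning the quoted value of attribute
# "name" in "line" (quotes included), or "" if it is absent. match() sets
# RSTART and RLENGTH, so the position of the attribute in the line does not matter.
function attr(line, name) {
    if (match(line, name "=\"[^\"]*\""))
        return substr(line, RSTART + length(name) + 1, RLENGTH - length(name) - 1)
    return ""
}
/<imageobject/    { id = attr($0, "id") }
/<caption>/       { caption = $0; gsub(/<\/?caption>|^[ \t]+|[ \t]+$/, "", caption) }
/<imagedata/      { src = attr($0, "fileref"); width = attr($0, "width") }
/<\/imageobject>/ { printf("<img src=%s width=%s id=%s title=\"%s\" />\n", src, width, id, caption) }
'

awk "$img_prog" input
```

For the sample input this prints exactly the requested line. Looking up `id` this way would also match the tail of an attribute such as `xml:id` or `uid`, so the pattern may need tightening for other documents.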

score 0 · Accepted Answer

Using xsh (the XML Editing Shell):

open 1.xml ;
rename img mediaobject ;
mv img/imageobject/@id into img ;
set img/@title img/imageobject/caption ;
set img/@src img/imageobject/imagedata/@fileref ;
mv img/imageobject/imagedata/@width into img ;
rm (img/* | img/text()) ;

score 0 · Accepted Answer

With sed:

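sed -n '\!<mediaobject>!{
  # next line is <imageobject id="...">: keep only id="..." and start the hold space with it
  n
  s/ *[^ ]* \(id="[^"]*"\).*/\1/
  h
  # next line is <caption>...</caption>: turn it into title="..." and append it
  n
  s/ *[^>]*>\([^<]*\).*/title="\1"/
  H
  # next line is <imagedata fileref="..." width="..." />: turn it into src="..." width="..."
  n
  s/ *<[^ ]* *fileref=\("[^"]*"\) *\(width="[^"]*"\).*/src=\1 \2/
  H
  # read past </imageobject>, then swap the collected attributes into the pattern space
  n
  x
  # join the pieces with spaces and wrap them in <img ... />
  s/\n/ /g
  s/^/<img /
  s/$/ \/>/
  p
}' input

This prints `<img id="fig:deployment" title="Application deployment" src="images/deployment.png" width="90%" />`. That is the same set of attributes in a different order, and attribute order is not significant in XML. The script relies on each block's lines appearing in exactly this order, one element per line.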
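Here is a possible variant, sketched under the assumption of GNU sed. It relies on `-E` and on `.` matching the newlines that `H` puts into the hold space. The script collects each whole `<mediaobject>` block, then extracts the four values with a single substitution so they can be printed in the order the question asks for. `block_to_img` is just a shell variable name chosen here. The values must still appear in the order id, caption, fileref, width within each block.

```shell
# Recreate the sample from the question as a file named "input".
cat > input <<'EOF'
<mediaobject>
    <imageobject id="fig:deployment">
        <caption>Application deployment</caption>
        <imagedata fileref="images/deployment.png" width="90%" />
    </imageobject>
</mediaobject>
EOF

block_to_img='
/<mediaobject>/,/<\/mediaobject>/{
  # append every line of the block to the hold space
  H
  /<\/mediaobject>/{
    # block complete: swap the collected lines into the pattern space
    x
    # pull out the four values and print them in the order the question asks for
    s/.*id="([^"]*)".*<caption>([^<]*)<\/caption>.*fileref="([^"]*)".*width="([^"]*)".*/<img src="\3" width="\4" id="\1" title="\2" \/>/p
    # clear the pattern space and swap it back, leaving the hold space empty for the next block
    s/.*//
    x
  }
}'

sed -nE "$block_to_img" input
```

For the sample input this prints exactly the requested line.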
