bash - awk dynamic document indexing

Question

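I have a document whose index needs to be created/updated dynamically. I am trying to accomplish this with awk. I have a partially working example, but now I'm stumped.

The sample document is as follows.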

numbers.txt:
    #) Title
    #) Title
    #) Title
    #.#) Subtitle
    #.#.#) Section
    #.#) Subtitle
    #) Title
    #) Title
    #.#) Subtitle
    #.#.#) Section
    #.#) Subtitle
    #.#.#) Section
    #.#.#.#) Subsection
    #) Title
    #) Title
    #.#) Subtitle
    #.#.#) Section
    #.#.#.#) Subsection
    #.#.#.#) Subsection

The desired output looks like this:

1) Title
2) Title
3) Title
3.1) Subtitle
3.1.1) Section
3.2) Subtitle
4) Title
5) Title
5.1) Subtitle
5.1.1) Section
5.2) Subtitle
5.2.1) Section
5.2.1.1) Subsection
6) Title
7) Title
7.1) Subtitle
7.1.1) Section
7.1.1.1) Subsection
7.1.1.2) Subsection

The partially working awk code is as follows.

numbers.sh:
    awk '{for(w=1;w<=NF;w++)if($w~/^#\)/){sub(/^#/,++i)}}1' numbers.txt

Any help with this would be appreciated.

score 4 · Accepted Answer

awk to the rescue!

I'm not sure whether this is the optimal way to do it, but it works...

awk    'BEGIN{d="."}
/#\.#\.#\.#/ {sub("#.#.#.#", i d a[i] d b[i d a[i]] d (++c[i d a[i] d b[i d a[i]]]))}
   /#\.#\.#/ {sub("#.#.#"  , i d a[i] d (++b[i d a[i]]))}
      /#\.#/ {sub("#.#"    , i d (++a[i]))}
         /#/ {sub("#"      , (++i))} 1'

Update: The above is limited to 4 levels only. Here is a better one that handles an unlimited number of levels:

 awk '{d=split($1,a,"#")-1;                # find the depth
       c[d]++;                             # increment counter for current depth
       for(i=pd+1;i<=d;i++) c[i]=1;        # reset when depth increases
       for(i=1;i<=d;i++) {sub(/#/,c[i])};  # replace placeholders one by one
       pd=d} 1'                            # set previous depth and print

The reset step could probably be combined with the main loop, but I think it is clearer this way.
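To make the depth calculation concrete, here is a minimal sketch (the helper name `show_depth` is made up for illustration). It relies on `split()` returning the number of pieces: a label containing n `#` characters splits into n+1 pieces around them, so subtracting 1 gives the nesting depth.

```shell
# Hypothetical helper: print the computed depth next to each heading label.
show_depth() {
  awk '{ d = split($1, parts, "#") - 1; print d, $1 }' "$@"
}

printf '%s\n' '#) Title' '#.#) Subtitle' '#.#.#) Section' | show_depth
# 1 #)
# 2 #.#)
# 3 #.#.#)
```

The counters `c[1]..c[d]` are then substituted into the placeholders from left to right, one `sub()` call per level.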

Update 2:

With this logic, I think the following is about as short as it can get.

$ awk '{d=split($1,_,"#")-1;      # find the depth
        c[d]++;                   # increment counter for current depth
        for(i=1;i<=d;i++)         # start replacement
           {if(i>pd)c[i]=1;       # reset the counters
            sub(/#/,c[i])         # replace placeholders with counters
           }
           pd=d} 1' file          # set the previous depth

Or as a one-liner:

$ awk '{d=split($1,_,"#")-1;c[d]++;for(i=1;i<=d;i++){if(i>pd)c[i]=1;sub(/#/,c[i])}pd=d}1'

score 4 · Accepted Answer

I implemented an AWK script. It also works with indexes of more than 4 levels! ;)

I'll explain it a bit using inline comments.

#!/usr/bin/awk -f

# Resets the counters in "array" from index "from" onwards.
# "w" is declared as an extra parameter so that it stays local.
function cleanArray(array, from,    w){
    for(w=from;w<=length(array);w++){
        array[w]=0
    }
}

# This is executed only one time at beginning.                                  
BEGIN {                                                                         
    # The key of this array will be used to point to the "text index".
    # I.E., an array with (1 2 2) means an index "1.2.2)"           
    array[1]=0      
}                                                                               

# This block will be executed for every line.                                   
{                                                                               
    # Amount of "#" found.                                                      
    amount=0                                                                    

    # In this line will be stored the result of the line.                       
    line=""                                                                     

    # Let's save the entire line in a variable to modify it.                    
    rest_of_line=$0                                                             

    # While the line still starts with "#"...                                   
    while(rest_of_line ~ /^#/){                                                 

        # We remove the first 2 characters.                                     
        rest_of_line=substr(rest_of_line, 3, length(rest_of_line))              

        # We found one "#", let's count it!                                     
        amount++                                                                

        # The line still starts with "#"?                                       
        if(rest_of_line ~ /^#/){                                                
            # yes, it still starts.                                             

            # let's print the appropriate number and a ".".
            line=line""array[amount]                                            
            line=line"."                                                        
        }else{                                                                  
            # no, so we must add 1 to the old value of the array.       
            array[amount]++                                                     

            # And we must clean the array if it stores more values              
            # starting from amount plus 1. We don't want to keep                
            # storing garbage numbers that may harm our accounting              
            # for the next line.                                                
            cleanArray(array,amount + 1)                                        

            # let's print the appropriate number and a ")".
            line=line""array[amount]                                            
            line=line")"                                                        
        }                                                                       
    }                                                                           

    # Great! We have the line with the appropriate indexes!
    print line""rest_of_line                                                    
}

So, if you save it as script.awk, you can add execute permission to the file:

chmod u+x script.awk

Finally, you can run it:

./script.awk <path_to_number.txt>

For example, if you save the script script.awk in the same directory as the file number.txt, change to that directory and run:

./script.awk number.txt

So, given this number.txt:

#) Title
#) Title
#) Title
#.#) Subtitle
#.#.#) Section
#.#) Subtitle
#) Title
#) Title
#.#) Subtitle
#.#.#) Section
#.#) Subtitle
#.#.#) Section
#.#.#.#) Subsection
#) Title
#) Title
#.#) Subtitle
#.#.#) Section
#.#.#.#) Subsection
#.#.#.#.#) Subsection
#.#.#.#.#) Subsection
#.#.#.#.#) Subsection
#.#.#.#.#.#) Subsection
#.#.#.#.#) Subsection
#.#.#.#.#.#) Subsection
#.#.#.#.#.#) Subsection
#.#.#.#.#.#) Subsection
#.#.#.#.#.#) Subsection
#.#.#.#.#) Subsection
#.#.#.#) Subsection
#.#.#) Section

This will be the output (note that the solution is not limited by the number of "#" characters):

1) Title
2) Title
3) Title
3.1) Subtitle
3.1.1) Section
3.2) Subtitle
4) Title
5) Title
5.1) Subtitle
5.1.1) Section
5.2) Subtitle
5.2.1) Section
5.2.1.1) Subsection
6) Title
7) Title
7.1) Subtitle
7.1.1) Section
7.1.1.1) Subsection
7.1.1.1.1) Subsection
7.1.1.1.2) Subsection
7.1.1.1.3) Subsection
7.1.1.1.3.1) Subsection
7.1.1.1.4) Subsection
7.1.1.1.4.1) Subsection
7.1.1.1.4.2) Subsection
7.1.1.1.4.3) Subsection
7.1.1.1.4.4) Subsection
7.1.1.1.5) Subsection
7.1.1.2) Subsection
7.1.2) Section

Hope it helps.

score 2 · Accepted Answer

gawk

awk 'function w(){
    k=m>s?m:s
    for(i=1;i<=k;i++){
        if(i>m){
            a[i]=0
        }
        else{
            a[i]=(i==m)?++a[i]:a[i]   # increment only the counter of the last "#"
            sub("#",a[i]=a[i]?a[i]:1) 
        }
    }
    s=m
}
{m=split($1,t,"#")-1;w()}1' file



1) Title
2) Title
3) Title
3.1) Subtitle
3.1.1) Section
3.2) Subtitle
4) Title
5) Title
5.1) Subtitle
5.1.1) Section
5.2) Subtitle
5.2.1) Section
5.2.1.1) Subsection
6) Title
7) Title
7.1) Subtitle
7.1.1) Section
7.1.1.1) Subsection
7.1.1.2) Subsection

score 2 · Accepted Answer

Here's my take. Tested on FreeBSD, so I expect it will work pretty much anywhere...

#!/usr/bin/awk -f

BEGIN {
  depth=1;
}

$1 ~ /^#(\.#)*\)$/ {
  thisdepth=split($1, _, ".");

  if (thisdepth < depth) {
    # end of subsection, back out to current depth by deleting array values
    for (; depth>thisdepth; depth--) {
      delete value[depth];
    }
  }
  depth=thisdepth;

  # Increment value of last member
  value[depth]++;

  # And substitute it into the current line.
  for (i=1; i<=depth; i++) {
    sub(/#/, value[i], $0);
  }
}

1

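The basic idea is to maintain an array (value[]) of the nested chapter values. After updating the array as required, we step through the values, each time replacing the first remaining occurrence of the octothorpe (#) with the current value at that position of the array.

This handles any level of nesting and, as mentioned, should work in both GNU (Linux) and non-GNU (FreeBSD, OSX, etc.) versions of awk.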
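As an illustration of how the array evolves, the following sketch uses the same update steps but prints the state of the array instead of the rewritten line. The function name `trace_values` and the output format are assumptions made for this example only.

```shell
# Hypothetical tracer: show the contents of value[] after each heading line.
trace_values() {
  awk '$1 ~ /^#(\.#)*\)$/ {
    thisdepth = split($1, parts, ".")
    # Leaving a subsection: drop the counters of the deeper levels.
    for (; depth > thisdepth; depth--) delete value[depth]
    depth = thisdepth
    value[depth]++
    # Join value[1..depth] with spaces for display.
    state = ""
    for (i = 1; i <= depth; i++) state = state (i > 1 ? " " : "") value[i]
    print $1 " -> " state
  }' "$@"
}

printf '%s\n' '#) Title' '#.#) Subtitle' '#.#.#) Section' '#.#) Subtitle' '#) Title' | trace_values
# #) -> 1
# #.#) -> 1 1
# #.#.#) -> 1 1 1
# #.#) -> 1 2
# #) -> 2
```

Deleting the deeper entries when the depth decreases is what makes the next subsection start again at 1.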

And of course, if one-liners are your thing, this can be compacted:

awk -v d=1 '$1~/^#(\.#)*\)$/{t=split($1,_,".");if(t<d)for(;d>t;d--)delete v[d];d=t;v[d]++;for(i=1;i<=d;i++)sub(/#/,v[i],$0)}1'

For easier reading, it can also be laid out like this:

awk -v d=1 '$1~/^#(\.#)*\)$/{             # match only the lines we care about
    t=split($1,_,".");                    # this line has 't' levels
    if (t<d) for(;d>t;d--) delete v[d];   # if levels decrease, trim the array
    d=t; v[d]++;                          # reset our depth, increment last number
    for (i=1;i<=d;i++) sub(/#/,v[i],$0)   # replace hash characters one by one
  } 1'                                    # and print.

UPDATE

And after thinking about this for a bit, I realized that it can be shrunk further. The for loop contains its own condition, so there is no need to place it inside an if:

awk '{
    t=split($1,_,".");                  # get current depth
    v[t]++;                             # increment counter for depth
    for(;d>t;d--) delete v[d];          # delete record for previous deeper counters
    d=t;                                # record current depth for next round
    for (i=1;i<=d;i++) sub(/#/,v[i],$0) # replace hashes as required.
  } 1'

Of course, this shrinks to a one-liner like so:

awk '{t=split($1,_,".");v[t]++;for(;d>t;d--)delete v[d];d=t;for(i=1;i<=d;i++)sub(/#/,v[i],$0)}1' file

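Obviously, you can add the initial match condition back if you need to, so that only lines that look like titles are processed.

Despite being a few characters longer, I believe this version runs slightly faster than karakfa's similar solution, presumably because it avoids an extra if on every iteration of the for loop.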
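A minimal sketch of that combination, reusing the match condition from the earlier version (the wrapper name `number_headings` is hypothetical). Without the guard, an ordinary text line would also bump a counter and have its first `#` replaced.

```shell
# Hypothetical wrapper: number only lines whose first field looks like "#.#.#)".
number_headings() {
  awk '$1 ~ /^#(\.#)*\)$/ {
    t = split($1, parts, ".")                # current depth
    v[t]++                                   # increment counter for this depth
    for (; d > t; d--) delete v[d]           # forget counters of deeper levels
    d = t                                    # remember depth for the next line
    for (i = 1; i <= d; i++) sub(/#/, v[i])  # fill in the placeholders
  } 1' "$@"
}

printf '%s\n' '#) Title' 'Body text with a # in it' '#.#) Subtitle' '#) Title' | number_headings
# 1) Title
# Body text with a # in it
# 1.1) Subtitle
# 2) Title
```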

UPDATE #2

I'm including this because I found it fun and interesting. You can do this in bash alone, no awk required. And it's not much longer in terms of code.

#!/usr/bin/env bash

while read -r word line; do
  if [[ $word =~ [#](\.#)*\) ]]; then
    IFS=. read -ra a <<<"$word"
    t=${#a[@]}
    ((v[t]++))
    # Drop counters for levels deeper than the current one.
    for (( ; d > t ; d-- )); do unset 'v[d]'; done
    d=$t   # store the number, not the literal string "t"
    for (( i=1 ; i <= t ; i++ )); do
      word=${word/[#]/${v[i]}}
    done
  fi
  echo "$word $line"
done < input.txt

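This follows the same logic as the awk script above, but works entirely in bash, using parameter expansion to replace the # characters. One drawback is that it does not preserve whitespace around the first word of each line, so any indentation is lost. With a little work, that could be mitigated as well.

Enjoy.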
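One possible way to keep the indentation, sketched here under the assumption that bash is available: read each line whole with `IFS= read -r` and split off the leading whitespace and the first word by hand. The function name `number_headings_keep_indent` and the variable names are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: same counter logic, but the line is split manually so that
# indentation and inner spacing survive.
number_headings_keep_indent() {
  local raw indent rest word tail re='^#(\.#)*\)$'
  local -a a=() v=()
  local -i t=0 d=0 i=0
  while IFS= read -r raw || [[ -n $raw ]]; do
    indent=${raw%%[![:space:]]*}        # leading whitespace, possibly empty
    rest=${raw#"$indent"}
    word=${rest%%[[:space:]]*}          # first word of the line
    tail=${rest#"$word"}                # remainder, including its spacing
    if [[ $word =~ $re ]]; then
      IFS=. read -ra a <<<"$word"
      t=${#a[@]}                        # depth = number of dot-separated parts
      v[t]=$(( ${v[t]:-0} + 1 ))        # bump the counter for this depth
      for (( ; d > t; d-- )); do unset 'v[d]'; done   # drop deeper counters
      d=$t
      for (( i = 1; i <= t; i++ )); do
        word=${word/[#]/${v[i]}}        # replace the first remaining "#"
      done
    fi
    printf '%s%s%s\n' "$indent" "$word" "$tail"
  done
}

printf '%s\n' '  #) Title' '    #.#) Sub   title' '  #) Title' | number_headings_keep_indent
```

The function reads from standard input, so a file can be processed with `number_headings_keep_indent < input.txt`.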
