shell - Print the text between two lines (from a list of line numbers in a file) in Unix

Question

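I have a sample file with thousands of lines, and I want to print the text between two line numbers in that file. Rather than typing the line numbers by hand, I have a file that contains the list of line numbers between which the text should be printed.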

Example: linenumbers.txt

345|789
999|1056
1522|1366
3523|3562

I need a shell script that reads the line numbers from this file and prints the text for each range of lines to a separate (new) file.

That is, it should print lines 345 to 789 to a new file, File1.txt, and the text from lines 999 to 1056 to a new file, File2.txt.

score 2 · Accepted Answer

Here's one way using GNU awk. Run it like this:

awk -f script.awk numbers.txt file.txt

Contents of script.awk:

BEGIN {
    # set the field separator
    FS="|"
}

# for the first file in the arguments list
FNR==NR {

    # add the row number and field one as keys to a multidimensional array with
    # a value of field two
    a[NR][$1]=$2

    # skip processing the rest of the code
    next
}

# for the second file in the arguments list
{
    # for every element in the array's first dimension
    for (i in a) {

        # for every element in the second dimension
        for (j in a[i]) {

            # ensure that the first field is treated numerically
            j+=0

            # if the line number is greater than or equal to the first field
            # and less than or equal to the second field
            if (FNR>=j && FNR<=a[i][j]) {

                # print the line to a file with the suffix of the first file's 
                # line number (the first dimension)
                print > "File" i
            }
        }
    }
}

Or, here's a one-liner:

awk -F "|" 'FNR==NR { a[NR][$1]=$2; next } { for (i in a) for (j in a[i]) { j+=0; if (FNR>=j && FNR<=a[i][j]) print > "File" i } }' numbers.txt file.txt

If you have an 'old' awk, here's a compatible version. Run it like this:

awk -f script.awk numbers.txt file.txt

Contents of script.awk:

BEGIN {
    # set the field separator
    FS="|"
}

# for the first file in the arguments list
FNR==NR {

    # add the row number and field one as a key to a pseudo-multidimensional
    # array with a value of field two
    a[NR,$1]=$2

    # skip processing the rest of the code
    next
}

# for the second file in the arguments list
{
    # for every element in the array
    for (i in a) {

        # split the element in to another array
        # b[1] is the row number and b[2] is the first field 
        split(i,b,SUBSEP)

        # if the line number is greater than or equal to the first field
        # and less than or equal to the second field
        if (FNR>=b[2] && FNR<=a[i]) {

            # print the line to a file with the suffix of the first file's
            # line number (the first pseudo-dimension); the parentheses make
            # the file-name concatenation portable across awk versions
            print > ("File" b[1])
        }
    }
}

Or, here's a one-liner:

awk -F "|" 'FNR==NR { a[NR,$1]=$2; next } { for (i in a) { split(i,b,SUBSEP); if (FNR>=b[2] && FNR<=a[i]) print > ("File" b[1]) } }' numbers.txt file.txt
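To try the portable version without the real data, here is a small sketch. The 20-line file.txt and the two ranges in numbers.txt are made-up sample data, and the sketch runs in a scratch directory so the generated File* files stay out of the way. It is essentially the portable one-liner above. The only change is an added +0 on both sides of the comparison, which forces a numeric comparison (an assumption made for safety, not something the answer says is required).

```shell
# Work in a throwaway directory (hypothetical sample data only).
cd "$(mktemp -d)" || exit 1

# Made-up inputs: a 20-line data file and two line ranges.
seq 1 20 > file.txt
printf '3|5\n8|10\n' > numbers.txt

awk -F "|" '
FNR==NR { a[NR,$1]=$2; next }      # first file: key = (row, start), value = end
{
    for (i in a) {
        split(i, b, SUBSEP)        # b[1] = row number, b[2] = start line
        # +0 forces numeric comparison; parentheses keep the redirect portable
        if (FNR >= b[2]+0 && FNR <= a[i]+0) print > ("File" b[1])
    }
}' numbers.txt file.txt

cat File1    # prints 3, 4 and 5, one per line
```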

score 2 · Accepted Answer

Since your target file has only a few thousand lines, here's a quick and dirty solution:

awk -F'|' '{system("sed -n \""$1","$2"p\" targetFile > file"NR)}' linenumbers.txt

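targetFile is the file containing the thousands of lines.
The one-liner doesn't require linenumbers.txt to be sorted.
The one-liner also allows the line ranges in linenumbers.txt to overlap.

Running the command above creates n files named filex, where n is the number of lines in linenumbers.txt and x runs from 1 to n. You can change the file name pattern as needed.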
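The same idea can also be written as a plain shell loop, without going through awk's system(). This is a minimal sketch that uses made-up sample data. The targetFile name and the fileN naming follow the one-liner above. Like the one-liner, it rereads targetFile once per range, which is fine for a file of a few thousand lines.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data: 30 numbered lines and three ranges,
# deliberately unsorted and overlapping.
seq 1 30 > targetFile
printf '2|4\n10|12\n3|6\n' > linenumbers.txt

n=0
while IFS='|' read -r start end; do
    n=$((n + 1))
    # -n: print nothing by default; "${start},${end}p": print only that range
    sed -n "${start},${end}p" targetFile > "file$n"
done < linenumbers.txt
```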

score 1 · Accepted Answer

You can do the following:

# myscript.sh
linenumbers="linenumbers.txt"
somefile="afile"
while IFS=\| read -r start end ; do
    echo "sed -n '$start,${end}p;${end}q;' $somefile  > $somefile-$start-$end"
done < "$linenumbers"

Running it as sh myscript.sh prints:

sed -n '345,789p;789q;' afile  > afile-345-789
sed -n '999,1056p;1056q;' afile  > afile-999-1056
sed -n '1522,1366p;1366q;' afile  > afile-1522-1366
sed -n '3523,3562p;3562q;' afile  > afile-3523-3562

Then, when you're happy with the output, run sh myscript.sh | sh

EDIT: Added William's excellent points about style and correctness.

Explanation

The basic idea is to have the script generate a series of shell commands that you can check for correctness before they are executed by "| sh".

sed -n '345,789p;789q;' means: run sed over each line but don't echo it (-n). There are two commands: the first p(rint)s lines 345 to 789, and the second q(uit)s at line 789. So sed stops at the last line you want instead of reading the whole input file.
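Here is a quick way to see what -n, p and q do, with seq standing in for a real file. The last two commands show a consequence for a reversed pair such as 1522|1366. With the q, sed quits at the smaller line number before the range starts, so it prints nothing. Without the q, a range whose end is before its start selects only its first line.

```shell
# Print lines 3 to 5 of a 10-line input, then quit at line 5.
seq 1 10 | sed -n '3,5p;5q'

# Reversed range with q: sed quits at line 4 before reaching line 7, so no output.
seq 1 10 | sed -n '7,4p;4q'

# Reversed range without q: only the first line of the range (7) is printed.
seq 1 10 | sed -n '7,4p'
```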

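The while loop uses read to read from the $linenumbers file. When read is given several variable names, each one is set to a field of the input. Normally the fields are separated by spaces, and if there are too few variable names, read puts the rest of the data into the last variable.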

To see how that works, you can type the following at a shell prompt:

ls -l | while read first rest ; do
   echo $first XXXX $rest
done

Try adding another variable, second, to the above and see what happens; it should be obvious.

The problem is that the data is separated by |s. William's suggestion, IFS=\|, changes IFS while reading the input, so that the input is split on | instead, which gives the desired result.
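Here is a deterministic version of the experiment above. The first read uses the default IFS, so the last variable gets the rest of the line. The second read sets IFS to | for that one command, so 345|789 splits into two fields. The -r flag, which stops read from treating backslashes specially, is an addition here; it isn't needed for this data.

```shell
# Default IFS: split on whitespace; the last variable gets the remainder.
printf 'a b c d\n' | {
    read -r first rest
    printf 'first=%s rest=%s\n' "$first" "$rest"   # first=a rest=b c d
}

# IFS set to | only for this read: the line splits into start and end.
printf '345|789\n' | {
    IFS='|' read -r start end
    printf 'start=%s end=%s\n' "$start" "$end"     # start=345 end=789
}
```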

Others are free to edit, correct and extend this.

score 1 · Accepted Answer

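I'd use sed to process the sample data file, because it is simple and fast. That needs a mechanism to convert the line numbers file into a suitable sed script, and there are many ways to do that.

One way uses sed to convert the set of line numbers into a sed script. If everything were going to standard output, this would be trivial. Because the output has to go to different files, each line of the line numbers file needs a number of its own. One way to add those numbers is the nl command; another is pr -n -l1. The same sed command line works with both.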

nl linenumbers.txt |
sed 's/ *\([0-9]*\)[^0-9]*\([0-9]*\)|\([0-9]*\)/\2,\3w file\1.txt/'

For the given data file, this produces:

345,789w file1.txt
999,1056w file2.txt
1522,1366w file3.txt
3523,3562w file4.txt
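To try the nl variant end to end, save the generated commands to a file and pass that file to sed -n -f. This is a minimal sketch with made-up data; sample.data and the two ranges are hypothetical. There is no > before the file name: sed's w command takes the file name directly, so anything after w, including a >, would become part of the name.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical inputs.
seq 1 20 > sample.data
printf '3|5\n8|10\n' > linenumbers.txt

# nl prefixes each range with its row number ("     1<TAB>3|5");
# sed rewrites that into a write command such as "3,5w file1.txt".
nl linenumbers.txt |
sed 's/ *\([0-9]*\)[^0-9]*\([0-9]*\)|\([0-9]*\)/\2,\3w file\1.txt/' > sed.script

cat sed.script    # 3,5w file1.txt and 8,10w file2.txt

# Apply the generated script: each w command writes its range to its own file.
sed -n -f sed.script sample.data
rm -f sed.script
```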

Another option is to have awk generate the sed script:

awk -F'|' '{ printf "%d,%dw file%d.txt\n", $1, $2, NR }' linenumbers.txt

If your version of sed can read its script from standard input with -f - (GNU sed can; BSD sed can't), you can convert the line numbers file into a sed script on the fly and use it to process the sample data:

awk -F'|' '{ printf "%d,%dw file%d.txt\n", $1, $2, NR }' linenumbers.txt |
sed -n -f - sample.data

If your system supports /dev/stdin (or /dev/fd/0), you can use either of these:

awk -F'|' '{ printf "%d,%dw file%d.txt\n", $1, $2, NR }' linenumbers.txt |
sed -n -f /dev/stdin sample.data

awk -F'|' '{ printf "%d,%dw file%d.txt\n", $1, $2, NR }' linenumbers.txt |
sed -n -f /dev/fd/0 sample.data

If you can't do that, use an explicit script file:

awk -F'|' '{ printf "%d,%dw file%d.txt\n", $1, $2, NR }' linenumbers.txt > sed.script
sed -n -f sed.script sample.data
rm -f sed.script

Strictly, you should make sure that the temporary file name is unique (mktemp) and that the file is removed even if the script is interrupted (trap):

tmp=$(mktemp sed.script.XXXXXX)
trap "rm -f $tmp; exit 1" 0 1 2 3 13 15

awk -F'|' '{ printf "%d,%dw file%d.txt\n", $1, $2, NR }' linenumbers.txt > $tmp
sed -n -f $tmp sample.data
rm -f $tmp
trap 0

The final trap 0 cancels the exit trap so that the script can exit successfully. If you omit it, the script always exits with status 1.
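Here is an alternative arrangement, sketched under the assumption of a POSIX sh. The cleanup itself goes in the exit trap (trap ... 0), and the signal traps just call exit. The temporary file is then removed however the script ends, a normal run keeps sed's exit status, and no trailing trap 0 is needed. The data files are created here as hypothetical samples so that the sketch runs on its own.

```shell
cd "$(mktemp -d)" || exit 1
seq 1 20 > sample.data                    # hypothetical data
printf '3|5\n8|10\n' > linenumbers.txt    # hypothetical ranges

tmp=$(mktemp sed.script.XXXXXX) || exit 1
trap 'rm -f "$tmp"' 0                     # cleanup runs whenever the shell exits
trap 'exit 1' 1 2 3 13 15                 # HUP/INT/QUIT/PIPE/TERM: exit, which fires trap 0

awk -F'|' '{ printf "%d,%dw file%d.txt\n", $1, $2, NR }' linenumbers.txt > "$tmp"
sed -n -f "$tmp" sample.data
```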

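I've ignored Perl and Python; either could do this in a single command, but the file management is fiddly, so sed seems simpler. You could also use just awk. Either have a first awk script write an awk script that does the heavy lifting (a simple extension of the above), or have a single awk process read both files and produce the required output (harder, but not impossible).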
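Here is one reading of the "first awk script writes an awk script" idea, as a sketch. Each range becomes a pattern-action rule such as NR>=3 && NR<=5 { print > "file1.txt" }, and the generated program then processes the data in a single pass. The file names ranges.awk and sample.data, and the data itself, are hypothetical.

```shell
cd "$(mktemp -d)" || exit 1
seq 1 20 > sample.data                    # hypothetical data
printf '3|5\n8|10\n' > linenumbers.txt    # hypothetical ranges

# Generate one awk rule per range; NR here is the range's row number.
awk -F'|' '{ printf "NR>=%d && NR<=%d { print > \"file%d.txt\" }\n", $1, $2, NR }' \
    linenumbers.txt > ranges.awk

cat ranges.awk
# NR>=3 && NR<=5 { print > "file1.txt" }
# NR>=8 && NR<=10 { print > "file2.txt" }

# Run the generated program over the data once.
awk -f ranges.awk sample.data
```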

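If nothing else, this shows that there are many ways to do this job. If it's a one-off exercise, it doesn't really matter which one you choose. If you'll be doing it repeatedly, pick the mechanism you like best. If you're worried about performance, measure. Converting the line numbers into a command script costs almost nothing; processing the sample data with that script is where the time goes. I'd expect sed to be good at that part, but I haven't measured to confirm it.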

score 0 · Accepted Answer

This might work (GNU sed):

sed -r 's/(.*)\|(.*)/\1,\2w file-\1-\2.txt/' linenumbers.txt | sed -nf - file

score 0 · Accepted Answer

To extract the first field from, for example, 345|789, you can use awk:

awk -F'|' '{print $1}'

Combine that with the answers you received to your other question and you'll have a solution.
