python - How can I print the second line and the last three lines from multiple text files with AWK or Python?

Question

I'm trying to use awk to print the 2nd line and the last 3 lines from multiple text files, but I can't get it to work. I'd also like to send the output to a text file.

Any help or suggestions would be appreciated.

score 3 · Accepted Answer

This has the advantage that the whole file is never held in memory.

awk 'NR == 2 {print}; {line1 = line2; line2 = line3; line3 = $0} END {print line1; print line2; print line3}' files*

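Edit:

The following uses code from the gawk manual that is portable to other versions of AWK. It provides per-file processing, which the one-liner above does not: NR counts lines across all input files and END runs only once, so that command prints the 2nd line of the first file and the last 3 lines of the last file. Note that gawk version 4 provides BEGINFILE and ENDFILE rules.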

#!/usr/bin/awk -f
function beginfile (file) {
    line1 = line2 = line3 = ""
}

function endfile (file) {
    print line1; print line2; print line3
}

FILENAME != _oldfilename \
     {
         if (_oldfilename != "")
             endfile(_oldfilename)
         _oldfilename = FILENAME
         beginfile(FILENAME)
     }

     END   { endfile(FILENAME) }

FNR == 2 {
    print
}

{
    line1 = line2; line2 = line3; line3 = $0
}

Save it as a file, perhaps called "fileparts". Then make it executable:

chmod u+x fileparts

Then you can run:

./fileparts file1 file2 anotherfile somemorefiles*.txt

It will print the second line and the last three lines of each file, all in a single set of output.

Alternatively, you could modify the script to write to separate files, or use a shell loop to send the output for each input to its own file:

for file in file1 file2 anotherfile somemorefiles*.txt
do
    ./fileparts "$file" > "$file.out"
done

You can name the output files whatever you like. They will be text files.
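For comparison, the same per-file bookkeeping can be sketched in Python with the standard `fileinput` module, whose `filelineno()` and `filename()` methods play roles similar to FNR and FILENAME in the awk script. This is only an illustration, not part of the answer: the function name `second_and_last3` is hypothetical, and a file with fewer than two lines is assumed to yield `None` for its second line.

```python
import fileinput
from collections import deque


def second_and_last3(paths):
    """Yield (filename, second_line, last_three_lines) for each non-empty file.

    Hypothetical helper: one pass over all files, resetting state at each
    file boundary, much like the beginfile/endfile functions in the awk script.
    """
    paths = list(paths)
    if not paths:
        # fileinput would fall back to stdin for an empty list; avoid that.
        return

    current = None          # name of the file being read
    second = None           # second line of the current file, if any
    tail = deque(maxlen=3)  # rolling buffer of the last three lines

    with fileinput.input(files=paths) as stream:
        for line in stream:
            name = stream.filename()
            if name != current:
                # Crossed into a new file: report the previous one, then reset.
                if current is not None:
                    yield current, second, list(tail)
                current, second = name, None
                tail.clear()
            if stream.filelineno() == 2:  # per-file line number, like FNR
                second = line
            tail.append(line)

    # Report the final file, like the END rule.
    if current is not None:
        yield current, second, list(tail)
```

Iterating over `second_and_last3(["file1", "file2"])` and writing each result to an output file of your choice would give roughly the same effect as the shell loop above. As in the awk script, empty files produce no output.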

score 1 · Accepted Answer

To avoid reading the whole file into memory at once, use a deque with a maxlen of 3 to create a rolling buffer that captures the last 3 lines.

from collections import deque
def get2ndAndLast3LinesFrom(filename):
    with open(filename) as infile:
        # advance past first line
        next(infile)
        # capture second line
        second = next(infile)
        # iterate over the rest of the file a line at a time, saving the final 3
        last3 = deque(maxlen=3)
        last3.extend(infile)        
        return second, list(last3)

You can generalize this approach into a function that takes any iterable:

def lastN(n, seq):
    buf = deque(maxlen=n)
    buf.extend(seq)
    return list(buf)

Then you can use partial to create "last-n" functions of different lengths:

from functools import partial
last3 = partial(lastN, 3)

print(last3(range(100000000)))  # on Python 2, use xrange instead of range
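Putting the pieces together for the original question (several input files, results sent to a text file), a minimal sketch might look like the following. The names `second_and_tail` and `write_summary` are hypothetical. Unlike `get2ndAndLast3LinesFrom` above, this version assumes that short files should not raise an error, and it seeds the buffer with the first two lines so that a file of three or four lines still reports its true last three lines.

```python
from collections import deque
from itertools import islice


def second_and_tail(path, n=3):
    """Return (second_line_or_None, last_n_lines) without loading the whole file."""
    with open(path) as infile:
        head = list(islice(infile, 2))        # at most the first two lines
        second = head[1] if len(head) == 2 else None
        tail = deque(head, maxlen=n)          # seed so short files are handled
        tail.extend(infile)                   # stream the rest, keeping the last n
    return second, list(tail)


def write_summary(paths, out_path):
    """Write the second line and last three lines of each file to out_path."""
    with open(out_path, "w") as out:
        for path in paths:
            second, tail = second_and_tail(path)
            lines = ([second] if second is not None else []) + tail
            for line in lines:
                # Make sure a final line without a newline does not run
                # into the next file's output.
                out.write(line if line.endswith("\n") else line + "\n")
```

For example, `write_summary(["file1", "file2"], "summary.txt")` would collect the results for both files in one text file. In files shorter than five lines the second line also appears among the last three, so it is written twice.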

score 0 · Accepted Answer

I don't know about awk, but if you're using Python, I think you'll need something like this:

# Reads the whole input file into memory, which is fine for small files.
with open('test1.txt') as inf:
    lines = inf.readlines()

with open('Spreadsheet.ods', 'w') as outf:
    outf.write(lines[1])          # second line
    outf.writelines(lines[-3:])   # last three lines, in file order

score 0 · Accepted Answer

This works, but it loads the whole file into memory, so it may not be ideal if your files are very large.

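with open(filename) as f:  # filename: path to the input file
    text = f.readlines()

print(text[1], end='')  # print second line (index 1, since indexing starts at 0)

for line in text[-3:]:  # print last three lines, in file order
    print(line, end='')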

There are also some good alternatives discussed here.
