python - Python: How can I extract floating-point numbers from a text file with mixed content?

Question

I have a tab-delimited text file containing the following data:

    ahi1
    b/se
ahi 
test    -2.435953
        1.218364
    ahi2
    b/se
ahi 
test    -2.001858
        1.303935

I want to extract the two floating-point numbers from each block into a separate CSV file with two columns:

-2.435953 1.218364

-2.001858 1.303935

My current hacky attempt is as follows:

 import csv
 from itertools import islice
 results = csv.reader(open('test', 'r'), delimiter="\n")

 list(islice(results,3))
 print results.next()
 print results.next()
 list(islice(results,3))
 print results.next()
 print results.next()

This is not ideal. I'm a Python beginner, so I apologize in advance. Thank you for your time.

score 2 · Accepted Answer

Here is code that does the job:

import re

# this is the same data just copy/pasted from your question
data = """    ahi1
    b/se
ahi 
test    -2.435953
        1.218364
    ahi2
    b/se
ahi 
test    -2.001858
        1.303935"""

# what we're gonna do, is search through it line-by-line
# and parse out the numbers, using regular expressions

# the pattern first skips any run of characters that are
# neither digits nor '-': [^-\d]*  (^ inside brackets means NOT)
# then it captures an optional '-' followed by one or more digits,
# a dot, and one or more digits again: [\-]{0,1}\d+\.\d+
# and finally it skips the same kind of run as at the start
pattern = re.compile(r"[^-\d]*([\-]{0,1}\d+\.\d+)[^-\d]*")

results = []
for line in data.split("\n"):
    match = pattern.match(line)
    if match:
        results.append(match.groups()[0])

pairs = []
i = 0
end = len(results)
while i < end - 1:
    pairs.append((results[i], results[i+1]))
    i += 2

for p in pairs:
    # parenthesized form works in both Python 2 and Python 3
    print("%s, %s" % (p[0], p[1]))

Output:

>>>
-2.435953, 1.218364
-2.001858, 1.303935

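Instead of printing the numbers, you can save them in a list and zip them together afterwards. I'm using Python's regular expression module to parse the text. If you don't already know regular expressions, I can only recommend picking them up. I find them very useful for parsing text and all kinds of machine-generated output files.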
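A minimal Python 3 sketch of the "save in a list, then zip" idea, ending with the two-column CSV the question asks for. The names `extract_floats`, `pair_up` and `write_pairs`, and whatever output path is passed in, are assumptions for illustration and do not come from the question. The regex here is a simplified variant of the one above, used with `findall` instead of matching line by line.

```python
import csv
import re

# Optional sign, digits, a dot, digits (e.g. -2.435953 or 1.218364).
FLOAT_RE = re.compile(r"-?\d+\.\d+")


def extract_floats(text):
    """Return every float-looking token in text, in order, as strings."""
    return FLOAT_RE.findall(text)


def pair_up(values):
    """Group a flat list into consecutive pairs; a trailing odd item is dropped."""
    return list(zip(values[0::2], values[1::2]))


def write_pairs(pairs, path):
    """Write one pair per row to a CSV file (path is a hypothetical output name)."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(pairs)


# One block of the sample data from the question.
sample = "    ahi1\n    b/se\nahi \ntest    -2.435953\n        1.218364\n"
pairs = pair_up(extract_floats(sample))
print(pairs)
```

Labels such as `ahi1` are not picked up, because the pattern requires digits on both sides of a dot. Calling `write_pairs(pairs, "out.csv")` would then produce one row per pair.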

Edit:

Oh, by the way, if you're worried about performance: I tested this on my slow old 2 GHz IBM T60 laptop, and the regex parses a megabyte in about 200 ms.

Update: I was feeling kind, so I did the last step for you :P

score 1 · Accepted Answer

Maybe this can help:

zip(*[results]*5)

For example:

import csv
from itertools import izip
results = csv.reader(open('test', 'r'), delimiter="\t")
for result1, result2 in (x[3:5] for x in izip(*[results]*5)):
    ... # do something with the result
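For readers on Python 3, where `itertools.izip` no longer exists, the same grouping idiom can be sketched with the built-in `zip`. This is a hedged illustration, not the answerer's code: the in-memory `sample` string stands in for the file, and it assumes the fields really are separated by tabs, as the question states. Because `[rows] * 5` repeats the same iterator object, `zip` pulls five consecutive rows into each tuple.

```python
import csv
import io

# In-memory stand-in for the tab-separated file from the question.
sample = (
    "\tahi1\n"
    "\tb/se\n"
    "ahi \n"
    "test\t-2.435953\n"
    "\t1.218364\n"
    "\tahi2\n"
    "\tb/se\n"
    "ahi \n"
    "test\t-2.001858\n"
    "\t1.303935\n"
)

rows = csv.reader(io.StringIO(sample), delimiter="\t")

# zip() pulls from the same iterator five times per tuple,
# so each block holds five consecutive rows.
pairs = []
for block in zip(*[rows] * 5):
    # The 4th and 5th rows of each block carry the number in their second field.
    first, second = block[3][1], block[4][1]
    pairs.append((float(first), float(second)))

print(pairs)
```

This approach depends on every record being exactly five lines long; an incomplete final block is silently dropped by `zip`.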

score 0 · Accepted Answer

A bit tricky, but a more eloquent, sequential solution:

$ grep -v "ahi" myFileName | grep -v se | tr -d "test\" " | awk 'NR%2{printf $0", ";next;}1'
-2.435953, 1.218364
-2.001858, 1.303935

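How it works: it first removes specific lines of text, then deletes the unwanted text within the remaining lines, and finally joins every second line to the one before it with some formatting. I added the comma only for readability; if you don't need it, leave the ", " out of awk's printf.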
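The same three steps can be sketched in Python for comparison. The function name `join_number_lines` is hypothetical, and this is an approximation of the pipeline rather than an exact port: it also strips tab characters, which the `tr` call above does not.

```python
def join_number_lines(text):
    """Mimic the pipeline: filter label lines, strip labels, join line pairs."""
    # Characters to delete, like `tr -d`: the letters of "test", quotes, blanks.
    # Tabs are deleted too, which is an assumption beyond the original command.
    delete_table = str.maketrans("", "", 'test" \t')
    kept = []
    for line in text.splitlines():
        # Like the two `grep -v` calls: skip lines containing "ahi" or "se".
        if "ahi" in line or "se" in line:
            continue
        cleaned = line.translate(delete_table)
        if cleaned:
            kept.append(cleaned)
    # Like the awk step: join each odd-numbered line with the line after it.
    return [", ".join(kept[i:i + 2]) for i in range(0, len(kept), 2)]


sample = (
    "    ahi1\n    b/se\nahi \ntest    -2.435953\n        1.218364\n"
    "    ahi2\n    b/se\nahi \ntest    -2.001858\n        1.303935\n"
)
for row in join_number_lines(sample):
    print(row)
```

Like the shell version, this relies on the specific labels in the sample data, so a pattern that matches the numbers directly is likely to be more robust if the labels change.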
