python - Extracting columns from multiple text files with Python

Question

I have a folder of text files, 5 for each of several sites.

The file names are formatted like this:

Rockspring_18_SW.417712.WRFc36.ET.2000-2050.txt

Rockspring_18_SW.417712.WRFc36.RAIN.2000-2050.txt

WICA.399347.WRFc36.ET.2000-2050.txt

WICA.399347.WRFc36.RAIN.2000-2050.txt

So the file names basically follow the format (site name).(site number).(WRFc36).(some variable).(2000-2050.txt).

Each of these text files has the same layout, with no header row: Year Month Day Value (about 18,500 rows per file).

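I would like Python to find the files with matching names (where site name and site number match), take columns 1 through 3 from one of those files, and write them to a new txt file. I also want to copy the 4th column from each of the site's variable files (rain, etc.) and write those columns to the new file in a specific order.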

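I know how to use the csv module (defining a new space-delimited dialect) to pull data from all the files and write it to a new text file, but I'm not sure how to automate creating a new file for each site name/number while making sure the variables end up in the correct order.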

The output I want is one text file per site (not 5), in the format Year, Month, Day, Variable1, Variable2, Variable3, Variable4, Variable5, with about 18,500 rows.

I'm sure I'm overlooking something really simple here. This seems pretty rudimentary, but any help would be greatly appreciated!

Update

========

I've updated my code to reflect the comments below.
http://codepad.org/3mQEM75e

from collections import defaultdict
import glob
import csv

#Create dictionary of lists--   [A] = [Afilename1, Afilename2, Afilename3...]
#                               [B] = [Bfilename1, Bfilename2, Bfilename3...] 
def get_site_files():
    sites = defaultdict(list)
    #to start, I have a bunch of files in this format ---
    #"site name(unique)"."site num(unique)"."WRFc36"."Variable(5 for each site name)"."2000-2050"
    for fname in glob.glob("*.txt"):
        #split name at every instance of "."
        parts = fname.split(".")
        #check to make sure i only use the proper files-- having 6 parts to name and having WRFc36 as 3rd part
        if len(parts)==6 and parts[2]=='WRFc36':
            #Make sure site name is the full unique identifier, the first and second "parts"
            sites[parts[0]+"."+parts[1]].append(fname)
    return sites

#hardcode the variables for method 2, below
Var=["TAVE","RAIN","SMOIS_INST","ET","SFROFF"]

def main():
    for site_name, files in get_site_files().iteritems():
        print "Working on *****"+site_name+"*****"
####Method 1- I'd like to not hardcode in my variables (as in method 2), so I can use this script in other applications.
        for filename in files:
            reader = csv.reader(open(filename, "rb"))
            WriteFile = csv.writer(open("XX_"+site_name+"_combined.txt","wb"))
            for row in reader:
                row = reader.next()
####Method 2 works (mostly), but skips a LOT of random lines of first file, and doesn't utilize the functionality built into my dictionary of lists...            
##        reader0 = csv.reader(open(site_name+".WRFc36."+Var[0]+".2000-2050.txt", "rb"))    #I'd like to copy ALL columns from the first file
##        reader1 = csv.reader(open(site_name+".WRFc36."+Var[1]+".2000-2050.txt", "rb"))    #    and just the fourth column from all the rest of the files
##        reader2 = csv.reader(open(site_name+".WRFc36."+Var[2]+".2000-2050.txt", "rb"))    #    (the columns 1-3 are the same for all files)
##        reader3 = csv.reader(open(site_name+".WRFc36."+Var[3]+".2000-2050.txt", "rb"))
##        reader4 = csv.reader(open(site_name+".WRFc36."+Var[4]+".2000-2050.txt", "rb"))
##        WriteFile = csv.writer(open("XX_"+site_name+"_COMBINED.txt", "wb"))               #creates new command to write a text file
##
##        for row in reader0:
##            row  = reader0.next()
##            row1 = reader1.next()
##            row2 = reader2.next()
##            row3 = reader3.next()
##            row4 = reader4.next()
##            WriteFile.writerow(row + row1 + row2 + row3 + row4)
##        print "***finished with site***"

if __name__=="__main__":
    main()

score 2 · Accepted Answer

Here is a simple way to iterate over your files, grouped by site:

from collections import defaultdict
import glob

def get_site_files():
    sites = defaultdict(list)
    for fname in glob.glob('*.txt'):
        parts = fname.split('.')
        if len(parts)==6 and parts[2]=='WRFc36':
            sites[parts[0]].append(fname)
    return sites

def main():
    for site,files in get_site_files().iteritems():
        # you need to better explain what you are trying to do here!
        print site, files

if __name__=="__main__":
    main()

I don't yet understand your cut-and-paste of columns. You need to explain more clearly what you are trying to accomplish.
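If the goal is what the question describes (Year, Month, Day from one file, then the 4th column of every variable file for the same site), the grouping idea could be extended roughly as follows. This is only a sketch under a few assumptions: it is written for Python 3 rather than the Python 2 used above, the fields are separated by whitespace, every file for a site has the same number of rows in the same date order, and the names group_by_site, read_rows and combine_site are made up for illustration. Note that calling next() on a reader inside a for loop over that same reader advances it twice per iteration, which would explain skipped lines; zip avoids that by taking exactly one row from each file per step.

```python
import csv
import glob
import os
from collections import defaultdict


def group_by_site(pattern="*.txt"):
    """Map 'site_name.site_number' -> {variable: path} (hypothetical helper)."""
    sites = defaultdict(dict)
    for path in glob.glob(pattern):
        parts = os.path.basename(path).split(".")
        # Same filter as above: six dot-separated parts, 'WRFc36' in third place.
        if len(parts) == 6 and parts[2] == "WRFc36":
            sites[parts[0] + "." + parts[1]][parts[3]] = path
    return sites


def read_rows(path):
    """Return the whitespace-separated fields of every non-empty line."""
    with open(path) as handle:
        return [line.split() for line in handle if line.strip()]


def combine_site(files_by_var, out_path, order=None):
    """Write Year Month Day plus one value column per variable."""
    # Without an explicit order, fall back to alphabetical variable names.
    variables = order or sorted(files_by_var)
    tables = [read_rows(files_by_var[var]) for var in variables]
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter=" ", lineterminator="\n")
        # zip(*tables) yields one row from each file per step.
        for rows in zip(*tables):
            writer.writerow(rows[0][:3] + [row[3] for row in rows])
    return variables
```

Passing order=["TAVE", "RAIN", "SMOIS_INST", "ET", "SFROFF"] would fix the column order by hand, while leaving it out keeps the script free of hardcoded variable names.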

score 1 · Accepted Answer

As far as getting the file names goes, I would use something like the following:

import os

# Gets a list of all file names that end in .txt
# (use only the line for your platform; splitlines() avoids a trailing empty entry)
# On *nix
file_names = os.popen('ls *.txt').read().splitlines()

# On Windows
file_names = os.popen('dir /b *.txt').read().splitlines()

Then, to get the elements of a name, which are separated by periods, use:

# For some file_name in file_names
file_name.split('.')
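To make the split result easier to compare, the pieces could be given names. The following is a minimal sketch, assuming the six-part naming scheme from the question; SiteFile and parse_site_filename are hypothetical names, not part of any library.

```python
from collections import namedtuple

# Field names are illustrative labels for the dot-separated parts.
SiteFile = namedtuple("SiteFile", "site_name site_number model variable years")


def parse_site_filename(file_name):
    """Split a name like 'WICA.399347.WRFc36.ET.2000-2050.txt' into named parts.

    Returns None when the name does not have exactly six parts ending in 'txt'.
    """
    parts = file_name.split(".")
    if len(parts) != 6 or parts[-1] != "txt":
        return None
    return SiteFile(*parts[:5])
```

Two files then belong to the same site when their site_name and site_number fields are equal.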

Then you can go on to do your comparisons and extract the desired columns (using open(file_name, 'r') or a CSV parser).
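As a rough illustration of the open(file_name, 'r') route, the sketch below pulls selected columns out of one file. It assumes the fields are separated by whitespace, as the question suggests, and read_columns is a hypothetical helper name.

```python
def read_columns(file_name, indices):
    """Return the chosen whitespace-separated columns of each non-empty line."""
    selected = []
    with open(file_name, "r") as handle:
        for line in handle:
            fields = line.split()
            if fields:  # skip blank lines
                selected.append([fields[i] for i in indices])
    return selected
```

For example, read_columns(name, [0, 1, 2]) would give the Year, Month and Day fields, and read_columns(name, [3]) the value column.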

Michael G.
