python - Python: How to extract strings from a text file and use them as data

Question

This is my first time writing a Python script and I'm having trouble getting started. Say I have a txt file named Test.txt that contains this information.

                                   x          y          z      Type of atom
ATOM   1     C1  GLN D  10      26.395      3.904      4.923    C
ATOM   2     O1  GLN D  10      26.431      2.638      5.002    O
ATOM   3     O2  GLN D  10      26.085      4.471      3.796    O 
ATOM   4     C2  GLN D  10      26.642      4.743      6.148    C

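What I ultimately want to do is write a script that finds the center of mass of these three atoms. So basically I want to sum all the x values in that txt file, with each number multiplied by a given value that depends on the type of atom.

I know I need to define the position of each x value, but I'm having trouble figuring out how to represent these x values as numbers rather than as text from a string. I also have to keep in mind that these numbers will need to be multiplied by a value for the type of atom, so I need a way to keep a number defined for each atom type. Can anyone push me in the right direction?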

score 1 · Accepted Answer

mass_dictionary = {'C':12.0107,
                   'O':15.999
                   #Others...?
                  }

# If your files are this structured, you can just
# hardcode some column assumptions.
coords_idxs = [6,7,8]
type_idx = 9

# Open the file and read all lines; the with-block closes the file automatically.
# Probably prudent to add try-except here for bad file names.
with open("Test.txt", 'r') as f_open:
    lines = f_open.readlines()

# Initialize a list to hold the intermediate data and a running total mass.
output_coms = []
total_mass = 0.0

# Loop through the lines of the file.
for line in lines:

    # Split the line on white space.
    line_stuff = line.split()

    # If the line is empty or fails to start with 'ATOM', skip it.
    if (not line_stuff) or (not line_stuff[0]=='ATOM'):
        pass

    # Otherwise, append the mass-weighted coordinates to a list and increment total mass.
    else:
        output_coms.append([mass_dictionary[line_stuff[type_idx]]*float(line_stuff[i]) for i in coords_idxs])
        total_mass = total_mass + mass_dictionary[line_stuff[type_idx]]

# After getting all the data, finish off the averages.
# zip(*output_coms) regroups the rows into one tuple per axis (x, y, z).
avg_x, avg_y, avg_z = (sum(axis) / total_mass for axis in zip(*output_coms))


# A lot of this will be better with NumPy arrays if you'll be using this often or on
# larger files. Python Pandas might be an even better option if you want to just
# store the file data and play with it in Python.
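
As a rough sketch of the NumPy route mentioned in the comment above, the same mass-weighted average can be written with np.average and its weights argument. The sample lines from the question are inlined as a string so the block runs on its own. The names center_of_mass, MASSES and SAMPLE are made up for illustration, and the column positions are the same hardcoded assumption as in the code above.

```python
import numpy as np

# Atomic masses, same values as the dictionary above (extend as needed).
MASSES = {'C': 12.0107, 'O': 15.999}

# Sample data copied from the question, inlined so the sketch is self-contained.
SAMPLE = """\
                                   x          y          z      Type of atom
ATOM   1     C1  GLN D  10      26.395      3.904      4.923    C
ATOM   2     O1  GLN D  10      26.431      2.638      5.002    O
ATOM   3     O2  GLN D  10      26.085      4.471      3.796    O
ATOM   4     C2  GLN D  10      26.642      4.743      6.148    C
"""


def center_of_mass(lines, coord_idxs=(6, 7, 8), type_idx=9):
    """Return the mass-weighted mean (x, y, z) of the ATOM records in lines."""
    coords, weights = [], []
    for line in lines:
        fields = line.split()
        # Skip blank lines and anything that is not an ATOM record.
        if not fields or fields[0] != 'ATOM':
            continue
        coords.append([float(fields[i]) for i in coord_idxs])
        weights.append(MASSES[fields[type_idx]])
    # Weighted average over the rows (axis 0), one weight per atom.
    return np.average(np.array(coords), axis=0, weights=np.array(weights))


com = center_of_mass(SAMPLE.splitlines())
print(com)
```

For a real file, the lines could come from open("Test.txt") instead of SAMPLE.splitlines().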

score 0 · Accepted Answer

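If you have pandas installed, check out the read_fwf function, which imports a fixed-width file and creates a DataFrame (a two-dimensional tabular data structure). It saves lines of code on import, and it also gives you many data-munging features if you need to do further data manipulation.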
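
A minimal sketch of what that might look like for the data in the question, assuming the columns are consistently aligned so that read_fwf can infer their boundaries from the whitespace. The column names are invented here for illustration (the file's own header row does not label every column, so it is skipped), and the sample is read from an in-memory string rather than from Test.txt so the block is self-contained.

```python
import io

import pandas as pd

# Sample data copied from the question.
SAMPLE = """\
                                   x          y          z      Type of atom
ATOM   1     C1  GLN D  10      26.395      3.904      4.923    C
ATOM   2     O1  GLN D  10      26.431      2.638      5.002    O
ATOM   3     O2  GLN D  10      26.085      4.471      3.796    O
ATOM   4     C2  GLN D  10      26.642      4.743      6.148    C
"""

# Hypothetical column names; the original header row is skipped.
columns = ['record', 'serial', 'name', 'residue', 'chain',
           'resnum', 'x', 'y', 'z', 'element']
df = pd.read_fwf(io.StringIO(SAMPLE), skiprows=1, header=None, names=columns)

# Look up a mass for each atom, then take the mass-weighted mean per axis.
masses = df['element'].map({'C': 12.0107, 'O': 15.999})
com = df[['x', 'y', 'z']].mul(masses, axis=0).sum() / masses.sum()
print(com)
```

If the inferred boundaries turn out wrong for a given file, read_fwf also accepts explicit colspecs or widths.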

score 0 · Accepted Answer

Basically, Python's open function lets you open any file, so you can do something like the following. Note that the snippet below is not a solution to the whole problem, just an approach.

def read_file():
    f = open("filename", 'r')
    for line in f:
        line_list = line.split()
        # ....
        # ....
    f.close()

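From this point on, you can set things up to do whatever you need with those values. Basically, the second line just opens the file for reading. The third line defines a for loop that reads the file one line at a time, with each line going into the variable line.

The fourth line of the snippet splits the string (on all whitespace) into a list. So line_list[0] will be the value in the first column, and so on. From here, if you have any programming experience, you can use if statements and the like to build the logic you need.

** Also, the values stored in that list are all strings, so you need to convert them (for example with float()) before performing arithmetic such as addition.

*Edited to fix syntax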
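
To illustrate that last point, here is a small sketch using one line from the question's file. The indices 6 to 8 are an assumption based on that sample layout.

```python
line = "ATOM   1     C1  GLN D  10      26.395      3.904      4.923    C"
line_list = line.split()

# Every element is a str, so "+" concatenates the text instead of adding.
as_text = line_list[6] + line_list[7]

# Convert with float() first to get real arithmetic.
x, y, z = (float(value) for value in line_list[6:9])
as_number = x + y

print(repr(as_text))   # the two strings joined together
print(as_number)       # the numeric sum of x and y
```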
