python - Import only the rows with a specific number of columns from a file in Python

Question

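I'm trying to import multiple files into my code in a for loop for analysis, but not all of the files are formatted exactly the same way (and there are too many to edit by hand).

The data I need is the same in every file: 13 columns, which I want to import as strings. Here is an example file: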

could not open XWindow display
could not open XWindow display

No graphics display available for this session.
Graphics tasks that attempt to plot to an interactive screen will fail.

/data/poohbah/2/asassn/be/F0041-70_2645
###  JD        HJD            UT_date             IMAGE    FWHM  Diff Limit      mag    mag_err       counts   counts_err   flux(mJy)     flux_err
2456784.50841  2456784.50816  2014-05-07.0072681  interp_bf002339_coadd 2.61 -2.65 17.031      15.543  0.093          526.82        44.57   2.328        0.197       
2456789.45407  2456789.45347  2014-05-11.9529421  interp_be003585_coadd 2.26 -2.31 16.869      15.383  0.093          834.50        70.78   2.695        0.229       
2456790.47441  2456790.47419  2014-05-12.9732922  interp_bf004070_coadd 1.72 -2.25 17.246      15.721  0.090          645.67        52.82   1.974        0.162       
...
(data continues)
...
2457895.45745  2457895.45919  2017-05-21.9587133  interp_bf305499_coadd 1.71 -2.45 17.299      15.482  0.068          673.31        42.10   2.461        0.154       
/data/poohbah/1/assassin/bin/./ap_phot_im_cal_test.py:654: RuntimeWarning: invalid value encountered in sqrt
  counts_err_a = np.sqrt( counts_a / options.gain + (area_a * bg_stdev_a **2.0 ) )
/data/poohbah/1/assassin/bin/./ap_phot_im_cal_test.py:369: RuntimeWarning: invalid value encountered in less_equal
  no_detected = np.nonzero( (counts <= limit) & (area >= 0.01) )[0]
/data/poohbah/1/assassin/bin/./ap_phot_im_cal_test.py:367: RuntimeWarning: divide by zero encountered in log10
  maglimit[notbad] = -2.5 * np.log10(limit[notbad]) + def_zeropt

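I only need the data between the "###" line and the final "/data" paths. In every file this section is formatted exactly the same way, with 13 columns. However, the "comments" at the beginning and end of a given file can differ: some files don't have the "could not open XWindow display" messages, and some have no paths at the end. I tried ignoring lines that start with "#" or "/", but that does nothing for the first lines, or for lines like the "counts_err_a" one at the end of this particular example.

Is there a way to import the data into Python and keep only the rows that contain a specific number of columns? In pseudocode, it would look something like this: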

open(file_name)
 if column_number = 13
   np.genfromtxt(file_name)
 else skip

score 0 · Accepted Answer

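You can't know how many columns a line has without counting them, so you can filter while reading the file, but you still have to split() each line. Something like the following should work, and you can add other checks if, for example, there are a lot of comments: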

saved_lines = []
with open(filename) as f:
    for line in f:
        if len(line.split()) == 13:
            saved_lines.append(line)

Or the equivalent comprehension:

with open(filename) as f:
    saved_lines = [line for line in f if len(line.split()) == 13]
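
As an example of one such extra check: in the sample file from the question, the warning's source line `counts_err_a = np.sqrt( ... )` also happens to split into exactly 13 whitespace-separated tokens, so counting columns alone would let it through. A minimal sketch that additionally requires the first field (the JD column) to parse as a number is below. The names `is_number` and `filter_rows` are made up for illustration, and the numeric first column is an assumption based on the sample data.

```python
def is_number(token):
    """Return True if token parses as a float (e.g. a Julian date)."""
    try:
        float(token)
    except ValueError:
        return False
    return True


def filter_rows(lines, n_cols=13):
    """Keep rows with exactly n_cols fields whose first field is numeric.

    `lines` can be any iterable of strings, such as an open file.
    Each kept row is returned as a list of strings, one per column.
    Assumption: real data rows always start with a numeric value.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) == n_cols and is_number(fields[0]):
            rows.append(fields)
    return rows
```

For a real file this would be called as `with open(filename) as f: rows = filter_rows(f)`. Since the values stay as strings, the result could then be handed to `np.array(rows, dtype=str)` if a NumPy array is preferred.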
