python - Float values read as datetime when using openpyxl

Question

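I have an Excel spreadsheet with a field containing small two-decimal (%.2f) values such as 1.2, 1.07, and 2.3, and for some reason openpyxl reads these cells as dates in the year 1900. I have seen this issue raised many times, but usually those users expected dates and got bogus ones. I expect plain values, typically x < 10.0, yet roughly 30 to 40% of the data comes back "bad" (read as datetime), while the rest is read correctly as numbers.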

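I am using iterators, so I make a simple ws.iter_rows() call and pull the data one row at a time. I tried to "cast" the value to a variable I had previously created holding a number, but that did not help much.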

Does anyone have a suggestion for overcoming this sporadic problem? If this is a known bug, is there a known workaround?

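If I save the file as CSV, reopen the CSV, and re-save it as xlsx, I end up with a file that reads correctly. That helps with debugging my code, but I need a solution my customers can use without jumping through these hoops.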

If the column were formatted incorrectly, I would expect that to apply to every element, so the fact that this happens only intermittently confuses me.

import openpyxl
from openpyxl import load_workbook

# Source workbook - wb

wb = load_workbook(filename = r'C:\data\TEST.xlsx' , use_iterators = True)
ws = wb.get_sheet_by_name(name ='QuoteFile ')

for row in ws.iter_rows():
    print(row[0].internal_value, row[3].internal_value, row[4].internal_value, row[5].internal_value)


print('Done')

Here is my input as it appears in the Excel table:

20015   2.13    1.2 08/01/11
20015   5.03    1.2 08/01/11
20015   5.03    1.2 08/01/11
20015   5.51    1.2 08/01/11
20015   8.13    1.2 08/01/11
20015   5.60    1.2 08/01/11
20015   5.03    1.2 08/01/11
20015   1.50    1.2 08/01/11
20015   1.50    1.2 08/01/11
20015   1.50    1.2 08/01/11
20015   1.50    1.2 08/01/11
20015   1.50    1.2 08/01/11
20015   1.50    1.2 08/01/11

Here is my output. The first 7 rows show the second field as a date in 1900, while rows 8 through 13 correctly show it as a numeric field.

20015.0 1900-01-02 03:07:12 1.2 2011-08-01 00:00:00
20015.0 1900-01-05 00:43:12 1.2 2011-08-01 00:00:00
20015.0 1900-01-05 00:43:12 1.2 2011-08-01 00:00:00
20015.0 1900-01-05 12:14:24 1.2 2011-08-01 00:00:00
20015.0 1900-01-08 03:07:12 1.2 2011-08-01 00:00:00
20015.0 1900-01-05 14:24:00 1.2 2011-08-01 00:00:00
20015.0 1900-01-05 00:43:12 1.2 2011-08-01 00:00:00
20015.0 1.5 1.2 2011-08-01 00:00:00
20015.0 1.5 1.2 2011-08-01 00:00:00
20015.0 1.5 1.2 2011-08-01 00:00:00
20015.0 1.5 1.2 2011-08-01 00:00:00
20015.0 1.5 1.2 2011-08-01 00:00:00
20015.0 1.5 1.2 2011-08-01 00:00:00
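
As a sanity check on the output above, the 1900 dates line up with the original numbers when each number is read as a count of days after 1899-12-31, with the fractional part as the time of day. Below is a minimal sketch of that arithmetic using only the standard library. The epoch is an assumption inferred from this output rather than taken from openpyxl's documentation, and `ASSUMED_EPOCH` and `serial_to_datetime` are hypothetical names used only for illustration.

```python
from datetime import datetime, timedelta

# Assumption: day 0 is 1899-12-31, inferred from the output above
# (2.13 shows up as 1900-01-02 03:07:12).
ASSUMED_EPOCH = datetime(1899, 12, 31)


def serial_to_datetime(serial):
    """Interpret a number as days (plus a day fraction) after ASSUMED_EPOCH."""
    return ASSUMED_EPOCH + timedelta(days=serial)


# Values from the second column of the first rows of the input table.
for value in (2.13, 5.03, 5.51, 8.13, 5.60):
    print(value, "->", serial_to_datetime(value))
```

For example, 0.13 of a day is 11232 seconds, which is 3 h 7 min 12 s, matching the time shown in the first output row.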

Using Python 3.3 and openpyxl 1.6.2.

score 2 · Accepted Answer

Disclaimer: I don't know how to work with openpyxl. For the most part, though, the only thing you should need to worry about is the datetime module.

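If you know which fields are supposed to be numeric, you can try code like the following, which converts Excel date values back to floats and leaves values that are already numbers untouched.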

import datetime
import openpyxl
from openpyxl import load_workbook

# Source workbook - wb

wb = load_workbook(filename = r'C:\data\TEST.xlsx' , use_iterators=True)
ws = wb.get_sheet_by_name(name='QuoteFile ')

def forcefloat(val):
    """Return val as a plain number.

    Numbers are returned unchanged. A datetime is converted back to the
    Excel serial it came from: the number of days (with the time of day as
    the fractional part) elapsed since 1899-12-31 00:00:00, so that
    1900-01-01 maps to 1.0. The timedelta's total seconds are divided by
    86400, the number of seconds in a day.
    """
    if isinstance(val, (int, float)):
        return val
    assert isinstance(val, datetime.datetime)
    return (val - datetime.datetime(1899, 12, 31, 0, 0, 0)).total_seconds() / 86400

for row in ws.iter_rows():
    print(
        row[0].internal_value,
        forcefloat(row[3].internal_value),
        row[4].internal_value,
        row[5].internal_value,
    )

print('Done')

It's not the most elegant solution, but it works.
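
To try the conversion without a workbook, here is a small standalone sketch that applies the same logic to a few of the datetimes from the question's output. It assumes the 1899-12-31 epoch used above, which matches the small values shown there. Excel's 1900 date system counts a nonexistent 29 February 1900, so values later than February 1900 may need a different offset. The `samples` list and the rounding to two decimals are illustrative choices, not part of the original code.

```python
import datetime

# Assumption: same epoch as forcefloat above (1899-12-31 00:00:00).
EPOCH = datetime.datetime(1899, 12, 31)


def forcefloat(val):
    """Return numbers unchanged; convert datetimes to days since EPOCH."""
    if isinstance(val, (int, float)):
        return val
    return (val - EPOCH).total_seconds() / 86400


# Datetimes copied from the question's output, plus one value that was
# already read as a number.
samples = [
    datetime.datetime(1900, 1, 2, 3, 7, 12),
    datetime.datetime(1900, 1, 5, 0, 43, 12),
    datetime.datetime(1900, 1, 5, 12, 14, 24),
    1.5,
]

# Round to two decimals to hide floating-point noise from the division.
recovered = [round(forcefloat(v), 2) for v in samples]
print(recovered)
```

With these inputs the list should come out as [2.13, 5.03, 5.51, 1.5], matching the second column of the input table.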
