python - Skipping rows of an Excel file in Python

Question

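I'm writing a Python script that parses an Excel file using the xlrd library. What I want is to do calculations on different columns if a cell contains a certain value, and otherwise skip those values. The output should then be stored in a dictionary. Here is what I tried: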

import xlrd


workbook = xlrd.open_workbook('filter_data.xlsx')
worksheet = workbook.sheet_by_name('filter_data')

num_rows = worksheet.nrows -1
num_cells = worksheet.ncols - 1

first_col = 0
scnd_col = 1
third_col = 2

# Read Data into double level dictionary
celldict = dict()
for curr_row in range(num_rows)  :

    cell0_val = int(worksheet.cell_value(curr_row+1,first_col))
    cell1_val = worksheet.cell_value(curr_row,scnd_col)
    cell2_val = worksheet.cell_value(curr_row,third_col)

    if cell1_val[:3] == 'BL1' :
        if cell2_val=='toSkip' :
        continue
    elif cell1_val[:3] == 'OUT' :
        if cell2_val == 'toSkip' :
        continue
    if not cell0_val in celldict :
        celldict[cell0_val] = dict()
# if the entry isn't in the second level dictionary then add it, with count 1
    if not cell1_val in celldict[cell0_val] :
        celldict[cell0_val][cell1_val] = 1
        # Otherwise increase the count
    else :
        celldict[cell0_val][cell1_val] += 1

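As you can see, I count the number of cell1_val values for each cell0_val. However, before doing the sum and storing it in the dict, I want to skip the values whose adjacent column cell contains 'toSkip'. I'm doing something wrong here, and I suspect the solution is much simpler. Any help is appreciated. Thanks.

Here is an example of my sheet: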

cell0 cell1  cell2
12    BL1    toSkip
12    BL1    doNotSkip
12    OUT3   doNotSkip
12    OUT3   toSkip
13    BL1    doNotSkip
13    BL1    toSkip
13    OUT3   doNotSkip

score 1 · Accepted Answer

Use collections.defaultdict with collections.Counter for the nested dictionary.

Here it is in action:

>>> from collections import defaultdict, Counter
>>> import pprint
>>> d = defaultdict(Counter)
>>> d['red']['blue'] += 1
>>> d['green']['brown'] += 1
>>> d['red']['blue'] += 1
>>> pprint.pprint(d)
{'green': Counter({'brown': 1}),
 'red': Counter({'blue': 2})}

Here it is integrated into your code:

from collections import defaultdict, Counter
import xlrd

workbook = xlrd.open_workbook('filter_data.xlsx')
worksheet = workbook.sheet_by_name('filter_data')

first_col = 0
scnd_col = 1
third_col = 2

celldict = defaultdict(Counter)
for curr_row in range(1, worksheet.nrows): # start at 1 skips header row

    cell0_val = int(worksheet.cell_value(curr_row, first_col))
    cell1_val = worksheet.cell_value(curr_row, scnd_col)
    cell2_val = worksheet.cell_value(curr_row, third_col)

    if cell2_val == 'toSkip' and cell1_val[:3] in ('BL1', 'OUT'):
        continue

    celldict[cell0_val][cell1_val] += 1

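I also combined the if statements and changed how curr_row is calculated to make it simpler: the loop now starts at 1, so the same row index is used for all three columns.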
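To check the logic without an Excel file, here is a minimal sketch that runs the same loop over the sample rows from the question. The rows list and the count_rows helper are hypothetical stand-ins for the worksheet lookups and are not part of xlrd.

```python
from collections import defaultdict, Counter

# Sample rows copied from the question's sheet (header row omitted).
# This list is an illustrative stand-in for worksheet.cell_value() calls.
rows = [
    (12, 'BL1', 'toSkip'),
    (12, 'BL1', 'doNotSkip'),
    (12, 'OUT3', 'doNotSkip'),
    (12, 'OUT3', 'toSkip'),
    (13, 'BL1', 'doNotSkip'),
    (13, 'BL1', 'toSkip'),
    (13, 'OUT3', 'doNotSkip'),
]


def count_rows(rows):
    """Count cell1 values per cell0 value, skipping flagged BL1/OUT rows."""
    celldict = defaultdict(Counter)
    for cell0_val, cell1_val, cell2_val in rows:
        # Same combined condition as in the integrated code above.
        if cell2_val == 'toSkip' and cell1_val[:3] in ('BL1', 'OUT'):
            continue
        celldict[cell0_val][cell1_val] += 1
    return celldict


result = count_rows(rows)
for key in sorted(result):
    print(key, dict(result[key]))
# 12 {'BL1': 1, 'OUT3': 1}
# 13 {'BL1': 1, 'OUT3': 1}
```

Each 'toSkip' row is dropped, so every remaining combination is counted exactly once for this sample.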

score 0 · Accepted Answer

It looks like you want to skip the current row whenever cell2_val equals 'toSkip', so adding if cell2_val=='toSkip' : continue directly after computing cell2_val would simplify the code.

Also, where you have
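One caveat: skipping on cell2_val alone is slightly broader than the question's prefix check, because it also drops rows whose cell1_val starts with something other than BL1 or OUT. For the sample sheet both give the same result. A small sketch of the difference, where the two helper functions and the value 'XYZ' are made up for illustration:

```python
def skip_any(cell1_val, cell2_val):
    """Skip every row flagged 'toSkip', whatever cell1_val is."""
    return cell2_val == 'toSkip'


def skip_prefixed(cell1_val, cell2_val):
    """Skip flagged rows only when cell1_val starts with BL1 or OUT."""
    return cell2_val == 'toSkip' and cell1_val[:3] in ('BL1', 'OUT')


# A row from the sample sheet: both checks agree.
print(skip_any('BL1', 'toSkip'), skip_prefixed('BL1', 'toSkip'))  # True True

# A hypothetical row with another prefix: the checks disagree.
print(skip_any('XYZ', 'toSkip'), skip_prefixed('XYZ', 'toSkip'))  # True False
```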

# if the entry isn't in the second level dictionary then add it, with count 1
if not cell1_val in celldict[cell0_val] :
    celldict[cell0_val][cell1_val] = 1
    # Otherwise increase the count
else :
    celldict[cell0_val][cell1_val] += 1

the usual idiom is more like

celldict[cell0_val][cell1_val] = celldict[cell0_val].get(cell1_val, 0) + 1

That is, use a default value of 0 so that get() returns 0 when the key cell1_val is not yet in celldict[cell0_val].
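Putting both suggestions together with plain dictionaries, a minimal sketch might look like the following. The rows list is a hypothetical stand-in for the values read from the worksheet, and the use of setdefault() to create the inner dictionary is an addition here, not something from the original code.

```python
# Sample rows copied from the question's sheet (header row omitted);
# an illustrative stand-in for the values read via xlrd.
rows = [
    (12, 'BL1', 'toSkip'),
    (12, 'BL1', 'doNotSkip'),
    (12, 'OUT3', 'doNotSkip'),
    (12, 'OUT3', 'toSkip'),
    (13, 'BL1', 'doNotSkip'),
    (13, 'BL1', 'toSkip'),
    (13, 'OUT3', 'doNotSkip'),
]

celldict = {}
for cell0_val, cell1_val, cell2_val in rows:
    # Skip the row as soon as the flag is known.
    if cell2_val == 'toSkip':
        continue
    # Create the inner dict on first use (assumption: setdefault is acceptable here).
    inner = celldict.setdefault(cell0_val, {})
    # get() returns 0 for a key that has not been counted yet.
    inner[cell1_val] = inner.get(cell1_val, 0) + 1

print(celldict)
# {12: {'BL1': 1, 'OUT3': 1}, 13: {'BL1': 1, 'OUT3': 1}}
```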
