python - Collect some duplicated elements in a list and merge them into one element

Question

I am trying to parse a huge Excel file that contains properties and their values. The problem is that some properties can have multiple values.

Example:

list = ['a=1', 'b=2', 'c=3', 'd=4', 'd=5', 'd=6', 'e=7']

should become:

list2 = ['a=1', 'b=2', 'c=3', 'd=4,5,6', 'e=7']

The elements are variable-length strings, with the property and its value separated by "=".

This is how I generate the list from the Excel file:

# For each data row in the Excel file.
for rows in range(DATA_ROW, sheet.nrows):
    # Generate a list with all properties.
    for cols in range(sheet.ncols):
        # Only if the property name is not empty.
        if str(sheet.cell(PROPERTIE_ROW, cols).value) != '':
            proplist.append(sheet.cell(PROPERTIE_ROW, cols).value + '=' + str(sheet.cell(rows, cols).value) + '\n')

I tried the following, but it didn't work...

last_item = ''
new_list = []
# Find and collect multiple values.
for i, item in enumerate(proplist):
    # If the property is already in the list
    if str(item).find(last_item) is not -1:
        # just copy the value and append it to the property
        new_list.insert(i, propertie);
    else:
        # Slice the string into property and value
        pos = item.find('=')
        propertie = item[0:pos+1]
        value = item[pos+1:len(item)]
        # Save the property
        last_item = propertie
        # Append item
        new_list.append(item)

Any help is greatly appreciated!

score 1 · Accepted Answer

If order doesn't matter, you could probably use a defaultdict for something like this:

from collections import defaultdict
orig = ['a=1', 'b=2', 'c=3', 'd=4', 'd=5', 'd=6', 'e=7']
d = defaultdict(list)
for item in orig:
    k,v = item.split('=',1)
    d[k].append(v)

new = ['{0}={1}'.format(k,','.join(v)) for k,v in d.items()]
print(new)  # e.g. ['a=1', 'c=3', 'b=2', 'e=7', 'd=4,5,6'] (order is arbitrary before Python 3.7)
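The `split('=', 1)` call above splits only at the first `=`, which matters if a value can itself contain `=`. A small sketch of the difference; the sample string is made up for illustration and does not come from the question's data:

```python
# Hypothetical sample item whose value contains '=' characters.
item = 'url=http://example.com/?q=1'

# maxsplit=1: split only at the first '=', so the value stays intact.
key, value = item.split('=', 1)
print(key)    # url
print(value)  # http://example.com/?q=1

# Without maxsplit the value is broken apart as well.
parts = item.split('=')
print(parts)  # ['url', 'http://example.com/?q', '1']
```

Note that an item with no `=` at all would make the two-name unpacking raise a `ValueError`, so such items would need to be filtered out or handled first.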

If order matters, you can use OrderedDict + setdefault, but it's really not as pretty:

from collections import OrderedDict
orig = ['a=1', 'b=2', 'c=3', 'd=4', 'd=5', 'd=6', 'e=7']
d = OrderedDict()
for item in orig:
    k,v = item.split('=',1)
    d.setdefault(k,[]).append(v)

new = ['{0}={1}'.format(k,','.join(v)) for k,v in d.items()]
print(new)  # ['a=1', 'b=2', 'c=3', 'd=4,5,6', 'e=7']
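On Python 3.7 and later, a plain dict also preserves insertion order, so the same idea can be written without OrderedDict. The sketch below is one possible variant, not the answer's original code: the helper name `merge_duplicates` is made up, and stripping the trailing newline is an assumption based on the `'\n'` that the question's loop appends to each item.

```python
def merge_duplicates(items):
    """Merge 'key=value' strings that share a key, keeping first-seen key order."""
    merged = {}  # plain dict: insertion-ordered on Python 3.7+
    for item in items:
        # Assumption: drop the trailing newline added when the list was built.
        key, value = item.rstrip('\n').split('=', 1)
        merged.setdefault(key, []).append(value)
    return ['{0}={1}'.format(k, ','.join(v)) for k, v in merged.items()]


# Sample input shaped like the question's proplist (items end with '\n').
proplist = ['a=1\n', 'b=2\n', 'c=3\n', 'd=4\n', 'd=5\n', 'd=6\n', 'e=7\n']
result = merge_duplicates(proplist)
print(result)  # ['a=1', 'b=2', 'c=3', 'd=4,5,6', 'e=7']
```

Unlike the attempt in the question, this does not rely on duplicates being adjacent: values are grouped by key wherever they appear in the list.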
