python - Insert parent id in csv data

Question

I have a csv file in this format:

Country State   City    County
X       A       
X       A       R   
X       A       R       X
X       A       R       Y
X       B       
X       B       S   
X       B       S       X

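It represents a tree (containment) relationship. I now need to insert an id and a parent id that reflect this relationship. For example, the parent of Y (id=5) is R, whose id is 3, so Y's parent field is 3.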

id  parent  Country State   City    County
1   0       X    
2   1       X       A       
3   2       X       A       R   
4   3       X       A       R       X
5   3       X       A       R       Y
6   1       X       B       
7   6       X       B       S   
8   7       X       B       S       X

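There are thousands of entries, so doing this by hand is tedious. How can I do this in Python? That is, read the file (the first block), insert the id and parent, and write it out (the second code block above).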

score 1 · Accepted Answer

Edit: this solution should be clearer. It is not a new approach but a rework of the previous solutions (1, 2). It uses a single loop and makes no copies, which should make it easier to follow.

import copy
import csv
import StringIO

csv_str = """X,,,
X,A,,
X,A,R,
X,A,R,X
X,A,R,Y
X,B,,
X,B,S,
X,B,S,X
"""

reader = csv.reader(StringIO.StringIO(csv_str))

idx = 0
data = []

for row in reader:
    # insert the row id
    row.insert(0, idx + 1)

    # insert a dummy parent id, it will be replaced with the real
    # value later
    row.insert(1, -1)

    # how deep is the current row
    depth = len([r for r in row if r != ''])
    # insert the depth as the last value in the row
    row.append(depth)

    if idx > 0:
        # if it's not the first row, calculate its parent

        # calculate the depth of the previous row
        prev_depth = data[idx - 1][-1]
        if depth > prev_depth:
            # if it's deeper than the previous row, then the previous
            # row is the parent row
            row[1] = data[idx - 1][0]
        elif depth == prev_depth:
            # if it's the same depth as the previous row then it has
            # the same parent as the previous row
            row[1] = data[idx - 1][1]
        else:
            # if it's shallower than the previous row, walk back to the
            # nearest previous row with the same depth and use its
            # parent as this row's parent
            ridx = idx - 1
            while ridx > 0 and data[ridx][-1] != depth:
                ridx -= 1
            row[1] = data[ridx][1]
    else:
        # if it's the first row, its parent is 0
        row[1] = 0

    # store the new row
    data.append(row)
    idx += 1


# write the CSV
output = StringIO.StringIO()
writer = csv.writer(output)
for row in data:
    # skip the depth value in each row
    writer.writerow(row[:-1])

print output.getvalue()

You can see the code in action here: http://codepad.org/DvGtOw8G
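For readers on Python 3, the same depth-based idea can be sketched more compactly by remembering the most recent id seen at each depth. This is only an illustrative sketch, not the code from the answer: the names `add_ids` and `last_id_at_depth` are made up here, and it assumes that every row's parent appears earlier in the file, that non-empty cells are filled from the left, and that there are no blank rows.

```python
import csv
import io
import sys


def add_ids(rows):
    """Return rows prefixed with [id, parent_id].

    Assumption: a row's parent always appears before the row itself.
    """
    # Depth 0 is a virtual root with id 0, so top-level rows get parent 0.
    last_id_at_depth = {0: 0}
    out = []
    for idx, row in enumerate(rows, start=1):
        # Depth is the number of non-empty cells in the row.
        depth = sum(1 for cell in row if cell != "")
        # The parent is the most recent row that is one level shallower.
        parent = last_id_at_depth[depth - 1]
        last_id_at_depth[depth] = idx
        out.append([idx, parent] + list(row))
    return out


csv_str = """X,,,
X,A,,
X,A,R,
X,A,R,X
X,A,R,Y
X,B,,
X,B,S,
X,B,S,X
"""

rows = list(csv.reader(io.StringIO(csv_str)))
csv.writer(sys.stdout).writerows(add_ids(rows))
```

A row whose parent level has not been seen yet raises a `KeyError`, which is probably preferable to silently writing a wrong parent id.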

score 1 · Accepted Answer

This isn't pretty (and it isn't Python, so apologies if this approach isn't an option), but if you want to avoid scripting, you can use this (assuming the setup in the screenshot):

=INDEX(
       $A$1:$A$9,
      MATCH(
            INDIRECT(ADDRESS(ROW(),COUNTA(C2:F2)+1)),
            INDIRECT(
                  SUBSTITUTE(ADDRESS(1,COUNTA(C2:F2)+1,4) & ":" & ADDRESS(1,COUNTA(C2:F2)+1,4),"1","")),
             0),
        1)

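This assumes the data is ordered so that a parent's ID is defined before it is referenced. To populate the ID column, you can use Fill Series to create an incrementing list. Again, this isn't pretty (and may not be suitable for what you need), but it is one way to avoid scripting (if you do need Python, JoranBeasley's suggestion of using the CSV module is a good one).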
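If Python is an option after all, the same "look up the parent and return its id" idea might be sketched with the csv module as follows. This is a rough sketch under the same ordering assumption (parents come before their children). The function name `insert_parent_ids`, its parameters, and the choice to key the lookup on the full ancestor path are assumptions made for illustration, not something taken from the formula above.

```python
import csv
import io


def insert_parent_ids(src, dst, delimiter=","):
    """Copy CSV rows from src to dst, prefixing each with id and parent id."""
    # Maps a tuple of ancestor names to the id of that row.
    # The empty path stands for the root, which has id 0.
    id_by_path = {(): 0}
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    next_id = 1  # sequential ids, like Fill Series in the spreadsheet
    for row in csv.reader(src, delimiter=delimiter):
        path = tuple(cell for cell in row if cell != "")
        if not path:
            continue  # skip blank lines
        # Assumption: the parent row was seen earlier; fall back to 0 if not.
        parent = id_by_path.get(path[:-1], 0)
        id_by_path[path] = next_id
        writer.writerow([next_id, parent] + row)
        next_id += 1


# Example with in-memory files; real code would pass open file objects.
src = io.StringIO("X,,,\nX,A,,\nX,A,R,\nX,A,R,X\nX,A,R,Y\nX,B,,\nX,B,S,\nX,B,S,X\n")
dst = io.StringIO()
insert_parent_ids(src, dst)
print(dst.getvalue())
```

Keying on the full path rather than on the name alone keeps two cities with the same name in different states apart. For a tab-separated file, `delimiter="\t"` could be passed instead.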

[Screenshot of the spreadsheet setup]
