python - Stacking a pandas DataFrame imported from Excel

Question

I have a very large dataset that was imported from Excel into a pandas DataFrame. I have created a short demo example below. This is the df that results from my import:

        A   B   C  A.1  B.1  C.1  A.2  B.2  C.2
Vehicle                                          
car       4   5   5  NaN  NaN  NaN  NaN  NaN  NaN
bike    NaN NaN NaN    3    4    5  NaN  NaN  NaN
bus     NaN NaN NaN  NaN  NaN  NaN    2    3    4
car       4   4   3  NaN  NaN  NaN  NaN  NaN  NaN

On import, pandas relabeled the column names by appending suffixes. In my Excel sheet, however, the names are identical (just A, B, C). The result I want is:

df:
         A  B  C
Vehicle         
car      4  5  5
bike     3  4  5
bus      2  3  4
car      4  4  3

Can anyone help me with this?

I created a new DataFrame to explain the problem better:

     Model   A   B   C   D  A.1  B.1  C.1  D.1  A.2  B.2  C.2  D.2  A.3  B.3  \
0  34005   1   3   4   4  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN   
1   1001 NaN NaN NaN NaN    3    4    5    3  NaN  NaN  NaN  NaN  NaN  NaN   
2   2003 NaN NaN NaN NaN  NaN  NaN  NaN  NaN    1    2    3    3  NaN  NaN   
3  28008 NaN NaN NaN NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN    1    2   
4  28008 NaN NaN NaN NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN   

   C.3  D.3  A.4  B.4  C.4  D.4  
0  NaN  NaN  NaN  NaN  NaN  NaN  
1  NaN  NaN  NaN  NaN  NaN  NaN  
2  NaN  NaN  NaN  NaN  NaN  NaN  
3    3    3  NaN  NaN  NaN  NaN  
4  NaN  NaN    1    2    3    3

I could not get it to work at a larger scale:

ds_indexed

Out[350]:
       a  b  c  d  e  f  g a.1 b.1 c.1 d.1 e.1 f.1 g.1
model                                                 
30                                                    
28                           5   5   5   4   5   5   4
11                                                    
18                                                    
35                                                    
30                           5   5   5   5   5   5   3
30                           3   3   4   4   4   4   4
27                           5   5   5   4   5   5   3
34                                                    
30                                                    
2                            5   5   5   3   4   5   5
28                                                    
10                                                    
15                                                    
30                                                    
85                                                    
39                                                    
33                           5   4   4   4   3   5   3
3                            5   4   4   4   4   5   4
10                           3   3   3   2   3   4   3
3                            3   4   4   4   3   4   4
9      5  4  5  3  5  5  3

main_cols = ['a','b', 'c', 'd', 'e', 'f', 'g']
new_ds = ds_indexed[main_cols]

for main_col in main_cols:
    suffix_cols = [col for col in ds_indexed.columns 
                   if col.startswith(main_col) and col != main_col]
    for suffix_col in suffix_cols:
        new_ds[main_col] = new_ds[main_col].combine_first(ds_indexed[suffix_col])


new_ds
Out[353]:
   a  b  c  d  e  f  g
model                     
30                        
28                        
11                        
18                        
35                        
30                        
30                        
27                        
34                        
30                        
2                         
28                        
10                        
15                        
30                        
85                        
39                        
33                        
3                         
10                        
3                         
9      5  4  5  3  5  5  3

I cannot get all the values into the new DataFrame. Can anyone help?

ds_indexed.info()
<class 'pandas.core.frame.DataFrame'>
Index: 22 entries, 30.0 to 9.0
Data columns (total 14 columns):
a      22  non-null values
b      22  non-null values
c      22  non-null values
d      22  non-null values
e      22  non-null values
f      22  non-null values
g      22  non-null values
a.1    22  non-null values
b.1    22  non-null values
c.1    22  non-null values
d.1    22  non-null values
e.1    22  non-null values
f.1    22  non-null values
g.1    22  non-null values
dtypes: object(14)

score 2 · Accepted Answer

Assuming the actual values contain no NaN, you can do the following:

>>> new_df = df.apply(lambda x: pd.Series(x.dropna().values), axis=1)
>>> new_df.columns = ['A', 'B', 'C']
>>> new_df
Out[537]:
         A  B  C
Vehicle         
car      4  5  5
bike     3  4  5
bus      2  3  4
car      4  4  3
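
For anyone who wants to try the row-wise approach without the Excel file, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the demo table in the question, so the construction code is an assumption rather than the asker's actual import. Note that the approach relies on every row having exactly three non-NaN values; a genuinely missing value in a row would shift the remaining values to the left.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Hand-built copy of the demo df from the question (assumed, not read from Excel)
df = pd.DataFrame(
    [
        [4, 5, 5, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, 3, 4, 5, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, 2, 3, 4],
        [4, 4, 3, nan, nan, nan, nan, nan, nan],
    ],
    index=pd.Index(["car", "bike", "bus", "car"], name="Vehicle"),
    columns=["A", "B", "C", "A.1", "B.1", "C.1", "A.2", "B.2", "C.2"],
)

# For each row, drop the NaN cells and keep the remaining values in order
new_df = df.apply(lambda row: pd.Series(row.dropna().values), axis=1)
new_df.columns = ["A", "B", "C"]

print(new_df)
```

Because the source columns contain NaN, the values come back as floats; call `new_df.astype(int)` if integers are needed.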

Otherwise, you can apply some logic to the column names:

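main_cols = ['A', 'B', 'C']
# Copy so the assignments below do not write into a view of df
new_df = df[main_cols].copy()

for main_col in main_cols:
    # Suffixed duplicates of this column, e.g. 'A.1', 'A.2'
    suffix_cols = [col for col in df.columns
                   if col.startswith(main_col + '.')]
    for suffix_col in suffix_cols:
        # Fill the remaining NaN values from each suffixed column
        new_df[main_col] = new_df[main_col].combine_first(df[suffix_col])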
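
Regarding the larger `ds_indexed` example in the question: its `info()` output reports 22 non-null values in every column with dtype `object`, which suggests the blank cells are empty strings (or whitespace) rather than NaN. `combine_first` only fills missing values, so it would have nothing to fill in that case. The sketch below is one possible fix under that assumption: it converts blank cells to NaN first and then runs the same loop. The small `ds` frame is a hypothetical miniature of `ds_indexed`, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of ds_indexed: blanks are empty strings, not NaN
ds = pd.DataFrame(
    {
        "a": ["", "", "5"],
        "b": ["", "", "4"],
        "a.1": ["5", "3", ""],
        "b.1": ["5", "3", ""],
    },
    index=pd.Index([28, 30, 9], name="model"),
)

# Turn empty or whitespace-only cells into real missing values
cleaned = ds.replace(r"^\s*$", np.nan, regex=True)

main_cols = ["a", "b"]
merged = cleaned[main_cols].copy()
for main_col in main_cols:
    for col in cleaned.columns:
        # Only the suffixed duplicates, e.g. 'a.1'
        if col.startswith(main_col + "."):
            merged[main_col] = merged[main_col].combine_first(cleaned[col])

# Convert the remaining strings to numbers
merged = merged.apply(pd.to_numeric)
print(merged)
```

If the blanks turn out to be something else (for example a single space or a placeholder string), the regular expression passed to `replace` would need to be adjusted accordingly.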
