python - Sorting a MultiIndex in Pandas

Question

I have a MultiIndex DataFrame created by a groupby operation. I'm trying to do a compound sort on several levels of the index, but I can't seem to find a sort function that does what I need.

The initial dataset looks something like this (daily sales counts for various products):

         Date Manufacturer Product Name Product Launch Date  Sales
0  2013-01-01        Apple         iPod          2001-10-23     12
1  2013-01-01        Apple         iPad          2010-04-03     13
2  2013-01-01      Samsung       Galaxy          2009-04-27     14
3  2013-01-01      Samsung   Galaxy Tab          2010-09-02     15
4  2013-01-02        Apple         iPod          2001-10-23     22
5  2013-01-02        Apple         iPad          2010-04-03     17
6  2013-01-02      Samsung       Galaxy          2009-04-27     10
7  2013-01-02      Samsung   Galaxy Tab          2010-09-02      7

I use groupby to get the totals over the date range:

> grouped = df.groupby(['Manufacturer', 'Product Name', 'Product Launch Date']).sum()
                                               Sales
Manufacturer Product Name Product Launch Date       
Apple        iPad         2010-04-03              30
             iPod         2001-10-23              34
Samsung      Galaxy       2009-04-27              24
             Galaxy Tab   2010-09-02              22
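A self-contained sketch of the same groupby, assuming a recent pandas version. Newer releases may no longer silently drop the non-numeric `Date` column from `sum()`, so selecting `Sales` explicitly before summing keeps the result identical to the table above.

```python
import pandas as pd

data = {
    "Date": ["2013-01-01"] * 4 + ["2013-01-02"] * 4,
    "Manufacturer": ["Apple", "Apple", "Samsung", "Samsung"] * 2,
    "Product Name": ["iPod", "iPad", "Galaxy", "Galaxy Tab"] * 2,
    "Product Launch Date": ["2001-10-23", "2010-04-03", "2009-04-27", "2010-09-02"] * 2,
    "Sales": [12, 13, 14, 15, 22, 17, 10, 7],
}
df = pd.DataFrame(data)

# Aggregate only the numeric Sales column. The grouping keys become a
# three-level MultiIndex, sorted by key because groupby sorts by default.
grouped = df.groupby(["Manufacturer", "Product Name", "Product Launch Date"])[["Sales"]].sum()
print(grouped)
```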

So far so good!

The last thing I want to do is sort each manufacturer's products by launch date, while keeping them grouped hierarchically under their manufacturer. This is all I'm trying to get:

                                               Sales
Manufacturer Product Name Product Launch Date       
Apple        iPod         2001-10-23              34
             iPad         2010-04-03              30
Samsung      Galaxy       2009-04-27              24
             Galaxy Tab   2010-09-02              22

When I try sortlevel(), I lose the nice per-company hierarchy I had before:

> grouped.sortlevel('Product Launch Date')
                                               Sales
Manufacturer Product Name Product Launch Date       
Apple        iPod         2001-10-23              34
Samsung      Galaxy       2009-04-27              24
Apple        iPad         2010-04-03              30
Samsung      Galaxy Tab   2010-09-02              22
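To illustrate why the hierarchy is lost, here is a sketch using `sort_index(level=...)`, which replaces `sortlevel()` in current pandas. Naming a single inner level makes it the primary sort key, and the remaining levels are only used to break ties, so the manufacturers interleave just as in the output above.

```python
import pandas as pd

# Rebuild the aggregated frame from the question (the Date column is left
# out because it plays no part in the grouping).
df = pd.DataFrame({
    "Manufacturer": ["Apple", "Apple", "Samsung", "Samsung"] * 2,
    "Product Name": ["iPod", "iPad", "Galaxy", "Galaxy Tab"] * 2,
    "Product Launch Date": ["2001-10-23", "2010-04-03", "2009-04-27", "2010-09-02"] * 2,
    "Sales": [12, 13, 14, 15, 22, 17, 10, 7],
})
grouped = df.groupby(["Manufacturer", "Product Name", "Product Launch Date"]).sum()

# "Product Launch Date" is now the primary key; Manufacturer and Product Name
# only break ties, so rows from the two companies end up interleaved.
by_launch = grouped.sort_index(level="Product Launch Date")
print(by_launch)
```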

sort() and sort_index() simply fail:

grouped.sort(['Manufacturer','Product Launch Date'])
KeyError: u'no item named Manufacturer'

grouped.sort_index(by=['Manufacturer','Product Launch Date'])
KeyError: u'no item named Manufacturer'
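Both calls fail because `Manufacturer` and `Product Launch Date` are index levels of `grouped`, not columns, and those methods only looked the names up among the columns. As a hedged aside: to my knowledge, since pandas 0.23 `DataFrame.sort_values` accepts index level names in `by`, so in current versions a sketch like this sorts on the two levels directly while keeping the MultiIndex intact:

```python
import pandas as pd

# Rebuild the aggregated frame from the question (Date column omitted).
df = pd.DataFrame({
    "Manufacturer": ["Apple", "Apple", "Samsung", "Samsung"] * 2,
    "Product Name": ["iPod", "iPad", "Galaxy", "Galaxy Tab"] * 2,
    "Product Launch Date": ["2001-10-23", "2010-04-03", "2009-04-27", "2010-09-02"] * 2,
    "Sales": [12, 13, 14, 15, 22, 17, 10, 7],
})
grouped = df.groupby(["Manufacturer", "Product Name", "Product Launch Date"]).sum()

# Each name in `by` is resolved as either a column label or an index level
# name, so both index levels can serve as sort keys here.
result = grouped.sort_values(["Manufacturer", "Product Launch Date"])
print(result)
```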

It seems like a simple operation, but I can't quite figure it out.

I'm not tied to using a MultiIndex for this, but since that's what groupby() returns, that's what I've been working with.

By the way, here's the code that generates the initial DataFrame:

from pandas import DataFrame

data = {
  'Date': ['2013-01-01', '2013-01-01', '2013-01-01', '2013-01-01', '2013-01-02', '2013-01-02', '2013-01-02', '2013-01-02'],
  'Manufacturer' : ['Apple', 'Apple', 'Samsung', 'Samsung', 'Apple', 'Apple', 'Samsung', 'Samsung',],
  'Product Name' : ['iPod', 'iPad', 'Galaxy', 'Galaxy Tab', 'iPod', 'iPad', 'Galaxy', 'Galaxy Tab'], 
  'Product Launch Date' : ['2001-10-23', '2010-04-03', '2009-04-27', '2010-09-02','2001-10-23', '2010-04-03', '2009-04-27', '2010-09-02'],
  'Sales' : [12, 13, 14, 15, 22, 17, 10, 7]
}
df = DataFrame(data, columns=['Date', 'Manufacturer', 'Product Name', 'Product Launch Date', 'Sales'])

score 10 · Accepted Answer

A hack would be to change the order of the levels:

In [11]: g
Out[11]:
                                               Sales
Manufacturer Product Name Product Launch Date
Apple        iPad         2010-04-03              30
             iPod         2001-10-23              34
Samsung      Galaxy       2009-04-27              24
             Galaxy Tab   2010-09-02              22

In [12]: g.index = g.index.swaplevel(1, 2)

Sortlevel, which (as you found) sorts the MultiIndex levels in order:

In [13]: g = g.sortlevel()

And swap back:

In [14]: g.index = g.index.swaplevel(1, 2)

In [15]: g
Out[15]:
                                               Sales
Manufacturer Product Name Product Launch Date
Apple        iPod         2001-10-23              34
             iPad         2010-04-03              30
Samsung      Galaxy       2009-04-27              24
             Galaxy Tab   2010-09-02              22
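The same swap, sort, swap-back sequence as a sketch for current pandas, where `sort_index()` takes the place of `sortlevel()` and `DataFrame.swaplevel` can be chained directly instead of reassigning `g.index`:

```python
import pandas as pd

# Rebuild the aggregated frame from the question (Date column omitted).
df = pd.DataFrame({
    "Manufacturer": ["Apple", "Apple", "Samsung", "Samsung"] * 2,
    "Product Name": ["iPod", "iPad", "Galaxy", "Galaxy Tab"] * 2,
    "Product Launch Date": ["2001-10-23", "2010-04-03", "2009-04-27", "2010-09-02"] * 2,
    "Sales": [12, 13, 14, 15, 22, 17, 10, 7],
})
grouped = df.groupby(["Manufacturer", "Product Name", "Product Launch Date"]).sum()

# Move the launch date to position 1 so a plain lexicographic sort orders
# rows by Manufacturer, then launch date, then product name.
swapped = grouped.swaplevel(1, 2).sort_index()
# Swap the two inner levels back; the row order from the sort is kept.
result = swapped.swaplevel(1, 2)
print(result)
```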

I don't think sortlevel should sort the remaining labels in order, so I'll open a GitHub issue. :) That said, the docs' note on "the need for sortedness" is worth mentioning.

Note: you can avoid the first swaplevel by reordering the initial groupby:

g = df.groupby(['Manufacturer', 'Product Launch Date', 'Product Name']).sum()
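A sketch of that variant, assuming you still want the original level order afterwards. Because `groupby` sorts its keys by default, grouping with the launch date second already produces the desired row order, and `reorder_levels` then restores the level order without moving any rows:

```python
import pandas as pd

# Rebuild the question's data (Date column omitted).
df = pd.DataFrame({
    "Manufacturer": ["Apple", "Apple", "Samsung", "Samsung"] * 2,
    "Product Name": ["iPod", "iPad", "Galaxy", "Galaxy Tab"] * 2,
    "Product Launch Date": ["2001-10-23", "2010-04-03", "2009-04-27", "2010-09-02"] * 2,
    "Sales": [12, 13, 14, 15, 22, 17, 10, 7],
})

# Grouping keys are sorted in this order, so rows come out ordered by
# Manufacturer and then by launch date.
g = df.groupby(["Manufacturer", "Product Launch Date", "Product Name"]).sum()
# Put the levels back in the question's order; row order is unchanged.
g = g.reorder_levels(["Manufacturer", "Product Name", "Product Launch Date"])
print(g)
```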

score 6 · Accepted Answer

This one-liner works for me:

In [1]: grouped.sortlevel(["Manufacturer","Product Launch Date"], sort_remaining=False)

                                               Sales
Manufacturer Product Name Product Launch Date       
Apple        iPod         2001-10-23              34
             iPad         2010-04-03              30
Samsung      Galaxy       2009-04-27              24
             Galaxy Tab   2010-09-02              22

Note that this also works:

grouped.sortlevel([0,2], sort_remaining=False)
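In current pandas `sortlevel()` no longer exists. As a sketch, the equivalent calls pass the same `level` and `sort_remaining` arguments to `sort_index()`, by level name or by level position:

```python
import pandas as pd

# Rebuild the aggregated frame from the question (Date column omitted).
df = pd.DataFrame({
    "Manufacturer": ["Apple", "Apple", "Samsung", "Samsung"] * 2,
    "Product Name": ["iPod", "iPad", "Galaxy", "Galaxy Tab"] * 2,
    "Product Launch Date": ["2001-10-23", "2010-04-03", "2009-04-27", "2010-09-02"] * 2,
    "Sales": [12, 13, 14, 15, 22, 17, 10, 7],
})
grouped = df.groupby(["Manufacturer", "Product Name", "Product Launch Date"]).sum()

# Sort on Manufacturer, then launch date; sort_remaining=False stops pandas
# from also sorting on the Product Name level afterwards.
by_names = grouped.sort_index(level=["Manufacturer", "Product Launch Date"], sort_remaining=False)
# The same sort, referring to the levels by position instead of by name.
by_positions = grouped.sort_index(level=[0, 2], sort_remaining=False)
print(by_names)
```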

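These sortlevel calls didn't work when I first posted this answer more than two years ago, because by default sortlevel sorted on all the index levels, which messed up the company hierarchy. The sort_remaining argument, which disables that behavior, was added last year. Here's the commit link for reference: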

score 4 · Accepted Answer

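To sort a MultiIndex by its "index columns" (aka levels), you need to use the .sort_index() method and set its level argument. To sort by multiple levels, set the argument to a list of level names, in the order you want them applied.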

This should give you the DataFrame you need:

df.groupby(['Manufacturer',
            'Product Name',
            'Product Launch Date']
          ).sum().sort_index(level=['Manufacturer','Product Launch Date'])

score 0 · Accepted Answer

If you want to avoid multiple swaps within a very deep MultiIndex, you could also try this:

Slice by level X (with a list comprehension + .loc + IndexSlice)
Sort the desired level (sortlevel(2))
Concatenate all the groups of level X

Here's the code:

import pandas as pd
idx = pd.IndexSlice
g = pd.concat([grouped.loc[idx[i,:,:],:].sortlevel(2) for i in grouped.index.levels[0]])
g
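A sketch of the same slice, sort, concatenate idea for current pandas, assuming `sort_index(level=...)` as the replacement for `sortlevel(2)`. It swaps in `DataFrame.xs(..., drop_level=False)` for the `.loc`/`IndexSlice` slice, and iterates over `get_level_values(...).unique()` rather than `index.levels[0]`, because a MultiIndex's `levels` can retain values that no longer appear in the index after filtering.

```python
import pandas as pd

# Rebuild the aggregated frame from the question (Date column omitted).
df = pd.DataFrame({
    "Manufacturer": ["Apple", "Apple", "Samsung", "Samsung"] * 2,
    "Product Name": ["iPod", "iPad", "Galaxy", "Galaxy Tab"] * 2,
    "Product Launch Date": ["2001-10-23", "2010-04-03", "2009-04-27", "2010-09-02"] * 2,
    "Sales": [12, 13, 14, 15, 22, 17, 10, 7],
})
grouped = df.groupby(["Manufacturer", "Product Name", "Product Launch Date"]).sum()

# Slice out each manufacturer (keeping the full MultiIndex), sort that slice
# by the launch-date level, then stitch the slices back together in order.
pieces = [
    grouped.xs(maker, level="Manufacturer", drop_level=False).sort_index(level="Product Launch Date")
    for maker in grouped.index.get_level_values("Manufacturer").unique()
]
g = pd.concat(pieces)
print(g)
```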

score 0 · Accepted Answer

If you don't care about preserving the index (I often prefer an arbitrary integer index), you can use the following one-liner:

grouped.reset_index().sort_values(["Manufacturer","Product Launch Date"])
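A slightly extended sketch of this flat-index approach for current pandas. Passing `ignore_index=True` to `sort_values` (available, to my knowledge, since pandas 1.0) renumbers the rows 0..n-1, and `set_index` can rebuild the MultiIndex afterwards if it's needed again:

```python
import pandas as pd

# Rebuild the aggregated frame from the question (Date column omitted).
df = pd.DataFrame({
    "Manufacturer": ["Apple", "Apple", "Samsung", "Samsung"] * 2,
    "Product Name": ["iPod", "iPad", "Galaxy", "Galaxy Tab"] * 2,
    "Product Launch Date": ["2001-10-23", "2010-04-03", "2009-04-27", "2010-09-02"] * 2,
    "Sales": [12, 13, 14, 15, 22, 17, 10, 7],
})
grouped = df.groupby(["Manufacturer", "Product Name", "Product Launch Date"]).sum()

# Turn the index levels into ordinary columns, sort on two of them, and
# renumber the rows so the integer index runs 0..n-1 in sorted order.
flat = grouped.reset_index().sort_values(
    ["Manufacturer", "Product Launch Date"], ignore_index=True
)
# Optionally restore the three-level MultiIndex; the sorted row order is kept.
restored = flat.set_index(["Manufacturer", "Product Name", "Product Launch Date"])
print(flat)
```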
