python - Creating a Pandas DataFrame for use in a ggPlot line plot

Question

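I'm trying to create a Pandas DataFrame so that I can build a visualization with ggPlot. However, I'm having trouble setting up the DataFrame structure.

My visualization will be a line graph of year vs. total. The graph will track several causes of death over the years.

I imported the CSV file, grouped it by year and then by 'cause_of_death', and did a count. However, the result is not a DataFrame, so it isn't in the right shape for a line graph.

My code is below. Any suggestions would be helpful, thanks.

The fields I want from the CSV file are 'deathYear' and 'cause_of_death'.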

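import pandas as pd
from ggplot import *

df = pd.read_csv('query_result.csv')

# Keep only the year and cause columns
newDF = df.loc[:, ['date_of_death_year', 'acme_underlying_cause_code']]
# size() returns a Series indexed by (year, cause); wrapping it in a
# DataFrame still leaves the year and cause in the index, not in columns
data = pd.DataFrame(newDF.groupby(['date_of_death_year', 'acme_underlying_cause_code']).size())

print(data)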
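To see why this isn't plot-ready, here is a minimal sketch using a few hypothetical rows in place of query_result.csv (the column names are the ones from the code above). `groupby(...).size()` gives a Series with a (year, cause) MultiIndex, and `Series.reset_index(name=...)` turns it into an ordinary long-format DataFrame with one column per field plus a named count column.

```python
import pandas as pd

# Hypothetical sample rows standing in for query_result.csv
df = pd.DataFrame({
    "date_of_death_year": [2001, 2001, 2001, 2002, 2002],
    "acme_underlying_cause_code": ["A", "A", "B", "A", "B"],
})

# size() counts rows per (year, cause) pair and returns a Series
# whose index is a MultiIndex of those pairs
grouped = df.groupby(["date_of_death_year", "acme_underlying_cause_code"]).size()
print(type(grouped).__name__)

# reset_index(name=...) moves the index levels back into columns
# and names the count column "count"
counts = grouped.reset_index(name="count")
print(counts)
```

The resulting `counts` frame could then be mapped with something like `aes(x='date_of_death_year', y='count', color='acme_underlying_cause_code')`.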

score 1 · Accepted Answer

This is a very old question, but it's quite easy to solve. (Hint: ggplot works directly with pandas DataFrames.)

Here's how I would write the code:

import numpy as np   # |Don't import * from these
import pandas as pd  # |
from ggplot import * # But this is customary because it's like R

# All this bit is just to make a DataFrame
# You can ignore it all
causes = ['foo', 'bar', 'baz']
years = [2001, 2002, 2003, 2004]
size = 100
data = {'causes':np.random.choice(causes, size),
        'years':np.random.choice(years, size),
        'something_else':np.random.random(size)
        }
df = pd.DataFrame(data)

# Here's where the good stuff happens. You're importing from
# a CSV so you can just start here
counts = df.groupby(['years', 'causes'])['something_else'].count()
counts = counts.reset_index() # Because ggplot doesn't plot with indexes
g = ggplot(counts, aes(x='years', y='something_else', color='causes')) +\
        geom_line()
print(g)
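The code above counts the `something_else` column. Note that `count()` on a column counts only its non-null values in each group, while `size()` counts rows and needs no extra column. Since the question's CSV selection has only the year and cause columns, `size()` may be the more natural fit there. A small sketch with made-up data, including one missing value, shows the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one NaN in "something_else"
df = pd.DataFrame({
    "years": [2001, 2001, 2002],
    "causes": ["foo", "foo", "bar"],
    "something_else": [0.5, np.nan, 0.1],
})

# count(): non-null values of "something_else" per (years, causes) group
by_count = df.groupby(["years", "causes"])["something_else"].count().reset_index()

# size(): number of rows per group, regardless of missing values
by_size = df.groupby(["years", "causes"]).size().reset_index(name="n")

print(by_count)
print(by_size)
```

For the (2001, "foo") group, `count()` reports 1 while `size()` reports 2, because one of its two rows has a missing value.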

The result looks like this:

[Figure: ggplot multi-line plot]
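One thing the groupby approach does not do: if a cause has no rows in some year, that (year, cause) pair is simply absent from the counts, so its line has no point at that year rather than a point at zero. A minimal sketch, assuming you want explicit zeros, reindexes against every year/cause combination with `pd.MultiIndex.from_product` (the sample data and the column name `n` are hypothetical):

```python
import pandas as pd

# Hypothetical long-format counts: cause "bar" has no row for 2002
counts = pd.DataFrame({
    "years": [2001, 2001, 2002],
    "causes": ["foo", "bar", "foo"],
    "n": [3, 1, 2],
})

# Every combination of the observed years and causes
full_index = pd.MultiIndex.from_product(
    [sorted(counts["years"].unique()), sorted(counts["causes"].unique())],
    names=["years", "causes"],
)

# Reindex against the full grid, filling absent combinations with 0,
# then move the index back into columns for plotting
filled = (
    counts.set_index(["years", "causes"])
    .reindex(full_index, fill_value=0)
    .reset_index()
)
print(filled)
```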
