python - Duplicate DatetimeIndex entries in pandas cause a strange exception

Question

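Consider the following contrived example. I create a DataFrame and build its DatetimeIndex from a column that contains duplicate entries. I then put this DataFrame into a Panel and try to iterate over the major axis.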

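# Python 2.7 with pandas 0.11.0, as shown in the traceback below
import datetime as dt
from collections import OrderedDict

import pandas as pd

# Microsecond timestamps; the last two values are identical on purpose
a = [1371215933513120, 1371215933513121, 1371215933513122, 1371215933513122]
b = [1,2,3,4]
df = pd.DataFrame({'a':a, 'b':b, 'c':[dt.datetime.fromtimestamp(t/1000000.) for t in a]})
# Use column 'c' as the index, so the index contains a duplicate entry
df.index=pd.DatetimeIndex(df['c'])

d = OrderedDict()
d['x'] = df
p = pd.Panel(d)

# Iterate over the major axis and print the cross-section for each label
for y in p.major_axis:
    print y
    print p.major_xs(y)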

This produces the following output:

2013-06-14 15:18:53.513120
                            x
a            1371215933513120
b                           1
c  2013-06-14 15:18:53.513120
2013-06-14 15:18:53.513121
                            x
a            1371215933513121
b                           2
c  2013-06-14 15:18:53.513121
2013-06-14 15:18:53.513122

followed by an error that is (to me) somewhat cryptic:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-35-045aaae5a074> in <module>()
     13 for y in p.major_axis:
     14     print y
---> 15     print p.major_xs(y)

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/frame.py in __str__(self)
    667         if py3compat.PY3:
    668             return self.__unicode__()
--> 669         return self.__bytes__()
    670 
    671     def __bytes__(self):

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/frame.py in __bytes__(self)
    677         """
    678         encoding = com.get_option("display.encoding")
--> 679         return self.__unicode__().encode(encoding, 'replace')
    680 
    681     def __unicode__(self):

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/frame.py in __unicode__(self)
    692             # This needs to compute the entire repr
    693             # so don't do it unless rownum is bounded
--> 694             fits_horizontal = self._repr_fits_horizontal_()
    695 
    696         if fits_vertical and fits_horizontal:

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/frame.py in _repr_fits_horizontal_(self)
    652             d=d.iloc[:min(max_rows, height,len(d))]
    653 
--> 654         d.to_string(buf=buf)
    655         value = buf.getvalue()
    656         repr_width = max([len(l) for l in value.split('\n')])

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/frame.py in to_string(self, buf, columns, col_space, colSpace, header, index, na_rep, formatters, float_format, sparsify, nanRep, index_names, justify, force_unicode, line_width)
   1489                                            header=header, index=index,
   1490                                            line_width=line_width)
-> 1491         formatter.to_string()
   1492 
   1493         if buf is None:

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/format.py in to_string(self, force_unicode)
    312             text = info_line
    313         else:
--> 314             strcols = self._to_str_columns()
    315             if self.line_width is None:
    316                 text = adjoin(1, *strcols)

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/format.py in _to_str_columns(self)
    265         for i, c in enumerate(self.columns):
    266             if self.header:
--> 267                 fmt_values = self._format_col(i)
    268                 cheader = str_columns[i]
    269 

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/format.py in _format_col(self, i)
    403                             float_format=self.float_format,
    404                             na_rep=self.na_rep,
--> 405                             space=self.col_space)
    406 
    407     def to_html(self, classes=None):

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/format.py in format_array(values, formatter, float_format, na_rep, digits, space, justify)
   1319                         justify=justify)
   1320 
-> 1321     return fmt_obj.get_result()
   1322 
   1323 

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/format.py in get_result(self)
   1335 
   1336     def get_result(self):
-> 1337         fmt_values = self._format_strings()
   1338         return _make_fixed_width(fmt_values, self.justify)
   1339 

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/format.py in _format_strings(self)
   1362 
   1363         print "vals:", vals
-> 1364         is_float = lib.map_infer(vals, com.is_float) & notnull(vals)
   1365         leading_space = is_float.any()
   1366 

ValueError: operands could not be broadcast together with shapes (2) (2,3)

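Since I have already explained that I am creating an index with duplicate entries, the cause of the error is obvious. Without knowing that, though, it would have been much harder (again, for a beginner like me) to work out why this exception pops up.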
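To make the cause concrete, here is a minimal sketch of how a label lookup behaves on an index with duplicates. It assumes a recent pandas release rather than the 0.11.0 from the traceback, uses a plain DataFrame instead of a Panel, and builds the index with `pd.to_datetime(..., unit="us")` (so the timestamps are in UTC rather than local time). The change in result shape for the duplicated label looks similar to the shape mismatch in the ValueError, although I have not traced the pandas internals to confirm that this is the exact mechanism.

```python
import pandas as pd

# Same microsecond timestamps as in the example; the last two are identical.
a = [1371215933513120, 1371215933513121, 1371215933513122, 1371215933513122]
idx = pd.to_datetime(a, unit="us")  # DatetimeIndex containing one duplicate
df = pd.DataFrame({"a": a, "b": [1, 2, 3, 4]}, index=idx)

unique_label = idx[0]  # occurs once in the index
dup_label = idx[2]     # occurs twice in the index

row = df.loc[unique_label]  # single match: one row, returned as a Series
rows = df.loc[dup_label]    # two matches: both rows, returned as a DataFrame

print(type(row).__name__, row.shape)
print(type(rows).__name__, rows.shape)
```

The same call therefore returns a one-dimensional or a two-dimensional object depending on whether the label is repeated.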

This leads me to a few questions.

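Is this really the expected behavior of pandas? Is it forbidden to create an index with duplicate entries, or is it only forbidden to iterate over them?
If creating such an index is forbidden, shouldn't an exception be raised when it is first created?
If the iteration is somehow wrong, shouldn't the error be more informative?
Am I doing something wrong?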
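For anyone hitting the same problem, the sketch below shows one way to detect duplicate index entries up front and to work around them. It assumes a recent pandas release (not 0.11.0), where `Index.is_unique`, `Index.duplicated` and `Index.unique` are available, and it uses a plain DataFrame because Panel no longer exists in current versions. Keeping the first row per timestamp is only one possible policy, chosen here for illustration.

```python
import pandas as pd

# Same microsecond timestamps as in the example; the last two are identical.
a = [1371215933513120, 1371215933513121, 1371215933513122, 1371215933513122]
idx = pd.to_datetime(a, unit="us")
df = pd.DataFrame({"a": a, "b": [1, 2, 3, 4]}, index=idx)

# Building the index above did not raise; is_unique reports the duplicates.
has_duplicates = not df.index.is_unique
print("index has duplicates:", has_duplicates)

# Boolean mask that is True for every repeat of an earlier label.
dup_mask = df.index.duplicated(keep="first")
print(df.index[dup_mask])

# Option 1: keep only the first row for each timestamp.
deduped = df[~dup_mask]

# Option 2: iterate over the distinct labels only. Passing a list to .loc
# returns a DataFrame for unique and duplicated labels alike.
row_counts = {}
for ts in df.index.unique():
    rows = df.loc[[ts]]
    row_counts[ts] = len(rows)

print(row_counts)
```

Checking `is_unique` right after building the index gives an early, readable signal instead of a broadcasting error later on.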
