私はCassandraから読んでいます
a = sc.cassandraTable("my_keyspace", "my_table").select("timestamp", "vaue")
そしてそれをデータフレームに変換したい:
a.toDF()
スキーマは正しく推測されます。
DataFrame[timestamp: timestamp, value: double]
しかし、データフレームを具体化すると、次のエラーが発生します。
Py4JJavaError: An error occurred while calling o89372.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 285.0 failed 4 times, most recent failure: Lost task 0.3 in stage 285.0 (TID 5243, kepler8.cern.ch): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
process()
File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "/opt/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/types.py", line 541, in toInternal
return tuple(f.toInternal(v) for f, v in zip(self.fields, obj))
File "/opt/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/types.py", line 541, in <genexpr>
return tuple(f.toInternal(v) for f, v in zip(self.fields, obj))
File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/types.py", line 435, in toInternal
return self.dataType.toInternal(obj)
File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/types.py", line 190, in toInternal
seconds = (calendar.timegm(dt.utctimetuple()) if dt.tzinfo
AttributeError: 'str' object has no attribute 'tzinfo'
string
に与えられたように聞こえpyspark.sql.types.TimestampType
ます。
これをさらにデバッグするにはどうすればよいですか?