python - Runtime error when using sframe.apply()

Question

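I'm trying to use a simple apply on an SFrame full of data. It's for a simple data transformation on one of the columns: applying a function that takes text input and splits it into a list. Here is the function and its call/output: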

    from collections import Counter

    def count_words(txt):
        count = Counter()
        for word in txt.split():
            count[word] += 1
        return count

    products.apply(lambda x: count_words(x['review']))

    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    <ipython-input-8-85338326302c> in <module>()
    ----> 1 products.apply(lambda x: count_words(x['review']))

    C:\Anaconda3\envs\dato-env\lib\site-packages\graphlab\data_structures\sframe.pyc in apply(self, fn, dtype, seed)
       2607 
       2608         with cython_context():
    -> 2609             return SArray(_proxy=self.__proxy__.transform(fn, dtype, seed))
       2610 
       2611     def flat_map(self, column_names, fn, column_types='auto', seed=None):

    C:\Anaconda3\envs\dato-env\lib\site-packages\graphlab\cython\context.pyc in __exit__(self, exc_type, exc_value, traceback)
         47             if not self.show_cython_trace:
         48                 # To hide cython trace, we re-raise from here
    ---> 49                 raise exc_type(exc_value)
         50             else:
         51                 # To show the full trace, we do nothing and let exception propagate

    RuntimeError: Runtime Exception. Unable to evaluate lambdas. Lambda workers did not start.

I get that error when I run the code. The SFrame (df) is only 10 x 2, so the problem can't be an overload. I don't know how to fix this.
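One way to narrow this down is to call the function on plain Python strings, with no SFrame involved. If this works, the function's logic is fine and the problem lies in the apply step itself. The sample strings below are made up for illustration only.

```python
from collections import Counter

def count_words(txt):
    """Count how many times each whitespace-separated word occurs in txt."""
    count = Counter()
    for word in txt.split():
        count[word] += 1
    return count

# Hypothetical sample reviews, used only to exercise the function locally.
sample_reviews = ['the sound and the fury', 'great product']
for review in sample_reviews:
    print(count_words(review))
```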

score 1 · Accepted Answer

If you're using GraphLab Create, there's actually a built-in tool for this in the "text analytics" toolkit. Suppose you have data like this:

    import graphlab
    products = graphlab.SFrame({'review': ['a portrait of the artist as a young man',
                                           'the sound and the fury']})

The simplest way to count the words in each entry is:

    products['counts'] = graphlab.text_analytics.count_words(products['review'])

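If you're using the sframe package on its own, or you want to run a custom function like the one you described, I think the key piece missing from your code is that you need to convert the Counter into a dictionary so that the SFrame can handle the output.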
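The conversion described above can be seen in plain Python: a Counter is a subclass of dict, but its type is still Counter, and passing it to dict() produces an ordinary dict with the same word/count pairs. This snippet does not touch SFrame; it only shows what the modified function hands back to apply.

```python
from collections import Counter

counts = Counter('the sound and the fury'.split())

# Counter subclasses dict, but its exact type is Counter, not dict.
print(isinstance(counts, dict), type(counts) is dict)

# dict() copies the same word -> count pairs into a plain dict.
plain = dict(counts)
print(type(plain) is dict, plain == counts)
```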

    from collections import Counter

    def count_words(txt):
        count = Counter()
        for word in txt.split():
            count[word] += 1
        return dict(count)

    products['counts'] = products.apply(lambda x: count_words(x['review']))
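The counting loop can also be shortened: Counter accepts an iterable and counts its elements directly, so dict(Counter(txt.split())) builds the same plain dict. The sketch below shows both versions and mimics a row-wise apply over a plain list of row dicts; that list is a stand-in for the two-row SFrame above and is an illustration only, not how GraphLab runs apply internally.

```python
from collections import Counter

def count_words_loop(txt):
    """Loop version from the answer, returning a plain dict."""
    count = Counter()
    for word in txt.split():
        count[word] += 1
    return dict(count)

def count_words(txt):
    """Same result: Counter counts the words of the iterable in one call."""
    return dict(Counter(txt.split()))

# A list of row dicts standing in for the two-row SFrame from the answer.
rows = [{'review': 'a portrait of the artist as a young man'},
        {'review': 'the sound and the fury'}]

# Row-wise application, analogous to products.apply(lambda x: ...).
counts = [count_words(x['review']) for x in rows]
print(counts[1])
```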
