python - How do I iterate over a defaultdict(list) in Python?

Question

How do I iterate over a defaultdict(list) in Python? Is there a better way to create a dictionary of lists in Python? I tried the usual iter(dict) approach, but got an error:

>>> import para
>>> para.print_doc('./sentseg_en/essentials.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "para.py", line 31, in print_doc
    for para in iter(doc):
TypeError: iteration over non-sequence

Main class:

import para
para.print_doc('./foo/bar/para-lines.txt')

para.py:

# -*- coding: utf-8 -*-
## Modified paragraph into a defaultdict(list) structure
## Original code from http://code.activestate.com/recipes/66063/
from collections import defaultdict
class Paragraphs:
    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')
    # Separator here refers to the paragraph separator,
    #  the default separator is '\n'.
    def __init__(self, filename, separator=None):
        # Set separator if passed into object's parameter,
        #  else set default separator as '\n'
        if separator is None:
            def separator(line): return line == '\n'
        elif not callable(separator):
            raise TypeError, "separator argument must be callable"
        self.separator = separator
        # Reading lines from files into a dictionary of lists
        self.doc = defaultdict(list)
        paraIndex = 0
        with open(filename) as readFile:
            for line in readFile:
                if line == separator:
                    paraIndex+=1
                else:
                    self.doc[paraIndex].append(line)

# Prints out populated doc from txtfile
def print_doc(filename):
    text = Paragraphs(filename)
    for para in iter(text.doc):
        for sent in text.doc[para]:
            print "Para#%d, Sent#%d: %s" % (
                para, text.doc[para].index(sent), sent)

Here is an example of ./foo/bar/para-lines.txt:

This is a start of a paragraph.
foo barr
bar foo
foo foo
This is the end.

This is the start of next para.
foo boo bar bar
this is the end.

The output of the main class should look like this:

Para#1,Sent#1: This is a start of a paragraph.
Para#1,Sent#2: foo barr
Para#1,Sent#3: bar foo
Para#1,Sent#4: foo foo
Para#1,Sent#5: This is the end.

Para#2,Sent#1: This is the start of next para.
Para#2,Sent#2: foo boo bar bar
Para#2,Sent#3: this is the end.

score 4 · Accepted Answer

The problem is in this line:

for para in iter(doc):

Here doc is an instance of Paragraphs, not a defaultdict. The defaultdict you create in the __init__ method goes out of scope when __init__ returns and is lost. So you need to do two things:

1. Save the doc created in the __init__ method as an instance variable (self.doc, for example).
2. Either make Paragraphs itself iterable (by adding an __iter__ method), or allow access to the doc object it creates.
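Applying both suggestions, a minimal Python 3 sketch might look like the following. Two changes are assumptions not taken from the question: the class takes any iterable of lines instead of a filename (so it can be tried without a file), and it calls self.separator(line) instead of comparing the line with the separator function. A string is never equal to a function, so with line == separator every line, blank ones included, ends up in paragraph 0.

```python
from collections import defaultdict


class Paragraphs:
    """Group lines into paragraphs stored in a defaultdict(list)."""

    def __init__(self, lines, separator=None):
        # Default separator: a line consisting only of a newline.
        if separator is None:
            def separator(line):
                return line == '\n'
        elif not callable(separator):
            raise TypeError("separator argument must be callable")
        self.separator = separator
        # Store the dict on the instance so it outlives __init__.
        self.doc = defaultdict(list)
        para_index = 0
        for line in lines:
            # Call the separator function rather than comparing to it.
            if self.separator(line):
                para_index += 1
            else:
                self.doc[para_index].append(line)

    def __iter__(self):
        # Yield (paragraph index, list of lines) pairs in index order.
        return iter(sorted(self.doc.items()))


def print_doc(lines):
    # Iterate over the Paragraphs instance directly via __iter__.
    for para, sents in Paragraphs(lines):
        for n, sent in enumerate(sents, start=1):
            print("Para#%d,Sent#%d: %s" % (para + 1, n, sent.rstrip('\n')))
```

For example, with open('./foo/bar/para-lines.txt') as f: print_doc(f) should print each line in the Para#N,Sent#M format from the question (without the blank line between paragraphs).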

score 2 · Accepted Answer

The recipe you linked to is rather old. It was written in 2001, before Python had more modern tools like itertools.groupby (introduced in Python 2.4, released in late 2004). Using groupby, your code could look like this:

import itertools
import sys

with open('para-lines.txt', 'r') as f:
    paranum = 0
    for is_separator, paragraph in itertools.groupby(f, lambda line: line == '\n'):
        if is_separator:
            # we've reached paragraph separator
            print
        else:
            paranum += 1
            for n, sentence in enumerate(paragraph, start = 1):
                sys.stdout.write(
                    'Para#{i:d},Sent#{n:d}: {s}'.format(
                        i = paranum, n = n, s = sentence))
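On Python 3, where print is a function, the same groupby idea might be written as follows. This is a sketch: format_paragraphs is a hypothetical helper that takes any iterable of lines (an open file or a list) and returns the formatted strings instead of writing them, which makes it easier to test. Each run of separator lines becomes an empty string, so joining the result with newlines leaves a blank line between paragraphs, as in the expected output in the question.

```python
import itertools


def format_paragraphs(lines):
    """Return 'Para#i,Sent#n: text' strings for paragraphs split by blank lines."""
    out = []
    paranum = 0
    # groupby yields runs of consecutive lines sharing the same key:
    # True for runs of blank separator lines, False for paragraph text.
    for is_separator, paragraph in itertools.groupby(lines, lambda line: line == '\n'):
        if is_separator:
            out.append('')  # one empty entry per separator run
            continue
        paranum += 1
        for n, sentence in enumerate(paragraph, start=1):
            out.append('Para#{i:d},Sent#{n:d}: {s}'.format(
                i=paranum, n=n, s=sentence.rstrip('\n')))
    return out


# Example usage (assumes para-lines.txt exists in the working directory):
# with open('para-lines.txt') as f:
#     print('\n'.join(format_paragraphs(f)))
```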

score 0 · Accepted Answer

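The problem seems to be that you are iterating over the Paragraphs class rather than the dictionary. Also, instead of iterating over the keys and then looking up each dictionary entry, consider using: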

for (key, value) in d.items():
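As a small illustration of that suggestion, the sketch below fills a defaultdict(list) with made-up sample data and walks it with items(). Numbering the sentences with enumerate also avoids the list.index(sent) lookup in the question's print_doc, which returns the position of the first matching line, so duplicate lines in one paragraph would get the same number.

```python
from collections import defaultdict

# Hypothetical sample data: paragraph number -> list of sentences.
doc = defaultdict(list)
doc[1].append("This is a start of a paragraph.")
doc[1].append("foo foo")
doc[1].append("foo foo")  # duplicate line on purpose
doc[2].append("This is the start of next para.")


def numbered_sentences(d):
    """Yield (para, sent_no, text) tuples using items() and enumerate."""
    for para, sentences in sorted(d.items()):
        # enumerate gives each position its own number, even for duplicates;
        # list.index("foo foo") would return 1 for both copies.
        for n, sent in enumerate(sentences, start=1):
            yield para, n, sent


for para, n, sent in numbered_sentences(doc):
    print("Para#%d,Sent#%d: %s" % (para, n, sent))
```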

score 0 · Accepted Answer

It fails because the Paragraphs class does not define __iter__(), yet you call iter(doc) (where doc is a Paragraphs instance).

To be iterable, a class needs an __iter__() method that returns an iterator. Here is the documentation.
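The difference can be seen with two toy classes (the class and helper names are hypothetical). iter() raises TypeError for an object that defines neither __iter__ nor __getitem__ (the older sequence-protocol fallback); once __iter__ returns an iterator, for loops work. On Python 3 the message reads "'NoIter' object is not iterable" rather than the Python 2 "iteration over non-sequence" shown in the question.

```python
class NoIter:
    """Holds data but defines neither __iter__ nor __getitem__."""

    def __init__(self, data):
        self.data = data


class WithIter:
    """Defines __iter__, so iter() and for loops work."""

    def __init__(self, data):
        self.data = data

    def __iter__(self):
        # A generator method returns a fresh iterator on each call.
        for key in sorted(self.data):
            yield key, self.data[key]


def is_iterable(obj):
    """Return True if iter(obj) succeeds, False if it raises TypeError."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


data = {1: ["a", "b"], 2: ["c"]}
print(is_iterable(NoIter(data)))    # False
print(is_iterable(WithIter(data)))  # True
for key, lines in WithIter(data):
    print(key, lines)
```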

score 0 · Accepted Answer

I can't think of any reason to use a dict here, let alone a defaultdict. A list of lists is much simpler:

doc = []
with open(filename) as readFile:
    para = []
    for line in readFile:
        if line == separator:
            doc.append(para)
            para = []
        else:
            para.append(line)
    doc.append(para)
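A self-contained variant of that snippet is sketched below. It is wrapped in hypothetical read_paragraphs and print_paragraphs functions, assumes the separator is a line containing only a newline (the snippet leaves separator undefined), strips the trailing newline from each line, and skips empty paragraphs so that consecutive blank lines or a trailing blank line do not add [] entries to the result.

```python
def read_paragraphs(lines, separator='\n'):
    """Group lines into a list of paragraphs, each a list of lines."""
    doc = []
    para = []
    for line in lines:
        if line == separator:
            # End of a paragraph; ignore runs of blank lines.
            if para:
                doc.append(para)
            para = []
        else:
            para.append(line.rstrip('\n'))
    # The last paragraph may not be followed by a separator line.
    if para:
        doc.append(para)
    return doc


def print_paragraphs(doc):
    # Paragraph and sentence numbers come from enumerate, starting at 1.
    for i, para in enumerate(doc, start=1):
        for n, sent in enumerate(para, start=1):
            print("Para#%d,Sent#%d: %s" % (i, n, sent))
```

For example: with open('./foo/bar/para-lines.txt') as f: print_paragraphs(read_paragraphs(f)).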
