python - Splitting lines into paragraphs

Question

Input: a list of lines

Output: a list of lists of lines, i.e. the input list split at runs of one or more empty lines.

Here is the ugliest solution I have so far:

def split_at_empty(lines):
    paragraphs = []
    p = []
    def flush():
        nonlocal p  # flush rebinds p, so it must refer to the enclosing variable
        if p:
            paragraphs.append(p)
        p = []
    for l in lines:
        if l:
            p.append(l)
        else:
            flush()
    flush()
    return paragraphs
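Why the `flush` closure needs `nonlocal`: assigning `p = []` inside the nested function makes `p` local to `flush`, so the earlier `if p:` raises `UnboundLocalError` instead of reading the outer list. A minimal Python 3 sketch contrasting the two forms; `broken` and `fixed` are hypothetical names used only for this illustration.

```python
def broken():
    p = ['x']

    def flush():
        if p:        # p is local to flush because of the assignment below
            pass
        p = []

    flush()


def fixed():
    p = ['x']

    def flush():
        nonlocal p   # rebind the enclosing function's p instead
        if p:
            pass
        p = []

    flush()
    return p


try:
    broken()
except UnboundLocalError as exc:
    print('broken:', type(exc).__name__)  # broken: UnboundLocalError

print('fixed:', fixed())  # fixed: []
```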

There must be a better solution (probably a functional one)! Anyone?

Sample input list:

['','2','3','','5','6','7','8','','','11']

Output:

[['2','3'],['5','6','7','8'],['11']]
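For the functional approach the question asks about, one option not used in the answers below is `itertools.groupby` keyed on whether a line is non-empty. This is a sketch that assumes "empty" means a falsy string (exactly `''`), the same test the question's code uses; the function name simply follows the question.

```python
from itertools import groupby


def split_at_empty(lines):
    # groupby batches consecutive items that share a key; the key is True
    # for non-empty lines and False for empty ones, so each run of empty
    # lines becomes a single group, which is skipped.
    return [list(group) for nonempty, group in groupby(lines, key=bool) if nonempty]


print(split_at_empty(['', '2', '3', '', '5', '6', '7', '8', '', '', '11']))
# [['2', '3'], ['5', '6', '7', '8'], ['11']]
```

Leading, trailing, and repeated empty lines need no special handling, because they only ever form groups whose key is `False`.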

Answer (score 2)

import re

ss =  '''Princess Maria Amelia of Brazil (1831–1853)


was the daughter of Dom Pedro I,
founder of Brazil's independence and its first emperor,

and Amelie of Leuchtenberg.



The only child from her father's second marriage,
Maria Amelia was born in France
following Pedro I's 1831 abdication in favor of his son Dom Pedro II.

Before Maria Amelia was a month old, Pedro I left for Portugal
to restore its crown to his eldest daughter Dona Maria II.
He defeated his brother Miguel I (who had usurped Maria II's throne),
only to die a few months later of tuberculosis.


'''

def select_lines(text, regx=re.compile(r'((?:^.+\n)+)', re.MULTILINE)):
    # Each match is a run of consecutive non-empty lines, each ending in '\n'.
    return [x.splitlines() for x in regx.findall(text)]

for sl in select_lines(ss):
    print(sl)
    print()

Result:

['Princess Maria Amelia of Brazil (1831–1853)']

['was the daughter of Dom Pedro I,', "founder of Brazil's independence and its first emperor,"]

['and Amelie of Leuchtenberg.']

["The only child from her father's second marriage,", 'Maria Amelia was born in France', "following Pedro I's 1831 abdication in favor of his son Dom Pedro II."]

['Before Maria Amelia was a month old, Pedro I left for Portugal', 'to restore its crown to his eldest daughter Dona Maria II.', "He defeated his brother Miguel I (who had usurped Maria II's throne),", 'only to die a few months later of tuberculosis.']

[['2', '3'], ['5', '6', '7', '8'], ['11']]
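The last result line is the question's sample list, but the answer does not show the call that produced it, and `select_lines` works on a string rather than a list. A plausible sketch, assuming the list is first joined with newlines and given a trailing newline so that the final line also ends in `\n` (which the pattern requires); it also assumes no individual line contains a newline itself.

```python
import re


def select_lines(text, regx=re.compile(r'((?:^.+\n)+)', re.MULTILINE)):
    # Same idea as the answer: each match is a run of non-empty,
    # newline-terminated lines.
    return [x.splitlines() for x in regx.findall(text)]


li = ['', '2', '3', '', '5', '6', '7', '8', '', '', '11']
# The extra '\n' terminates the final line '11'; without it, '^.+\n'
# cannot match that line and the last paragraph is dropped.
text = '\n'.join(li) + '\n'
print(select_lines(text))
# [['2', '3'], ['5', '6', '7', '8'], ['11']]
```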

Another way, working on the list directly:

li = [ '', '2', '3', '', '5', '6', '7', '8', '', '', '11']

lo = ['5055','','','2','54','87','','1','2','5','8','','']

lu = ['AAAAA','BB','','HU','JU','GU']

def selines(L):
    ye = []
    for x in L:
        if x:
            ye.append(x)
        elif ye:
            # an empty line closes the current paragraph, if there is one
            yield ye
            ye = []
    if ye:
        yield ye


for lx in (li, lo, lu):
    print(lx)
    print(list(selines(lx)))
    print()

Result:

['', '2', '3', '', '5', '6', '7', '8', '', '', '11']
[['2', '3'], ['5', '6', '7', '8'], ['11']]

['5055', '', '', '2', '54', '87', '', '1', '2', '5', '8', '', '']
[['5055'], ['2', '54', '87'], ['1', '2', '5', '8']]

['AAAAA', 'BB', '', 'HU', 'JU', 'GU']
[['AAAAA', 'BB'], ['HU', 'JU', 'GU']]

Answer (score 2)

Slightly less ugly than the original:

def split_at_empty(lines):
    r = [[]]
    for l in lines:
        if l:
            r[-1].append(l)
        else:
            r.append([])
    return [l for l in r if l]

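(The last line removes the empty lists that would otherwise be included, which arise from leading, trailing, or consecutive empty lines.)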
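To see why that final filter is needed, here is a sketch that returns the intermediate `r` alongside the filtered result for the question's sample input; `split_with_trace` is a hypothetical name used only for this illustration.

```python
def split_with_trace(lines):
    # Same algorithm as split_at_empty above, but also returns the
    # unfiltered list of groups so the empty ones are visible.
    r = [[]]
    for line in lines:
        if line:
            r[-1].append(line)  # extend the current paragraph
        else:
            r.append([])        # every empty line opens a new, possibly empty, group
    return r, [group for group in r if group]


raw, result = split_with_trace(['', '2', '3', '', '5', '6', '7', '8', '', '', '11'])
print(raw)     # [[], ['2', '3'], ['5', '6', '7', '8'], [], ['11']]
print(result)  # [['2', '3'], ['5', '6', '7', '8'], ['11']]
```

The leading `''` and the doubled `'', ''` are what leave the two empty groups in `raw`.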

Answer (score 1)

And for the list-comprehension addicts...

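def split_at_empty(L):
    return [L[start:end+1] for start, end in zip(
        # indices where a paragraph starts: non-empty, at 0 or after an empty line
        [n for n in range(len(L)) if L[n] and (n == 0 or not L[n-1])],
        # indices where a paragraph ends: non-empty, at the end or before an empty line
        [n for n in range(len(L)) if L[n] and (n+1 == len(L) or not L[n+1])]
        )]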

Or, better:

def split_at_empty(lines):
    L = [i for i, a in enumerate(lines) if not a]
    return [lines[s + 1:e] for s, e in zip([-1] + L, L + [len(lines)]) 
            if e > s + 1]
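The second version is easiest to follow by printing the `(s, e)` pairs it builds: each pair brackets the stretch between two consecutive empty lines, with `-1` and `len(lines)` acting as sentinels for the two ends, and the `e > s + 1` test drops stretches that contain nothing. A small sketch for the question's sample input:

```python
lines = ['', '2', '3', '', '5', '6', '7', '8', '', '', '11']

# Indices of the empty lines, i.e. the separators.
L = [i for i, a in enumerate(lines) if not a]
pairs = list(zip([-1] + L, L + [len(lines)]))
print(L)      # [0, 3, 8, 9]
print(pairs)  # [(-1, 0), (0, 3), (3, 8), (8, 9), (9, 11)]

# Keep only the pairs whose slice lines[s + 1:e] is non-empty.
print([lines[s + 1:e] for s, e in pairs if e > s + 1])
# [['2', '3'], ['5', '6', '7', '8'], ['11']]
```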

Answer (score 0)

Here is a generator-based solution:

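def split_at_empty(lines):
    # -1 (not 0) as the leading sentinel, so that lines[start+1:end]
    # starts at index 0 and a non-empty first line is not dropped
    sep = [-1] + [i for (i, l) in enumerate(lines) if not l] + [len(lines)]
    for start, end in zip(sep[:-1], sep[1:]):
        if start + 1 < end:
            yield lines[start+1:end]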

For your input:

l = ['', '2', '3', '', '5', '6', '7', '8', '', '', '11']
for para in split_at_empty(l):
    print(para)

this prints:

['2', '3']
['5', '6', '7', '8']
['11']
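A useful sanity check for any of these functions is an input whose first line is not empty, since the sample starts with `''` and can hide an off-by-one at the front. Because the generator slices from `start + 1`, its leading sentinel has to be `-1` (one before index 0) for the first paragraph to be kept. The sketch below restates that generator and compares it with a plain loop on a few edge cases; `reference` and the `cases` list are hypothetical, chosen only for this check.

```python
def split_at_empty(lines):
    # Generator approach with -1 as the leading sentinel, so that
    # lines[start + 1:end] starts at index 0 for the first paragraph.
    sep = [-1] + [i for i, l in enumerate(lines) if not l] + [len(lines)]
    for start, end in zip(sep[:-1], sep[1:]):
        if start + 1 < end:
            yield lines[start + 1:end]


def reference(lines):
    # Straightforward loop used as a baseline for comparison.
    out, cur = [], []
    for line in lines:
        if line:
            cur.append(line)
        elif cur:
            out.append(cur)
            cur = []
    if cur:
        out.append(cur)
    return out


cases = [
    ['', '2', '3', '', '5', '6', '7', '8', '', '', '11'],  # sample from the question
    ['a', 'b', '', 'c'],                                  # non-empty first line
    ['', '', ''],                                         # only separators
    [],                                                   # empty input
]
for case in cases:
    assert list(split_at_empty(case)) == reference(case), case
print('all cases agree')
```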
