1 つの方法を次に示します。
- エントリを並べ替える
- 各エントリ間の共通プレフィックスの長さを決定する
- 共通のプレフィックスが前のエントリよりも短いポイントでリストを区切って、エントリをグループ化します
実装例:
def common_count(t0, t1):
"returns the length of the longest common prefix"
for i, pair in enumerate(zip(t0, t1)):
if pair[0] != pair[1]:
return i
return i
def group_by_longest_prefix(iterable):
"given a sorted list of strings, group by longest common prefix"
longest = 0
out = []
for t in iterable:
if out: # if there are previous entries
# determine length of prefix in common with previous line
common = common_count(t, out[-1])
# if the current entry has a shorted prefix, output previous
# entries as a group then start a new group
if common < longest:
yield out
longest = 0
out = []
# otherwise, just update the target prefix length
else:
longest = common
# add the current entry to the group
out.append(t)
# return remaining entries as the last group
if out:
yield out
使用例:
text = """
TOKYO-BLING.1 H02-AVAILABLE
TOKYO-BLING.1 H02-MIDDLING
TOKYO-BLING.1 H02-TOP
TOKYO-BLING.2 H04-USED
TOKYO-BLING.2 H04-AVAILABLE
TOKYO-BLING.2 H04-CANCELLED
WAY-VERING.1 H03-TOP
WAY-VERING.2 H03-USED
WAY-VERING.2 H03-AVAILABLE
WAY-VERING.1 H03-CANCELLED
"""
T = sorted(t.strip() for t in text.split("\n") if t)
for L in group_by_longest_prefix(T):
print L
これにより、次が生成されます。
['TOKYO-BLING.1 H02-AVAILABLE', 'TOKYO-BLING.1 H02-MIDDLING', 'TOKYO-BLING.1 H02-TOP']
['TOKYO-BLING.2 H04-AVAILABLE', 'TOKYO-BLING.2 H04-CANCELLED', 'TOKYO-BLING.2 H04-USED']
['WAY-VERING.1 H03-CANCELLED', 'WAY-VERING.1 H03-TOP']
['WAY-VERING.2 H03-AVAILABLE', 'WAY-VERING.2 H03-USED']
実際の動作はこちら: http://ideone.com/1Da0S