algorithm - 動的計画法によるテキスト位置合わせの実装

Question

MIT OCW のコースを通じて、動的プログラミングの概念を理解しようとしています。OCWの動画での説明は素晴らしいのですが、説明をコードに落とし込むまではよくわからない気がします。実装中、ここの講義ノート、特にノートの 3 ページからいくつかのメモを参照します。

問題は、数学表記の一部をコードに変換する方法がわからないことです。これが私が実装したソリューションの一部です（そしてそれが正しく実装されていると思います）：

import math

paragraph = "Some long lorem ipsum text."
words = paragraph.split(" ")

# Count total length for all strings in a list of strings.
# This function will be used by the badness function below.
def total_length(str_arr):
    total = 0

    for string in str_arr:
        total = total + len(string)

    total = total + len(str_arr) # spaces
    return total

# Calculate the badness score for a word.
# str_arr is assumed be send as word[i:j] as in the notes
# we don't make i and j as argument since it will require
# global vars then.
def badness(str_arr, page_width):
    line_len = total_length(str_arr)
    if line_len > page_width:
        return float('nan') 
    else:
        return math.pow(page_width - line_len, 3)

今、私が理解していないのは、講義ノートのポイント 3 から 5 です。私は文字通り理解しておらず、それらの実装をどこから始めればよいかわかりません。これまでのところ、次のように、単語のリストを繰り返し、行末と思われる各行の悪さを数えてみました。

def justifier(str_arr, page_width):
    paragraph = str_arr
    par_len = len(paragraph)
    result = [] # stores each line as list of strings
    for i in range(0, par_len):
        if i == (par_len - 1):
            result.append(paragraph)
        else:
            dag = [badness(paragraph[i:j], page_width) + justifier(paragraph[j:], page_width) for j in range(i + 1, par_len + 1)] 
            # Should I do a min(dag), get the index, and declares it as end of line?

しかし、どうすれば機能を継続できるのかわかりません。正直に言うと、次の行がわかりません。

dag = [badness(paragraph[i:j], page_width) + justifier(paragraph[j:], page_width) for j in range(i + 1, par_len + 1)]

そして、どのように(justifierとして戻りますか?int戻り値をリストであるに格納することを既に決めているためresultです。別の関数を作成して、そこから再帰する必要がありますか? 再帰が必要ですか?

次に何をすべきかを示し、これが動的計画法である理由を説明していただけますか? 再帰がどこにあり、副問題が何であるかは本当にわかりません。

前にありがとう。

score 23 · Accepted Answer

動的計画法自体の核となるアイデアを理解するのに問題がある場合は、ここに私の見解を示します。

動的プログラミングは本質的に、時間の複雑さのために空間の複雑さを犠牲にしています(ただし、通常、使用する余分な空間は、節約できる時間に比べてごくわずかであるため、動的プログラミングを正しく実装すれば、それだけの価値があります)。各再帰呼び出しからの値を (配列や辞書などに) 保存するので、再帰ツリーの別のブランチで同じ再帰呼び出しに遭遇したときに 2 回目の計算を避けることができます。

いいえ、再帰を使用する必要はありません。これは、ループだけを使用して取り組んでいた質問の私の実装です。私は AlexSilva によってリンクされた TextAlignment.pdf を非常に厳密にたどりました。うまくいけば、これがお役に立てば幸いです。

def length(wordLengths, i, j):
    return sum(wordLengths[i- 1:j]) + j - i + 1


def breakLine(text, L):
    # wl = lengths of words
    wl = [len(word) for word in text.split()]

    # n = number of words in the text
    n = len(wl)    

    # total badness of a text l1 ... li
    m = dict()
    # initialization
    m[0] = 0    

    # auxiliary array
    s = dict()

    # the actual algorithm
    for i in range(1, n + 1):
        sums = dict()
        k = i
        while (length(wl, k, i) <= L and k > 0):
            sums[(L - length(wl, k, i))**3 + m[k - 1]] = k
            k -= 1
        m[i] = min(sums)
        s[i] = sums[min(sums)]

    # actually do the splitting by working backwords
    line = 1
    while n > 1:
        print("line " + str(line) + ": " + str(s[n]) + "->" + str(n))
        n = s[n] - 1
        line += 1

score 16 · Accepted Answer

まだこれに興味がある人のために: 重要なのは、テキストの最後から後方に移動することです (ここで述べたように)。そうすれば、すでに記憶されている要素を比較するだけです。

は、wordsに従ってラップされる文字列のリストですtextwidth。次に、講義の表記では、タスクは 3 行のコードに短縮されます。

import numpy as np

textwidth = 80

DP = [0]*(len(words)+1)

for i in range(len(words)-1,-1,-1):
    DP[i] = np.min([DP[j] + badness(words[i:j],textwidth) for j in range(i+1,len(words)+1)])

と：

def badness(line,textwidth):

    # Number of gaps
    length_line = len(line) - 1

    for word in line:
        length_line += len(word)

    if length_line > textwidth: return float('inf')

    return ( textwidth - length_line )**3

彼は、壊れた位置を追跡するために 2 番目のリストを追加できると述べています。これを行うには、コードを次のように変更します。

DP = [0]*(len(words)+1)
breaks = [0]*(len(words)+1)

for i in range(len(words)-1,-1,-1):
    temp = [DP[j] + badness(words[i:j],args.textwidth) for j in range(i+1,len(words)+1)]

    index = np.argmin(temp)

    # Index plus position in upper list
    breaks[i] = index + i + 1
    DP[i] = temp[index]

テキストを復元するには、改行位置のリストを使用するだけです。

def reconstruct_text(words,breaks):                                                                                                                

    lines = []
    linebreaks = []

    i = 0 
    while True:

        linebreaks.append(breaks[i])
        i = breaks[i]

        if i == len(words):
            linebreaks.append(0)
            break

    for i in range( len(linebreaks) ):
        lines.append( ' '.join( words[ linebreaks[i-1] : linebreaks[i] ] ).strip() )

    return lines

結果: ( text = reconstruct_text(words,breaks))

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy
eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam
voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet
clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit
amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam
nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed
diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet
clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.

いくつかの空白を追加したくなるかもしれません。これはかなりトリッキーですが (さまざまな美的ルールを思い付く可能性があるため)、素朴な試みとしては次のようになります。

import re

def spacing(text,textwidth,maxspace=4):

    for i in range(len(text)):

        length_line = len(text[i])

        if length_line < textwidth:

            status_length = length_line
            whitespaces_remain = textwidth - status_length
            Nwhitespaces = text[i].count(' ')

            # If whitespaces (to add) per whitespace exeeds
            # maxspace, don't do anything.
            if whitespaces_remain/Nwhitespaces > maxspace-1:pass
            else:
                text[i] = text[i].replace(' ',' '*( 1 + int(whitespaces_remain/Nwhitespaces)) )
                status_length = len(text[i])

                # Periods have highest priority for whitespace insertion
                periods = text[i].split('.')

                # Can we add a whitespace behind each period?
                if len(periods) - 1 + status_length <= textwidth:
                    text[i] = '. '.join(periods).strip()

                status_length = len(text[i])
                whitespaces_remain = textwidth - status_length
                Nwords = len(text[i].split())
                Ngaps = Nwords - 1

                if whitespaces_remain != 0:factor = Ngaps / whitespaces_remain

                # List of whitespaces in line i
                gaps = re.findall('\s+', text[i])

                temp = text[i].split()
                for k in range(Ngaps):
                    temp[k] = ''.join([temp[k],gaps[k]])

                for j in range(whitespaces_remain):
                    if status_length >= textwidth:pass
                    else:
                        replace = temp[int(factor*j)]
                        replace = ''.join([replace, " "])
                        temp[int(factor*j)] = replace

                text[i] = ''.join(temp)

    return text

何を与えるか: ( text = spacing(text,textwidth))

Lorem  ipsum  dolor  sit  amet, consetetur  sadipscing  elitr,  sed  diam nonumy
eirmod  tempor  invidunt  ut labore  et  dolore  magna aliquyam  erat,  sed diam
voluptua.   At  vero eos  et accusam  et justo  duo dolores  et ea  rebum.  Stet
clita  kasd  gubergren,  no  sea  takimata sanctus  est  Lorem  ipsum  dolor sit
amet.   Lorem  ipsum  dolor  sit amet,  consetetur  sadipscing  elitr,  sed diam
nonumy  eirmod  tempor invidunt  ut labore  et dolore  magna aliquyam  erat, sed
diam  voluptua.  At vero eos et accusam et  justo duo dolores et ea rebum.  Stet
clita  kasd gubergren, no sea  takimata sanctus est Lorem  ipsum dolor sit amet.

score 1 · Accepted Answer

私は講義を見て、私が理解できるものは何でもここに置くだろうと思った. 質問者様と同じ形式でコードを入れておきました。講義で説明したように、ここでは再帰を使用しました。
ポイント 3、再発を定義します。これは基本的に、より高い入力に関連する関数の値を先に計算し、それを使用してより低い値の入力のを計算する、ボトムアプローチです。
講義では、次のように説明されています
。DP(i) = min(DP(j) + badness(i, j))
for j for i+1 to n.
ここで、i は n から 0 まで変化します (下から上へ!)。
DP(n) = 0 として、
DP(n-1) = DP(n) + badness(n-1, n)
そして、D(n-1) と D(n) から D(n-2) を計算しますそしてそれらを最小限に抑えます。
この方法で i=0 まで下げることができ、それが悪の最終的な答えです!
ポイント 4 では、ご覧のとおり、ここで 2 つのループが進行しています。1 つは i 用で、もう 1 つは i 用です。
したがって、i=0 の場合、j(max) = n、i = 1、j(max) = n-1、... i = n 、j(max) = 0.
したがって、合計時間 = これらの加算 = n (n+1)/2。
したがって、O(n^2) です。
ポイント 5 は、DP[0]! のソリューションを特定するだけです。
お役に立てれば！

import math

justification_map = {}
min_map = {}

def total_length(str_arr):
    total = 0

    for string in str_arr:
        total = total + len(string)

    total = total + len(str_arr) - 1 # spaces
    return total

def badness(str_arr, page_width):
    line_len = total_length(str_arr)
    if line_len > page_width:
        return float('nan') 
    else:
        return math.pow(page_width - line_len, 3)

def justify(i, n, words, page_width):
    if i == n:

        return 0
    ans = []
    for j in range(i+1, n+1):
        #ans.append(justify(j, n, words, page_width)+ badness(words[i:j], page_width))
        ans.append(justification_map[j]+ badness(words[i:j], page_width))
    min_map[i] = ans.index(min(ans)) + 1
    return min(ans)

def main():
    print "Enter page width"
    page_width = input()
    print "Enter text"
    paragraph = input() 
    words = paragraph.split(' ')
    n = len(words)
    #justification_map[n] = 0 
    for i in reversed(range(n+1)):
        justification_map[i] = justify(i, n, words, page_width)

    print "Minimum badness achieved: ", justification_map[0]

    key = 0
    while(key <n):
        key = key + min_map[key]
        print key

if __name__ == '__main__':
    main()

score 0 · Accepted Answer

これはあなたの定義によると私が考えるものです。

import math

class Text(object):
    def __init__(self, words, width):
        self.words = words
        self.page_width = width
        self.str_arr = words
        self.memo = {}

    def total_length(self, str):
        total = 0
        for string in str:
            total = total + len(string)
        total = total + len(str) # spaces
        return total

    def badness(self, str):
        line_len = self.total_length(str)
        if line_len > self.page_width:
            return float('nan') 
        else:
            return math.pow(self.page_width - line_len, 3)

    def dp(self):
        n = len(self.str_arr)
        self.memo[n-1] = 0

        return self.judge(0)

    def judge(self, i):
        if i in self.memo:
            return self.memo[i]

        self.memo[i] = float('inf') 
        for j in range(i+1, len(self.str_arr)):
            bad = self.judge(j) + self.badness(self.str_arr[i:j])
            if bad < self.memo[i]:
                self.memo[i] = bad

        return self.memo[i]

algorithm - 動的計画法によるテキスト位置合わせの実装

5 に答える 5

Related

Reference