
Suppose I have a paragraph, and I split it into sentences with sent_tokenize:

variable = ['By the 1870s the scientific community and much of the general public had accepted evolution as a fact.',
    'However, many favoured competing explanations and it was not until the emergence of the modern evolutionary synthesis from the 1930s to the 1950s that a broad consensus developed in which natural selection was the basic mechanism of evolution.',
    'Darwin published his theory of evolution with compelling evidence in his 1859 book On the Origin of Species, overcoming scientific rejection of earlier concepts of transmutation of species.']

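Next, I split each sentence into words and store them in a variable. How can I find the two sentences that have the most words in common? I don't know how to go about this. With 10 sentences, that would be 90 checks (one between every pair of sentences). Thanks.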
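A minimal sketch of the setup described in the question, assuming words are extracted with a simple lowercase regular expression (`[a-z0-9']+`) rather than NLTK's `word_tokenize`. It also shows that comparing each unordered pair only once needs n * (n - 1) / 2 comparisons, so 10 sentences need 45 checks rather than 90.

```python
import itertools
import re

variable = [
    'By the 1870s the scientific community and much of the general public had accepted evolution as a fact.',
    'However, many favoured competing explanations and it was not until the emergence of the modern evolutionary synthesis from the 1930s to the 1950s that a broad consensus developed in which natural selection was the basic mechanism of evolution.',
    'Darwin published his theory of evolution with compelling evidence in his 1859 book On the Origin of Species, overcoming scientific rejection of earlier concepts of transmutation of species.',
]

def words_of(sentence):
    # Assumed tokenization: lowercase, keep runs of letters, digits and apostrophes
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))

word_sets = [words_of(s) for s in variable]

# Each unordered pair of sentences is compared exactly once
pairs = list(itertools.combinations(range(len(word_sets)), 2))
print(len(pairs))  # 3 pairs for 3 sentences
print(len(list(itertools.combinations(range(10), 2))))  # 45 pairs for 10 sentences
```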


2 Answers


You can use the intersection of Python sets.

If you have three sentences like these:

a = "a b c d"
b = "a c x y"
c = "a q v"

you can check how many words two of them have in common like this:

sameWords = set.intersection(set(a.split(" ")), set(c.split(" ")))
numberOfWords = len(sameWords)
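A short worked example of the same idea, using the `&` operator (the operator form of `set.intersection`) on the three sentences above. `sorted` is used only because set printing order is not guaranteed.

```python
a = "a b c d"
b = "a c x y"
c = "a q v"

# set(x) & set(y) is equivalent to set.intersection(set(x), set(y))
print(sorted(set(a.split(" ")) & set(b.split(" "))))  # ['a', 'c']
print(len(set(a.split(" ")) & set(c.split(" "))))     # 1, only "a" is shared
```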

With that, you can loop over the list of sentences and find the two that share the most words:

sentences = ["a b c d", "a d e f", "c x y", "a b c d x"]

def similar(s1, s2):
    # Number of distinct words the two sentences have in common
    sameWords = set.intersection(set(s1.split()), set(s2.split()))
    return len(sameWords)

currentSimilar = 0
s1 = ""
s2 = ""

# Compare each unordered pair once (i < j). This replaces the "is" identity
# check, which would also skip two different entries holding identical text.
for i, sentence in enumerate(sentences):
    for sentence2 in sentences[i + 1:]:
        similarity = similar(sentence, sentence2)
        if similarity > currentSimilar:
            s1 = sentence
            s2 = sentence2
            currentSimilar = similarity

print(s1, s2)
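The loop above splits on whitespace only, so `evolution` and `evolution.` (or `However,` and `however`) count as different words when applied to real sentences like those in the question. A sketch that normalizes words before comparing, assuming that lowercase letters, digits and apostrophes are what make up a word; the helper names `word_set` and `most_similar_pair` are hypothetical:

```python
import re

def word_set(sentence):
    # Hypothetical normalizer: lowercase and keep only letters, digits and apostrophes
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))

def most_similar_pair(sentences):
    # Build each word set once instead of re-splitting inside the loop
    sets = [word_set(s) for s in sentences]
    best = (None, None, -1)  # returned unchanged if there are fewer than two sentences
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            shared = len(sets[i] & sets[j])
            if shared > best[2]:
                best = (sentences[i], sentences[j], shared)
    return best

print(most_similar_pair(["a b c d", "a d e f", "c x y", "a b c d x"]))
# ('a b c d', 'a b c d x', 4)
```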

If performance is a concern, there may be a dynamic-programming solution to this problem.
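One way to skip pairs that share nothing, sketched here as a swapped-in technique rather than the dynamic-programming idea mentioned above, is an inverted index: map each word to the sentences containing it, then count shared words only for pairs that co-occur under some word. It is still quadratic in the worst case (for example, a word like "the" that appears in every sentence), but pairs with no words in common are never examined. The function name `most_similar_pair_indexed` is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def most_similar_pair_indexed(sentences):
    # Inverted index: word -> list of sentence indices (in increasing order) containing it
    index = defaultdict(list)
    for i, sentence in enumerate(sentences):
        for word in set(sentence.split()):
            index[word].append(i)

    # For each word, every pair of sentences containing it shares one more word
    shared = Counter()
    for ids in index.values():
        for pair in combinations(ids, 2):
            shared[pair] += 1

    if not shared:
        return None  # no two sentences share any word
    (i, j), count = shared.most_common(1)[0]
    return sentences[i], sentences[j], count

print(most_similar_pair_indexed(["a b c d", "a d e f", "c x y", "a b c d x"]))
# ('a b c d', 'a b c d x', 4)
```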

answered 2013-11-07T15:57:33.833
import itertools

sentences = ["There is no subtle meaning in this.", "Don't analyze this!", "What is this sentence?"]
# Pair each sentence's index with its set of words (trailing punctuation stripped)
decomposedsentences = ((index, set(sentence.strip(".?!,").split())) for index, sentence in enumerate(sentences))
# Pick the pair of (index, word set) tuples whose word sets share the most words
s1, s2 = max(itertools.combinations(decomposedsentences, 2), key=lambda pair: len(pair[0][1] & pair[1][1]))
print("The two sentences with the most common words", sentences[s1[0]], sentences[s2[0]])
answered 2013-11-07T16:21:08.367
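`max` returns only the first pair it meets with the highest score, so ties are silently dropped. A sketch that keeps every pair tied for the most shared words, reusing the same `strip(".?!,")` and split decomposition; the function name `all_best_pairs` is hypothetical:

```python
import itertools

def all_best_pairs(sentences):
    # Same decomposition as above: strip trailing punctuation, split on whitespace
    word_sets = [set(s.strip(".?!,").split()) for s in sentences]
    scores = {
        (i, j): len(word_sets[i] & word_sets[j])
        for i, j in itertools.combinations(range(len(sentences)), 2)
    }
    if not scores:
        return 0, []  # fewer than two sentences
    best = max(scores.values())
    # Every pair whose overlap equals the best score, in comparison order
    return best, [(sentences[i], sentences[j]) for (i, j), score in scores.items() if score == best]

print(all_best_pairs(["a b", "a c", "b c"]))
# (1, [('a b', 'a c'), ('a b', 'b c'), ('a c', 'b c')])
```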