python - Check for elongated words in a sentence

Question

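I want to check whether a sentence contains elongated words, for example soooo, toooo, or thaaatttt. I have a list of sentences that may or may not contain elongated words, so I don't know in advance what the user will type. How can I check for this in Python? I am new to Python.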

score 3 · Accepted Answer

Try this:

import re

s1 = "This has no long words"
s2 = "This has oooone long word"

# A letter followed by two or more copies of itself (three or more in a row).
elong = re.compile(r"([a-zA-Z])\1{2,}")

def has_long(sentence):
    return bool(elong.search(sentence))


print(has_long(s1))  # False
print(has_long(s2))  # True
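
Since the question mentions a list of sentences, here is a minimal sketch of applying the same pattern to a whole list. The function name `sentences_with_elongation` and the sample sentences are made up for illustration.

```python
import re

# Same idea as the pattern above: a letter repeated three or more times in a row.
ELONGATED = re.compile(r"([a-zA-Z])\1{2,}")

def sentences_with_elongation(sentences):
    """Return only the sentences that contain at least one elongated run."""
    return [s for s in sentences if ELONGATED.search(s)]

# Hypothetical sample input.
sentences = ["I am soooo tired", "Nothing odd here", "thaaatttt is great"]
flagged = sentences_with_elongation(sentences)
print(flagged)
```

Compiling the pattern once outside the function avoids rebuilding it for every sentence.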

score 3 · Accepted Answer

@HughBothwell had a good idea. As far as I know, no English word has the same letter repeated three times in a row. So you can search for the words that do:

>>> from re import search
>>> mystr = "word word soooo word tooo thaaatttt word"
>>> # The letter must be captured in a group so that \1 can refer back to it.
>>> [x for x in mystr.split() if search(r'(?i)([a-z])\1\1+', x)]
['soooo', 'tooo', 'thaaatttt']

Whatever this finds will be the elongated words.
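
Splitting on whitespace leaves punctuation attached to the words. As a sketch of one way around that (not part of the answer above), the pattern can match whole runs of letters directly. The name `elongated_words` and the sample text are assumptions for illustration.

```python
import re

# A run of letters that contains some letter three or more times in a row.
# re.IGNORECASE also makes the backreference \1 compare case-insensitively.
ELONGATED_WORD = re.compile(r"\b[a-z]*([a-z])\1{2,}[a-z]*\b", re.IGNORECASE)

def elongated_words(text):
    """Return the elongated words in text, without surrounding punctuation."""
    return [m.group(0) for m in ELONGATED_WORD.finditer(text)]

# Hypothetical sample input.
found = elongated_words("Sooo, that was toooo much... Thaaatttt!")
print(found)
```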

score 1 · Accepted Answer

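Well, you could build a list of every logically possible elongated word. Then loop over the words in the sentence, and over the words in the list, to find the elongated ones.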

sentence = "Hoow arre you doing?"
elongated = ["hoow", "arre", "youu", "yoou", "meee"]  # You will need to have a much larger list
# split() yields words; iterating over the string itself would yield single characters.
for word in sentence.split():
    word = word.lower()
    for e_word in elongated:
        if e_word == word:
            print("Found an elongated word!")

If you want to do what Hugh Bothwell suggested, you can do this:

sentence = "Hooow arrre you doooing?"
elongations = ["aaa", "ooo", "rrr", "bbb", "ccc"]  # continue for all the letters
# split() yields words; iterating over the string itself would yield single characters.
for word in sentence.split():
    for x in elongations:
        if x in word.lower():
            print('"' + word + '" is an elongated word')
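
Rather than typing out a triple for every letter by hand, the list can be generated. This is a minimal sketch of the same substring check; the function name `find_elongated` is made up for illustration.

```python
import string

# Build "aaa", "bbb", ..., "zzz" from the 26 lowercase ASCII letters.
elongations = [letter * 3 for letter in string.ascii_lowercase]

def find_elongated(sentence):
    """Return the whitespace-separated words that contain a tripled letter."""
    return [word for word in sentence.split()
            if any(x in word.lower() for x in elongations)]

result = find_elongated("Hooow arrre you doooing?")
print(result)
```

Note that punctuation stays attached to the word ("doooing?") because the sentence is only split on whitespace.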

score 1 · Accepted Answer

You need a reference list of valid English words. On *NIX systems you can use /etc/share/dict/words or /usr/share/dict/words, or an equivalent, and store all the words in a set object.

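Then, for every word in the sentence, check that:

the word itself is not a valid word (i.e., word not in all_words), and
when you shorten every run of repeated letters to one or two letters, the new word is a valid word.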

Here is one way to try to extract all the possibilities:

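import re
import itertools

# A character followed by two or more copies of itself; the group lets \1 refer back to it.
regex = re.compile(r'(\w)\1{2,}')

# get_all_words() is assumed to be defined elsewhere and to return the valid
# words, e.g. the lines of the system word list mentioned above.
all_words = set(get_all_words())

def without_elongations(word):
    # Base case: no run of three or more is left, so the word is a candidate.
    if regex.search(word) is None:
        return [word]
    # Shorten the first long run to one letter and to two letters, then recurse.
    replacing_with_one_letter = regex.sub(r'\1', word, count=1)
    replacing_with_two_letters = regex.sub(r'\1\1', word, count=1)
    return list(itertools.chain(
        without_elongations(replacing_with_one_letter),
        without_elongations(replacing_with_two_letters),
    ))

sentence = "That was soooo coool"  # example input
for word in sentence.split():
    if word not in all_words:
        if any(w in all_words for w in without_elongations(word)):
            print('%(word)s is elongated' % {'word': word})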
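
To try the dictionary approach without a system word list, here is a self-contained sketch that uses a tiny stand-in set. `ALL_WORDS` and `is_elongated` are hypothetical names, and the six-word set is only there to make the example runnable; a real check would load a full word list instead.

```python
import re

# A character followed by two or more copies of itself.
RUN = re.compile(r"(\w)\1{2,}")

# Stand-in for a real word list (assumption for illustration only).
ALL_WORDS = {"so", "too", "cool", "that", "was", "good"}

def without_elongations(word):
    """Return every variant of word with each long run cut to one or two letters."""
    match = RUN.search(word)
    if match is None:
        return [word]
    results = []
    for keep in (1, 2):
        shortened = word[:match.start()] + match.group(1) * keep + word[match.end():]
        results.extend(without_elongations(shortened))
    return results

def is_elongated(word, all_words=ALL_WORDS):
    """True if word is not valid itself but one of its shortened variants is."""
    word = word.lower()
    if word in all_words:
        return False
    return any(w in all_words for w in without_elongations(word))

checked = {w: is_elongated(w) for w in "that was soooo coool".split()}
print(checked)
```

A word with several long runs produces two variants per run, so the number of candidates doubles with each run. That is fine for ordinary sentences but worth keeping in mind for adversarial input.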
