python - Replace four-letter words in a string

Question

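I'm trying to write code that reads input from a file, replaces every four-letter word with "xxxx", and writes the result to another file. I know this question has already been asked on the site, and I found others through Google, but they are all the same. I've played around with my code too, but I still haven't been able to reach a solution.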

def censor(filename):
    'string ==> None, creates file censored.txt in current folder with all 4 letter words replaces with string xxxx'
    import string
    infile = open(filename,'r')
    infile2 = open('censored.txt','w')
    for word in infile:
        words = word.split()
        for i, word in enumerate(words):
            words.strip(string.punctuation)
            if len(word) == 4:
                words[i] == 'xxxx'
                infile2.write(words[i])

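I know this is a mess of code that doesn't work, but I figured anything was worth posting. My idea was to strip the punctuation from the text so that a four-letter word followed by punctuation isn't counted as five characters, split the words into a list, change the four-letter words, and join them back together in their original order, with only the words replaced. So "I like to work." would become "I xxxx to xxxx".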

I also looked at another similar post on this site and found a solution that works, but it doesn't deal with the punctuation problem.

def maybe_replace(word, length=4):
    if len(word) == length:
        return 'xxxx'
    else:
        return word

def replacement(filename):
    infile = open(filename,'r')
    outfile = open('censored.txt','w')
    for line in infile:
        words = line.split()
        newWords = [maybe_replace(word) for word in words]
        newLine = ' '.join(newWords)
        outfile.write(newLine + '\n')
    outfile.close()
    infile.close()

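In this case, given a list of words like "frog, boot, cat, dog." it returns "frog, boot, xxxx xxxx", because the punctuation attached to each word is counted as part of its length.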

I also found another solution using regex, but I'm still a novice and really can't understand that solution. Any help would be appreciated.

score 3 · Accepted Answer

The regex solution is quite simple:

import re

text = """
    I also found another solution using 
    regex, but I'm still a novice and 
    really can't understand that solution. 
    Any help would be appreciated.
"""

print(re.sub(r'\b\w{4}\b', 'xxxx', text))

The regular expression matches as follows:

\b is a word boundary. It matches at the start or end of a word.
\w{4} matches four word characters (a-z, A-Z, 0-9 or _).
\b is another word boundary.

The output is:

I xxxx found another solution using 
regex, but I'm still a novice and 
really can't understand xxxx solution. 
Any xxxx would be appreciated.
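
To apply the same substitution to a file, as the question asks, something like the following should work. This is a minimal sketch: the function name censor_file is made up for illustration, and the default output name censored.txt is taken from the question rather than from this answer.

```python
import re

# Four word characters with a word boundary on each side.
FOUR_LETTER_WORD = re.compile(r'\b\w{4}\b')

def censor_file(in_path, out_path='censored.txt'):
    """Copy in_path to out_path, replacing every 4-character word with 'xxxx'."""
    with open(in_path, 'r') as infile, open(out_path, 'w') as outfile:
        for line in infile:
            # Punctuation and spacing are left untouched; only the matches change.
            outfile.write(FOUR_LETTER_WORD.sub('xxxx', line))
```

Because \w also matches digits and underscores, a token such as 2024 would be replaced as well.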

score 1 · Accepted Answer

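The second piece of code has a problem at words = line.split(). By default it splits on whitespace, so the "," is counted as part of the word.

If you really don't want to get into regular expressions, here is my suggestion (it still involves a little regex):

import re
words = re.split(r'\W+', line)

This asks Python to split the line on runs of non-word characters, that is, anything other than letters, digits and underscores.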
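
One caveat is that a plain re.split throws the punctuation away, so the original line cannot be rebuilt from the pieces. A possible workaround, sketched here, is to put the pattern in a capturing group so that the separators are kept in the result. The helper name censor_line is hypothetical.

```python
import re

def censor_line(line):
    # With a capturing group, re.split keeps the separators:
    # even indices hold words, odd indices hold the non-word runs between them.
    parts = re.split(r'(\W+)', line)
    for i in range(0, len(parts), 2):
        if len(parts[i]) == 4:
            parts[i] = 'xxxx'
    return ''.join(parts)

print(re.split(r'\W+', "Frog, boot, cat, dog."))  # punctuation is lost
print(censor_line("Frog, boot, cat, dog."))       # punctuation is preserved
```

Only the even-indexed entries are checked, so a separator that happens to be four characters long is never replaced.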

score 0 · Accepted Answer

Here's my answer! :)

import string as s
alfanum = s.ascii_letters + s.digits

def maybe_replace(arg, length=4):
    # Keep only letters and digits so punctuation does not count toward the length.
    word = "".join(t for t in arg if t in alfanum)

    if len(word) == length:
        # Assumes any punctuation follows the word, e.g. 'boot,' -> 'xxxx,'.
        return 'xxxx' + arg[length:]
    return arg

text = "Frog! boot, cat, dog. bye, bye!"
words = text.split()
print(words)
print([maybe_replace(word) for word in words])

>>> ['Frog!', 'boot,', 'cat,', 'dog.', 'bye,', 'bye!']
>>> ['xxxx!', 'xxxx,', 'cat,', 'dog.', 'bye,', 'bye!']
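
The version above assumes punctuation only trails the word. If leading punctuation also matters, e.g. "(boot)", one option is to strip punctuation from both ends before measuring, which is close to what the question attempted with string.punctuation. This is only a sketch, and maybe_replace_stripped is a hypothetical name.

```python
import string

def maybe_replace_stripped(arg, length=4):
    # Remove punctuation from both ends only; inner characters are kept.
    core = arg.strip(string.punctuation)
    if core and len(core) == length:
        # Swap the core for 'xxxx' and keep whatever surrounded it.
        start = arg.index(core)
        return arg[:start] + 'xxxx' + arg[start + len(core):]
    return arg

print([maybe_replace_stripped(w) for w in "(Frog) boot, cat, don't".split()])
```

Inner punctuation still counts toward the length here, so "don't" is treated as five characters and left alone.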
