python - Removing punctuation and capitalization from a TXT file

Question

I have a small problem with Python. I have this script:

import nltk
def analyzer():
    inputfile=raw_input("Which file?: ")
    review=open(inputfile,'r')
    review=review.read()
    tokens=review.split()

    for token in tokens:
        if token in string.punctuation:         
            tokens.remove(token)
        token=tokens.lower()

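It is supposed to read a txt file, split it into words, then remove the punctuation and convert everything to lowercase. That shouldn't be difficult, right? Instead, the text comes back with the punctuation and capital letters still intact. There is no error message. It seems to be ignoring part of the code.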

Any help is much appreciated.

score 2 · Accepted Answer

I am assuming you have the string module imported. Replace the lines

if token in string.punctuation:         
     tokens.remove(token)
     token=tokens.lower()

with

token = token.translate(None,string.punctuation).lower()
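The call above uses the Python 2 signature of translate, which matches the raw_input in the question. On Python 3, str.translate accepts only a table, so a rough equivalent might look like the sketch below. The names STRIP_PUNCT and clean_token are made up for illustration and are not part of the original answer.

```python
import string

# Translation table that deletes every ASCII punctuation character.
# Python 3 form: the third argument of str.maketrans lists characters to drop.
STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def clean_token(token):
    """Hypothetical helper: strip punctuation from one token and lowercase it."""
    return token.translate(STRIP_PUNCT).lower()


cleaned = clean_token("Hello,")
```

Building the table once and reusing it avoids recreating it for every token.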

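Also, strings are immutable in Python, so assigning to the loop variable only rebinds the name; it does not change the original token in the list. If you want to change the tokens, you can do the following: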

tokens = [token.translate(None,string.punctuation).lower() for token in tokens]
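To see the rebinding point in isolation, here is a small sketch with made-up sample data. It only lowercases, so it runs the same way on Python 2 and Python 3.

```python
tokens = ["Hello,", "World!"]

# Assigning to the loop variable rebinds the name `token` only;
# the strings stored in the list are not touched.
for token in tokens:
    token = token.lower()
after_loop = list(tokens)

# Building a new list is what actually produces changed tokens.
lowered = [token.lower() for token in tokens]
```

After the loop, after_loop still holds the original mixed-case strings, while lowered holds the converted ones.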

Personally, I would clean the whole thing up like this:

def read_tokens(path):
    import string
    with open(path) as f:
        tokens = f.read().split()
        return [ token.translate(None, string.punctuation).lower() for token in tokens ]

read_tokens(raw_input("which file?"))

Note that this is only a faithful translation of your original intent, which turns 'test.me' into ['testme'] rather than ['test','me'].
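If the ['test','me'] behaviour is what you actually want, one possible approach (not the one used in the answer) is to treat punctuation as a separator instead of deleting it. The sketch below assumes Python 3 and plain ASCII text; split_words is a hypothetical name, and the pattern would also break contractions such as "don't" into two words.

```python
import re


def split_words(text):
    """Hypothetical helper: lowercase the text and return runs of letters/digits.

    Anything that is not an ASCII letter or digit (punctuation, whitespace)
    acts as a separator between words.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


words = split_words("Test.me, please!")
```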
