python - Matching a word with a regular expression in Python

Question

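I'm using PRAW to build a reddit bot that grabs the authors of comments that say "alot" and stores their usernames in a list. I'm having trouble getting the regular expression and the string check to work together. Here is my code: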

```python
#importing praw for reddit api and time to make intervals

import praw
import time
import re


username = "LewisTheRobot"
password = ""  # password omitted in the original post

r = praw.Reddit(user_agent = "Counts people who say alot")

word_to_match = ['\balot\b']

storage = []

r.login(username, password)

def run_bot():
    subreddit = r.get_subreddit("test")
    print("Grabbing subreddit")
    comments = subreddit.get_comments(limit=200)
    print("Grabbing comments")
    for comment in comments:
        comment_text = comment.body.lower()
        isMatch = any(string in comment_text for string in word_to_match)
        if comment.id not in storage and isMatch:
            print("Match found! Storing username: " + str(comment.author) + " into list.")
            storage.append(comment.author)

    print("There are currently: " + str(len(storage)) + " people who use 'alot' instead of ' a lot'.")


while True:
    run_bot()
    time.sleep(5)
```

The regular expression I'm using is meant to find "alot" as a whole word, not "alot" as part of a longer string (e.g. "zealot"). Every time I run this, it doesn't find the comment I made. Any suggestions?
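The symptom can be reproduced without PRAW by running only the matching line from the code above on an invented sample comment (the comment text is an assumption for illustration; the pattern list is copied verbatim):

```python
# Pattern list copied verbatim from the question (note: no r prefix).
word_to_match = ['\balot\b']
# Invented sample comment, lowercased the same way the bot does.
comment_text = "i say alot of things".lower()

# Same test as the question: a plain substring check for each "pattern".
is_match = any(string in comment_text for string in word_to_match)
print(is_match)                # False
print(repr(word_to_match[0]))  # '\x08alot\x08'
```

The `repr` output already hints at what is going wrong: the pattern does not contain the backslashes the author intended.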

score 3 · Accepted Answer

You're checking with a string operation, not with anything from the RE module:

```python
isMatch = any(string in comment_text for string in word_to_match)
```

The first `in` here checks for a substring; it has nothing to do with REs.
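A few one-line checks illustrate what `in` does with two strings (the strings are arbitrary examples):

```python
# `in` between two strings asks whether the left one occurs verbatim inside the right one.
# No regex interpretation happens, so \b has no "word boundary" meaning here.
print("alot" in "zealot")              # True: found inside another word
print(r"\balot\b" in "i say alot")     # False: looks for the literal characters \balot\b
print(r"\balot\b" in r"x \balot\b y")  # True: only the literal text matches
```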

Change it to:

```python
isMatch = any(re.search(string, comment_text) for string in word_to_match)
```
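A small sketch of the suggested line, wrapped in a hypothetical helper `has_match` so the question's pattern list and a raw-string version can be compared on an invented comment. `re.search` returns a match object (truthy) or `None` (falsy), so `any()` can consume it directly. Note that the question's pattern still fails until the initialization error the answer turns to next is also fixed:

```python
import re

# Invented sample comment.
comment_text = "i say alot of things"

def has_match(patterns, text):
    # re.search gives a Match object (truthy) or None (falsy) for each pattern.
    return any(re.search(p, text) for p in patterns)

print(has_match(['\balot\b'], comment_text))   # False: pattern holds backspace characters
print(has_match([r'\balot\b'], comment_text))  # True: \b is a regex word boundary
```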

In addition, there's an error in the initialization:

```python
word_to_match = ['\balot\b']
```

`'\b'` is the character with code 0x08 (backspace). To avoid traps like this, always use raw-string syntax for RE patterns:
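The difference between the two kinds of literal can be checked directly in the interpreter:

```python
# In a normal string literal, \b is an escape for backspace (code 0x08): one character.
print(len('\b'), ord('\b'))     # 1 8
# A raw string keeps the backslash, giving two characters: a backslash and 'b'.
print(len(r'\b'), list(r'\b'))  # 2 ['\\', 'b']
# Doubling the backslash is the equivalent without the r prefix.
print(r'\balot\b' == '\\balot\\b')  # True
```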

```python
word_to_match = [r'\balot\b']
```

Now there are two characters, a backslash and `b`, and the RE engine interprets them together as "word boundary".
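A sketch of how the corrected pattern behaves on a few invented inputs. Compiling it once with `re.compile` is an optional refinement, not part of the answer's fix:

```python
import re

# Compile the corrected pattern once and reuse it for every comment.
pattern = re.compile(r'\balot\b')

samples = ["i say alot", "alot.", "zealot", "alots of fun", "a lot"]
results = {text: bool(pattern.search(text)) for text in samples}
print(results)
# {'i say alot': True, 'alot.': True, 'zealot': False, 'alots of fun': False, 'a lot': False}
```

"zealot" and "alots" are rejected because a word character sits directly before or after "alot", so there is no boundary there; the period in "alot." does count as a boundary.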

There may be other bugs, but I try not to look for more than two bugs per question... :-)
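One candidate for those other bugs is visible in the question's loop: it checks `comment.id not in storage` but appends `comment.author`, so comment IDs are compared against a list of authors and already-seen comments are practically never skipped. Because every 5-second poll re-fetches recent comments, the same author is likely stored again and again. A minimal sketch of the matching logic with separate bookkeeping, assuming plain data in place of PRAW (the `Comment` tuple, `process` function, and sample comments are hypothetical stand-ins that only mirror the `id`, `author`, and `body` attributes the question's code uses):

```python
import re
from collections import namedtuple

# Hypothetical stand-in for a PRAW comment: only the attributes the question uses.
Comment = namedtuple("Comment", ["id", "author", "body"])

# Case-insensitive match replaces the .lower() call from the question.
ALOT = re.compile(r'\balot\b', re.IGNORECASE)

seen_ids = set()  # comment IDs already processed, so repeated polls don't recount them
authors = set()   # distinct usernames that wrote "alot"

def process(comments):
    for comment in comments:
        if comment.id in seen_ids:
            continue  # already handled on an earlier poll
        seen_ids.add(comment.id)
        if ALOT.search(comment.body):
            authors.add(str(comment.author))

batch = [
    Comment("c1", "alice", "I ate alot today"),
    Comment("c2", "bob", "What a zealot"),
    Comment("c3", "alice", "Alot again"),
]
process(batch)
process(batch)  # same comments fetched again on the next poll: nothing is double-counted
print(len(authors), sorted(authors))  # 1 ['alice']
```

Using a set for `authors` counts each user once, which matches the "people who use 'alot'" message; keep a list instead if every occurrence should be counted.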
