python - IRC bot, making a list of banned words?

Question

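My current problem with this bot for my Twitch channel is that I can't put multiple banned words in one string, whereas authlist seems to be treated as a list.

Example: I want to ban the words foo1, foo2, foo3 and foo4. With all of them kept in one string, a user has to type all four into chat before the bot bans them, rather than just one of the four words.

Thanks in advance!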

import socket

authlist = "patyyebot patyye"
banword = "foo1 foo2 foo3 foo4"
server = "patyye.jtvirc.com"
name = "patyyebot"
port = 6667
channel = "#patyye"
password = "xx"
irc = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
irc.connect((server, port))
irc.send("PASS " + password + "\n")
irc.send("NICK " + name + "\n")
irc.send("USER patyyebot patyyebot patyyebot :PatyYeBot\n")
irc.send("JOIN " + channel + "\n")
while True:

    def message(msg):
        irc.send("PRIVMSG " + channel + " :" + msg + "\n")
    def ban(msg):
        irc.send("PRIVMSG " + channel + " :/ban " + msg + "\n")


    data = irc.recv(1204)
    data = data.strip('\r\n')
    senderusr = data.split(" ")
    senderusr = senderusr[0]
    senderusr = senderusr.split("!")
    senderusr = senderusr[0]
    senderusr = senderusr.strip(":")

    print data
    if data.find == "PONG" :
        irc.send("PING")

    if "!facebook" in data and senderusr in authlist:
        message("@" + senderusr + ": Facebook is private")

    if "!twitter" in data:
        message("Follow PatyYe on Twitter: https://twitter.com/PatyYe")

    if data in banword:
        message("@" + senderusr + ":  zei een gebanned woord! Ban uitgevoerd")
        ban(senderusr)

score 2 · Accepted Answer

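With a regular expression you can avoid a loop and check for all the words in a single pass.

You can censor just the banned words (useful if you are logging or archiving the conversation):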

>>> banned_words = "phuck azz deeck peach"
>>> regexp = '|'.join(banned_words.split())
>>> message = "You son of a peach!"
>>> import re
>>> re.sub(regexp, '[beeeeeep]', message)
'You son of a [beeeeeep]!'

Or you can test for banned words and ban the user:

>>> if re.search(regexp, message): print "Consider yourself banned, sir!"
... 
Consider yourself banned, sir!
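
Applied to the bot in the question, the same test might look like the sketch below. The helper name contains_banned_word is hypothetical, and the code is written for Python 3 (the sessions above are Python 2); compiling the pattern once, outside the receive loop, is a design choice rather than a requirement.

```python
import re

# Space-separated banned words, as in the question.
banword = "foo1 foo2 foo3 foo4"

# Compile once; the pattern matches any one of the words.
banned_re = re.compile("|".join(banword.split()))


def contains_banned_word(text):
    """Return True if any single banned word occurs in text."""
    return banned_re.search(text) is not None


print(contains_banned_word("hello foo3"))   # True
print(contains_banned_word("hello there"))  # False
```

In the bot's loop, the check `if data in banword:` would then become something like `if contains_banned_word(data):`.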

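[Update]

Jon wrote:

It's probably best to order banned_words by length, descending (so the longest words are matched first), and to run re.escape on them just in case... – Jon Clements

Depending on where the list comes from, you may want to escape sequences that have a special meaning in regular expressions, to be on the safe side.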

>>> ordered_list = sorted(banned_words.split(), key=lambda x: len(x), reverse=True)
>>> ordered_list
['phuck', 'deeck', 'peach', 'azz']
>>> regexp = '|'.join([re.escape(word) for word in ordered_list])
>>> regexp
'phuck|deeck|peach|azz'
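
To see why both steps matter, here is a small sketch (Python 3) with a made-up word list; the words "$$$", "foo" and "foobar" are illustrative only and do not come from the question.

```python
import re

words = ["$$$", "foo", "foobar"]

# Unescaped, "$" means end-of-string, so this pattern matches every message.
naive = re.compile("|".join(words))
print(bool(naive.search("a perfectly innocent message")))  # True

# Escaped and ordered longest-first.
ordered = sorted(words, key=len, reverse=True)
safe = re.compile("|".join(re.escape(w) for w in ordered))
print(bool(safe.search("a perfectly innocent message")))   # False
print(safe.sub("[beep]", "foobar costs $$$"))              # [beep] costs [beep]

# With the shorter alternative first, only the "foo" prefix is replaced.
print(re.sub("foo|foobar", "[beep]", "foobar"))            # [beep]bar
```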

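You may also need to extend the regular expression so that it is case-insensitive and matches on word boundaries (to prevent false positives).

It would also be a good idea to wrap the regexp in \b(...)\b, so as not to accidentally ban someone for saying, say, "impeach" (or, more realistically, "Scunthorpe"). – Ilmari Karonen

Note that you need to escape the backslashes (or use raw strings):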

>>> regexp = r'\b(' + regexp + r')\b'
>>> regexp
'\\b(phuck|deeck|peach|azz)\\b'
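
Putting the pieces together, a case-insensitive, word-bounded pattern could be built roughly as follows. This is a sketch for Python 3; it uses a non-capturing group (?:...) instead of the capturing group above, which is a small substitution of mine since the group's contents are never read back.

```python
import re

banned_words = "phuck azz deeck peach"

# Longest first, each word escaped, then wrapped in word boundaries.
ordered = sorted(banned_words.split(), key=len, reverse=True)
alternation = "|".join(re.escape(w) for w in ordered)
banned_re = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

print(bool(banned_re.search("You son of a PEACH!")))       # True
print(bool(banned_re.search("They want to impeach him")))  # False
```

Note that re.IGNORECASE only handles letter case; it does nothing about deliberate misspellings.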

score 1 · Accepted Answer

One way to do this is to use yourstring.split() to split the space-separated string of banned words into a list:

>>> banned_string = "word1 word2 word3"
>>> banned_string.split()
['word1', 'word2', 'word3']

You can then iterate over the words and look for each one in the message.

Full example:

def checkmessage(msg):
    banned_words = "badword1 badword2 badword3"
    banned_list = banned_words.split()

    for word in banned_list:
        if word in msg:
            print("banned for saying: " + word)
            return
    print("not banned")


msg1 = "Nothing special here"
msg2 = "I say the badword2."

checkmessage(msg1)
checkmessage(msg2)

Running that program gives:

not banned
banned for saying: badword2
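
The same loop can be condensed with any() and dropped into the bot in place of `if data in banword:`. The sketch below assumes Python 3, and the helper name said_banned_word is hypothetical. It also splits authlist into a list, on the assumption that exact-name matching is what the question intended: with the original string, `senderusr in authlist` is a substring test, so a user named "pat" would pass.

```python
authlist = "patyyebot patyye".split()        # list, so "in" compares whole names
banned_list = "foo1 foo2 foo3 foo4".split()


def said_banned_word(text):
    """Return True if at least one banned word appears anywhere in text."""
    return any(word in text for word in banned_list)


print(said_banned_word("well foo2 to you"))  # True
print(said_banned_word("nothing to see"))    # False
print("pat" in authlist)                     # False
print("pat" in "patyyebot patyye")           # True (substring match on the string)
```

Keep in mind that `word in text` is itself a substring test, so "foo1" would also trigger on "foo10".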
