5

How can I rewrite this new way of recognizing addresses so that it works in Python?

\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))

4

3 Answers

12

The original source for that pattern states that "This pattern should work in most modern regex implementations" and specifically mentions Perl. Python's regex implementation is modern and similar to Perl's, but it lacks the [:punct:] character class. You can easily build an equivalent like this:

>>> import string, re
>>> pat = r'\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^%s\s]|/)))'
>>> pat = pat % re.sub(r'([-\\\]])', r'\\\1', string.punctuation)

The re.sub() call backslash-escapes the characters that are special inside a character set (-, \ and ]) before they are substituted into the pattern.
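To make the effect of that substitution visible, here is a minimal sketch that prints the escaped punctuation string and checks that it forms a valid character set. The variable names `escaped` and `punct_set` are only for illustration.

```python
import re
import string

# Escape only the characters that are special inside a character set: - \ ]
escaped = re.sub(r'([-\\\]])', r'\\\1', string.punctuation)
print(escaped)

# Wrapped in brackets, the escaped string should match every punctuation character.
punct_set = re.compile('[%s]+' % escaped)
print(bool(punct_set.fullmatch(string.punctuation)))
```

Three backslashes are added in total, one each for `-`, `\` and `]`.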

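Edit: Using re.escape() works just as well, since it simply puts a backslash in front of the punctuation characters (since Python 3.7, only in front of those that have special meaning in a regex). That felt crude to me at first, but it works fine for this case. Use it in place of the re.sub() line, applied to the original template: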

>>> pat = pat % re.escape(string.punctuation)
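Put together as a script, the approach might look like the sketch below. The names `template` and `url_re` and the sample text are made up for illustration; only the pattern itself comes from the question.

```python
import re
import string

# Template from the question, with %s where [:punct:] used to be.
template = r'\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^%s\s]|/)))'
url_re = re.compile(template % re.escape(string.punctuation))

# Sample text invented for this example.
text = "See http://example.com/path, or www.example.org/a_(b)."

# group(0) is the whole match; trailing punctuation is left out of the URL.
urls = [m.group(0) for m in url_re.finditer(text)]
print(urls)  # ['http://example.com/path', 'www.example.org/a_(b)']
```

The trailing comma and full stop are excluded because the last character of a match must be a non-punctuation character, a slash, or a closing parenthesized group.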
answered 2009-12-31T16:55:42.523
5

I don't think Python has this expression:

[:punct:]

Wikipedia says [:punct:] is the same as:

[-!"#$%&'()*+,./:;<=>?@\[\\\]^_`{|}~]
answered 2009-12-31T16:48:20.117
2

Python does not have POSIX bracket expressions.

In ASCII, the [:punct:] bracket expression is equivalent to the following:

[!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~] 
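As a quick sanity check, the sketch below compares that character class with Python's `string.punctuation` over the ASCII range. The names `punct_class`, `punct_re` and `matched` are illustrative, not part of any library.

```python
import re
import string

# The ASCII expansion of [:punct:] quoted above, as a raw string.
punct_class = r"""[!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]"""
punct_re = re.compile(punct_class)

# Collect every ASCII character that the class matches, in code-point order.
matched = "".join(chr(c) for c in range(128) if punct_re.fullmatch(chr(c)))

# True when the class covers exactly the characters in string.punctuation.
print(matched == "".join(sorted(string.punctuation)))
```

If the comparison holds, the hand-written class and the one built from `string.punctuation` can be used interchangeably in the URL pattern.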
于 2009-12-31T16:52:43.867 に答える