python - Matching a regex string with quotes and href

Question

I am trying to use a regular expression to match

  <a href = "something" >

inside the string below, but the code prints None.

E = '<a> test <a href> <a href = "something" ><a href="anything">'
H = re.match('^[<a href = ]\".\" >$' , E)
print (H)

score 1 · Accepted Answer

Don't parse HTML with regular expressions.

Here is an example using BeautifulSoup:

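# Written for Python 3 and the bs4 package (BeautifulSoup 4).
from bs4 import BeautifulSoup, SoupStrainer


html_string = '<a> test <a href> <a href = "something" ><a href="anything">'
# parse_only limits parsing to <a> tags; find_all also reaches nested ones.
soup = BeautifulSoup(html_string, 'html.parser', parse_only=SoupStrainer('a'))
for link in soup.find_all('a'):
    print(link.get('href'))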
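
If installing BeautifulSoup is not an option, the same idea can be sketched with the standard library's html.parser module. This is only a minimal sketch, not part of the original answer: the class name HrefCollector is hypothetical, and it assumes you only want href values that are actually present (a bare `href` with no value is skipped).

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a bare `href` has value None.
        if tag == 'a':
            value = dict(attrs).get('href')
            if value is not None:
                self.hrefs.append(value)


html_string = '<a> test <a href> <a href = "something" ><a href="anything">'
collector = HrefCollector()
collector.feed(html_string)
collector.close()
print(collector.hrefs)
```

The parser tolerates the spaces around `=` and the unclosed tags, which is the kind of variation that makes a hand-written regex fragile.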

score 0 · Accepted Answer

I recommend not using regex to parse HTML (hence BeautifulSoup). If you still want a regex, this one matches your tag:

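>>> import re
>>> E = '<a> test <a href> <a href = "something" ><a href="anything">'
>>> regex = re.compile(r'(<\s*a\s+href\s*=\s*"something"\s*>)+')
>>> # Run findall on the string from the question; the result is your tag
>>> regex.findall(E)
['<a href = "something" >']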
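
The pattern above is tied to the literal value "something". As a rough sketch, assuming the goal is to capture any double-quoted href value, the example below also shows why the question's code printed None: re.match only tries to match at the start of the string, while findall scans all of it. The name HREF_RE is hypothetical, and the pattern assumes href is the tag's only attribute.

```python
import re

E = '<a> test <a href> <a href = "something" ><a href="anything">'

# Hypothetical pattern: an <a> tag whose only attribute is a double-quoted href.
HREF_RE = re.compile(r'<a\s+href\s*=\s*"([^"]*)"\s*>')

# re.match only tries position 0, where E starts with '<a>', so it finds nothing.
at_start = HREF_RE.match(E)

# findall scans the whole string and returns the captured group of each match.
hrefs = HREF_RE.findall(E)

print(at_start)
print(hrefs)
```

The question's pattern has two further problems: the square brackets in `[<a href = ]` form a character class that matches a single character rather than the literal text, and `.` matches exactly one character, so `".+?"` or `"[^"]*"` is needed for a longer value.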
