python-2.7 - Trying to get all image links from reddit.com using Python and re

Question

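I have looked through other posts and tried to apply their advice to my code, but I am still missing something.

What I am trying to do is get all the image links from a website, specifically reddit.com, and then either display the images in a browser or download them and view them in Windows Image Viewer. I am just trying to practice and broaden my Python skills.

I am stuck on getting the links and on choosing how to display the images. Here is what I have so far: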

import urllib2
import re
links=urllib2.urlopen("http://www.reddit.com").read()
found=re.findall("http://imgur.com/+\w+.jpg", links)
print found #Just for testing purposes, to see what links are found

Thanks for any help.

score 3 · Accepted Answer

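The imgur.com links on reddit have no .jpg extension, so your regular expression does not match anything. You need to look for the i.imgur.com domain instead.

Matching with re.findall("http://i.imgur.com/\w+.jpg", links) does return results: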

>>> re.findall("http://i.imgur.com/\w+.jpg", links)
['http://i.imgur.com/PMNZ2.jpg', 'http://i.imgur.com/akg4l.jpg', 'http://i.imgur.com/dAHtq.jpg', 'http://i.imgur.com/dAHtq.jpg', 'http://i.imgur.com/nT73r.jpg', 'http://i.imgur.com/nT73r.jpg', 'http://i.imgur.com/z2wIl.jpg', 'http://i.imgur.com/z2wIl.jpg']

You can extend this to other file extensions:

>>> re.findall("http://i.imgur.com/\w+.(?:jpg|gif|png)", links)
['http://i.imgur.com/PMNZ2.jpg', 'http://i.imgur.com/akg4l.jpg', 'http://i.imgur.com/dAHtq.jpg', 'http://i.imgur.com/dAHtq.jpg', 'http://i.imgur.com/rsIfN.png', 'http://i.imgur.com/rsIfN.png', 'http://i.imgur.com/nT73r.jpg', 'http://i.imgur.com/nT73r.jpg', 'http://i.imgur.com/bPs5N.gif', 'http://i.imgur.com/z2wIl.jpg', 'http://i.imgur.com/z2wIl.jpg']
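
The results above contain repeated links, and the unescaped "." in the pattern matches any character rather than only a literal dot. A minimal sketch of a stricter variant follows; the helper name unique_image_links and the sample HTML are made up for illustration and are not part of the original answer.

```python
import re

# Escaped dots match a literal "." only; the raw string keeps \w intact.
IMGUR_IMAGE = re.compile(r"http://i\.imgur\.com/\w+\.(?:jpg|gif|png)")


def unique_image_links(html):
    """Return matching links in first-seen order, without duplicates."""
    seen = set()
    unique = []
    for link in IMGUR_IMAGE.findall(html):
        if link not in seen:
            seen.add(link)
            unique.append(link)
    return unique


# Hypothetical page fragment, standing in for the downloaded reddit HTML.
sample = (
    '<a href="http://i.imgur.com/dAHtq.jpg">a</a>'
    '<a href="http://i.imgur.com/dAHtq.jpg">b</a>'
    '<a href="http://i.imgur.com/rsIfN.png">c</a>'
    '<a href="http://i.imgur.com/abcdeXjpg">d</a>'  # no real extension
)
print(unique_image_links(sample))
```

The last sample link is skipped here, whereas the unescaped pattern would accept it because "." would match the "X".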

You may want to use a proper HTML parser instead of a regular expression. I recommend both BeautifulSoup and lxml. With these tools you can easily find all <img /> tags that use i.imgur.com links, including .gif and .png files.
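
As a rough illustration of the parser-based approach, here is a sketch that uses the standard library's HTMLParser rather than BeautifulSoup or lxml, so it runs without installing anything. The class name ImgurLinkCollector, the decision to also look at <a href> attributes, and the sample HTML (including the example.com URL) are assumptions made for this example.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class ImgurLinkCollector(HTMLParser):
    """Collect i.imgur.com image URLs from <img src> and <a href>."""

    WANTED = {"img": "src", "a": "href"}   # tag -> attribute to inspect
    EXTENSIONS = (".jpg", ".gif", ".png")

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        wanted_attr = self.WANTED.get(tag)
        if wanted_attr is None:
            return
        for name, value in attrs:
            if (name == wanted_attr and value
                    and value.startswith("http://i.imgur.com/")
                    and value.lower().endswith(self.EXTENSIONS)):
                self.links.append(value)


# Hypothetical page fragment, standing in for the downloaded reddit HTML.
sample_html = (
    '<a href="http://i.imgur.com/PMNZ2.jpg">'
    '<img src="http://i.imgur.com/bPs5N.gif" /></a>'
    '<a href="http://imgur.com/gallery/xyz">album</a>'
    '<img src="http://example.com/logo.png">'
)

collector = ImgurLinkCollector()
collector.feed(sample_html)
print(collector.links)
```

Because the parser only inspects attribute values, text that merely looks like a link elsewhere in the page is ignored. BeautifulSoup or lxml would express the same filtering more compactly.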
