python - 'ascii' codec can't encode character '\xeb' in position 25: ordinal not in range(128) from urlopen(req).read()

Question

I am trying to automatically retrieve the image link of a news article. I wrote a Python module, imageprocessor, with a getimage function that identifies the image link of a news article.

req = Request('http://top-channel.tv/artikull.php?id=264806&ref=fp', headers={'User-Agent': 'Mozilla/5.0'})
c = urlopen(req).read()
soup=BeautifulSoup(c)
m = soup.find('link',{'rel' : 'image_src'})
return m['href']

It works fine when I run it from the shell.

import imageprocessor
img=imageprocessor.getimage('http://top-channel.tv/artikull.php?id=264806&ref=fp','Top Channel')
img
'http://www.top-channel.tv/foto/lajme/ELBASA-NDERTIMET-07_17.jpg'

The problem is that when I call this function the same way from my views.py module (Django framework), the browser shows the following error message:

UnicodeEncodeError at /fillimi/

'ascii' codec can't encode character '\xeb' in position 25: ordinal not in range(128)

It seems that c = urlopen(req).read() returns an ASCII-encoded string. I tried:

img=img.encode('utf-8')

but that did not help.
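
The wording of the error can be reproduced without any network access, which shows what it means: a text string containing a non-ASCII character is being encoded with the ascii codec. A minimal sketch, assuming Python 3; the sample string is made up so that 'ë' (U+00EB) lands at index 25, and it is not the actual value from the failing request.

```python
# Hypothetical string: 25 ASCII characters followed by 'ë' (U+00EB) at index 25.
text = "a" * 25 + "\xeb"

try:
    text.encode("ascii")  # the ascii codec only covers code points 0..127
except UnicodeEncodeError as exc:
    message = str(exc)
    print(message)

# Encoding the same text as UTF-8 succeeds and yields bytes.
encoded = text.encode("utf-8")
```

Calling encode('utf-8') on a value that is already clean does not reach whichever earlier step performed the ascii encoding.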

score 0 · Accepted Answer

It looks like you need to decode the string first. Try this:

img = urllib.urlopen(link).read()
img = img.decode(<source encoding>)
img = img.encode("utf8")

Here is an example:

img = b'\xa0'
img = img.decode("windows-1252")
img = img.encode("utf8")
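
The same two steps can be wrapped in a small helper. This is only a sketch, assuming Python 3, where the response body arrives as bytes; the name to_utf8 and the sample bytes are hypothetical, and windows-1252 is an assumed source encoding, as in the example above.

```python
def to_utf8(raw, source_encoding):
    """Decode raw bytes from source_encoding, then re-encode them as UTF-8."""
    text = raw.decode(source_encoding)  # bytes -> text
    return text.encode("utf-8")         # text -> UTF-8 bytes


# 0xEB is 'ë' in windows-1252; in UTF-8 it becomes the two bytes C3 AB.
converted = to_utf8(b"\xeb", "windows-1252")
print(converted)
```

If the server declares its charset in the Content-Type header, that value would be a better choice for source_encoding than a hard-coded guess.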
