python - urllib2 file name

Question

If I open a file using urllib2, like so:

remotefile = urllib2.urlopen('http://example.com/somefile.zip')

Is there an easy way to get the file name other than parsing the original URL?

EDIT: changed openfile to urlopen... not sure how that happened.

EDIT2: I ended up using:

filename = url.split('/')[-1].split('#')[0].split('?')[0]

Unless I'm mistaken, this should also strip out any query string and fragment.

score 49 · Accepted Answer

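Did you mean urllib2.urlopen?

If the server were sending a Content-Disposition header, you could potentially lift the intended file name from it by checking remotefile.info()['Content-Disposition'], but as it is I think you'll just have to parse the URL.

You could use urlparse.urlsplit, but if you have any URLs like the second example, you'll end up having to pull the file name out yourself anyway: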

>>> urlparse.urlsplit('http://example.com/somefile.zip')
('http', 'example.com', '/somefile.zip', '', '')
>>> urlparse.urlsplit('http://example.com/somedir/somefile.zip')
('http', 'example.com', '/somedir/somefile.zip', '', '')

You might as well just do this:

>>> 'http://example.com/somefile.zip'.split('/')[-1]
'somefile.zip'
>>> 'http://example.com/somedir/somefile.zip'.split('/')[-1]
'somefile.zip'

score 13 · Accepted Answer

If you only want the file name itself, and assuming there are no query variables at the end (as in http://example.com/somedir/somefile.zip?foo=bar), you can use os.path.basename for this:

[user@host]$ python
Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04) 
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.path.basename("http://example.com/somefile.zip")
'somefile.zip'
>>> os.path.basename("http://example.com/somedir/somefile.zip")
'somefile.zip'
>>> os.path.basename("http://example.com/somedir/somefile.zip?foo=bar")
'somefile.zip?foo=bar'

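Some other posters mentioned using urlparse, which will work, but you'd still need to strip the leading directory from the file name. If you use os.path.basename() you don't have to worry about that, since it returns only the final part of the URL or file path.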
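The query-string limitation shown above can be avoided by splitting the URL first and taking the basename of the path only. This is a minimal sketch, assuming Python 3 (where the urlparse module lives in urllib.parse); filename_from_url is a hypothetical helper name, not something from the answers here.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Return the last path segment of a URL, ignoring query and fragment."""
    # urlsplit separates the path from any '?query' and '#fragment' parts.
    path = urlsplit(url).path
    # posixpath.basename always splits on '/', whatever the host OS is.
    # unquote turns percent-escapes such as %20 back into characters.
    return unquote(posixpath.basename(path))


print(filename_from_url("http://example.com/somedir/somefile.zip?foo=bar"))
```

A URL whose path ends in a slash yields an empty string, so callers may want a default name for that case.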

score 7 · Accepted Answer

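I think "the file name" isn't a very well defined concept when it comes to HTTP transfers. The server might (but is not required to) provide one in a "content-disposition" header, which you can try to read with remotefile.headers['Content-Disposition']. If this fails, you probably have to parse the URI yourself.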
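Note that the header value is not a bare file name; it typically looks like attachment; filename="somefile.zip". The sketch below shows one way to pull the name out using the standard library's email.message.Message, assuming Python 3. The function name filename_from_content_disposition is hypothetical.

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    if not header_value:
        return None
    # Message knows how to parse 'type; key="value"' style header parameters.
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() returns None when the header carries no filename parameter.
    return msg.get_filename()


print(filename_from_content_disposition('attachment; filename="somefile.zip"'))
```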

score 6 · Accepted Answer

Just saw this. What I normally do is:

filename = url.split("?")[0].split("/")[-1]

answered 2015-03-20T18:38:47.497

score 2 · Accepted Answer

You could also combine the two best-rated answers.

The complete code would look like this:

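>>> import os, urllib2
>>> remotefile = urllib2.urlopen(url)
>>> try:
...     # Raw header value, e.g. 'attachment; filename="somefile.zip"'; it still needs parsing.
...     filename = remotefile.info()['Content-Disposition']
... except KeyError:
...     # No such header: fall back to the last segment of the URL path.
...     filename = os.path.basename(urllib2.urlparse.urlsplit(url).path)
...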
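For comparison, here is a rough Python 3 sketch of the same header-first, URL-second fallback. It assumes the headers object is an email.message.Message (the headers attribute of a Python 3 urlopen() response is a subclass of it); choose_filename and the "download" default are hypothetical names introduced for illustration.

```python
import posixpath
from email.message import Message
from urllib.parse import urlsplit


def choose_filename(headers, url, default="download"):
    """Prefer the server-suggested name, then the URL path, then a default."""
    # get_filename() parses the Content-Disposition header, if present.
    suggested = headers.get_filename()
    if suggested:
        # Keep only the final component in case the server sent a path.
        return posixpath.basename(suggested.replace("\\", "/")) or default
    # Otherwise use the last path segment, ignoring query and fragment.
    return posixpath.basename(urlsplit(url).path) or default


# Usage without any network traffic: build the headers by hand.
headers = Message()
headers["Content-Disposition"] = 'attachment; filename="report.pdf"'
print(choose_filename(headers, "http://example.com/get?id=1"))
```

Trimming the suggested name to its final component is a defensive choice, since a server-supplied name should not be trusted to be a plain file name.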

score 2 · Accepted Answer

Do you mean urllib2.urlopen? There is no function called openfile in the urllib2 module.

Anyway, use the urllib2.urlparse functions:

>>> from urllib2 import urlparse
>>> print urlparse.urlsplit('http://example.com/somefile.zip')
('http', 'example.com', '/somefile.zip', '', '')

Voila.

score 1 · Accepted Answer

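I guess it depends on what you mean by parsing. There is no way to get the file name without parsing the URL, i.e. the remote server doesn't give you a file name. However, you don't have to do much yourself; there's the urlparse module: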

In [9]: urlparse.urlparse('http://example.com/somefile.zip')
Out[9]: ('http', 'example.com', '/somefile.zip', '', '', '')

score 1 · Accepted Answer

Not that I know of.

But you can parse it easily enough, like this:

url = 'http://example.com/somefile.zip'
print url.split('/')[-1]

score 0 · Accepted Answer

The Pythonic solution is to use PurePosixPath, which is operating-system independent and handles URLs gracefully:

>>> from pathlib import PurePosixPath
>>> path = PurePosixPath('http://example.com/somefile.zip')
>>> path.name
'somefile.zip'
>>> path = PurePosixPath('http://example.com/nested/somefile.zip')
>>> path.name
'somefile.zip'

Note that there is no network traffic here (i.e. those URLs don't go anywhere); it just uses standard parsing rules.

score 0 · Accepted Answer

import os,urllib2
resp = urllib2.urlopen('http://www.example.com/index.html')
my_url = resp.geturl()

os.path.split(my_url)[1]

# 'index.html'

This is not openfile, but maybe it still helps :)
