python - Remove GET variables from a URL in Python

Question

I have this URL:

http://www.exmaple.com/boo/a.php?a=jsd

and the output I want is:

http://www.exmaple.com/boo/

Likewise, if I have

http://www.exmaple.com/abc.html

it should become

http://www.exmaple.com/

and

http://www.exmaple.com/

should return

http://www.exmaple.com/

unchanged.

This is what I tried:

re.sub(r'\?[\S]+','',"http://www.exmaple.com/boo/a.php?a=jsd")

but it returns

http://www.exmaple.com/boo/a.php

What can I do to get the correct output, or does anyone have a better idea for doing this?

score 5 · Accepted Answer

Use the stdlib urlparse module, like this. I generally avoid regular expressions unless they are absolutely necessary.

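>>> from urlparse import urlparse, urlunparse  # Python 2; on Python 3 import from urllib.parse
>>> parsed = urlparse("http://www.exmaple.com/boo/a.php?a=jsd")
>>> scheme, netloc, path, params, query, fragment = parsed
>>> # keep only the first path segment; params, query, and fragment are dropped
>>> urlunparse((scheme, netloc, path.split('/')[1], '', '', ''))
'http://www.exmaple.com/boo'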
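
On Python 3 the same functions live in urllib.parse. A minimal sketch along those lines, which also keeps the trailing slash the question asks for, might look like the following. The name strip_to_directory is made up for illustration, and returning "/" as the path for a bare host is an assumption about the desired behavior, not something the question states.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_to_directory(url):
    """Drop the query, the fragment, and the last path segment of url."""
    parts = urlsplit(url)
    # Keep the path up to and including its last "/".
    directory = parts.path[: parts.path.rfind("/") + 1]
    # Assumption: a bare host such as "http://host" is normalized to "http://host/".
    return urlunsplit((parts.scheme, parts.netloc, directory or "/", "", ""))


for example in (
    "http://www.exmaple.com/boo/a.php?a=jsd",
    "http://www.exmaple.com/abc.html",
    "http://www.exmaple.com/",
):
    print(strip_to_directory(example))
```

Because the query and fragment are split off before the path is trimmed, a "/" inside the query string does not affect the result.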

score 1 · Accepted Answer

I would do something like this:

>>> url = "http://www.exmaple.com/boo/a.php?a=jsd"
>>> url[:url.rfind("/")+1]
'http://www.exmaple.com/boo/'

This removes everything after the last "/". However, I'm not sure it covers all the special cases...
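
One special case that a plain rfind does seem to miss is a query string that itself contains a "/". The sketch below uses a hypothetical URL (the next=/home parameter is invented for illustration) to show the difference between cutting directly and cutting after the query has been dropped.

```python
url = "http://www.exmaple.com/boo/a.php?next=/home"  # hypothetical example URL

# Plain rfind finds the "/" inside the query string, so the query survives.
naive = url[: url.rfind("/") + 1]

# Dropping everything from the first "?" before searching avoids that.
base = url.partition("?")[0]
safer = base[: base.rfind("/") + 1]

print(naive)
print(safer)
```

A fragment containing "/" would need the same treatment with "#".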

Edit: a new solution that uses urlparse together with my simple rfind:

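import urlparse  # Python 2 module; urllib.parse on Python 3

def url_cutter(url):
    up = urlparse.urlparse(url)
    # rebuild scheme://netloc/path, which drops the query and fragment
    url2 = up[0] + "://" + up[1] + up[2]
    # index 6 is the second "/" of "http://"; trim only if a later "/" exists
    if url.rfind("/") > 6:
        url2 = url2[:url2.rfind("/") + 1]
    return url2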

Then:

In [36]: url_cutter("http://www.exmaple.com/boo/a.php?a=jsd")
Out[36]: 'http://www.exmaple.com/boo/'

In [37]: url_cutter("http://www.exmaple.com/boo/a.php?a=jsd#dvt_on")
Out[37]: 'http://www.exmaple.com/boo/'

In [38]: url_cutter("http://www.exmaple.com")
Out[38]: 'http://www.exmaple.com'

score 0 · Accepted Answer

There may be a more optimized way to do this, but this one needs no obscure imports or third-party packages.

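url = "http://www.google.com/abc/abc.html?q=test"
# split() leaves the URL intact when there is no "?"; rindex() would raise ValueError
cleaned_url = url.split("?")[0]
cleaned_url = cleaned_url.split("/")
# drop any path segment that looks like an HTML file
cleaned_url = [item for item in cleaned_url if ".html" not in item]
cleaned_url = "/".join(cleaned_url)  # 'http://www.google.com/abc'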
