python - Why does downloading text files not work correctly?

Question

I am using Python 3.3.1. I wrote a function called download_file() that downloads a file and saves it to disk.

#!/usr/bin/python3
# -*- coding: utf8 -*-

import datetime
import os
import urllib.error
import urllib.request


def download_file(*urls, download_location=os.getcwd(), debugging=False):
    """Downloads the files provided as multiple url arguments.

    Provide the url for files to be downloaded as strings. Separate the
    files to be downloaded by a comma.

    The function would download the files and save it in the folder
    provided as keyword-argument for download_location. If
    download_location is not provided, then the file would be saved in
    the current working directory. Folder for download_location would be
    created if it doesn't already exist. Do not worry about trailing
    slash at the end for download_location. The code would take care of
    it for you.

    If the download encounters an error it would alert about it and
    provide the information about the Error Code and Error Reason (if
    received from the server).

    Normal Usage:
    >>> download_file('http://localhost/index.html',
                      'http://localhost/info.php')
    >>> download_file('http://localhost/index.html',
                      'http://localhost/info.php',
                      download_location='/home/aditya/Download/test')
    >>> download_file('http://localhost/index.html',
                      'http://localhost/info.php',
                      download_location='/home/aditya/Download/test/')

    In Debug Mode, files are not downloaded, nor is there any attempt
    to establish a connection with the server. It just prints out the
    filename and the url that would have been downloaded in Normal
    Mode.

    By Default, Debug Mode is inactive. In order to activate it, we
    need to supply a keyword-argument as 'debugging=True', like:
    >>> download_file('http://localhost/index.html',
                      'http://localhost/info.php',
                      debugging=True)
    >>> download_file('http://localhost/index.html',
                      'http://localhost/info.php',
                      download_location='/home/aditya/Download/test',
                      debugging=True)

    """
    # Append a trailing slash at the end of download_location if not
    # already present
    if download_location[-1] != '/':
        download_location = download_location + '/'

    # Create the folder for download_location if not already present
    os.makedirs(download_location, exist_ok=True)

    # Other variables
    time_format = '%Y-%b-%d %H:%M:%S'   # '2000-Jan-01 22:10:00'

    # "Request Headers" information for the file to be downloaded
    accept = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
    accept_encoding = 'gzip, deflate'
    accept_language = 'en-US,en;q=0.5'
    connection = 'keep-alive'
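    # Implicit string concatenation avoids embedding the continuation
    # line's leading spaces in the header value.
    user_agent = ('Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:20.0) '
                  'Gecko/20100101 Firefox/20.0')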
    headers = {'Accept': accept,
               'Accept-Encoding': accept_encoding,
               'Accept-Language': accept_language,
               'Connection': connection,
               'User-Agent': user_agent,
               }

    # Loop through all the files to be downloaded
    for url in urls:
        filename = os.path.basename(url)
        if not debugging:
            try:
                request_sent = urllib.request.Request(url, None, headers)
                response_received = urllib.request.urlopen(request_sent)
            except urllib.error.URLError as error_encountered:
                print(datetime.datetime.now().strftime(time_format),
                      ':', filename, '- The file could not be downloaded.')
                if hasattr(error_encountered, 'code'):
                    print(' ' * 22, 'Error Code -', error_encountered.code)
                if hasattr(error_encountered, 'reason'):
                    print(' ' * 22, 'Reason -', error_encountered.reason)
            else:
                read_response = response_received.read()
                output_file = download_location + filename
                with open(output_file, 'wb') as downloaded_file:
                    downloaded_file.write(read_response)
                print(datetime.datetime.now().strftime(time_format),
                      ':', filename, '- Downloaded successfully.')
        else:
            print(datetime.datetime.now().strftime(time_format),
                  ': Debugging :', filename, 'would be downloaded from :\n',
                  ' ' * 21, url)

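This function works fine for downloading PDFs, images, and other binary formats, but it has problems with text documents such as html files. I suspect the problem has something to do with this line near the end: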

with open(output_file, 'wb') as downloaded_file:

So I also tried opening the file in wt mode, and in plain w mode as well. Neither solves the problem.
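For reference, the file mode is unlikely to be the culprit: read() on the response returns bytes, 'wb' writes them verbatim, and text mode refuses bytes outright. A small sketch of that behaviour using a temporary file (the payload and file name are made up for illustration):

```python
import os
import tempfile

# Stand-in for what response_received.read() returns: a bytes object.
payload = b'\x1f\x8b\x08 raw bytes as read() would return them'

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'sample.bin')

    # Binary mode stores the bytes exactly as given.
    with open(path, 'wb') as handle:
        handle.write(payload)
    with open(path, 'rb') as handle:
        round_trip = handle.read()

    # Text mode only accepts str, so passing bytes raises TypeError.
    try:
        with open(path, 'w') as handle:
            handle.write(payload)
        text_mode_error = None
    except TypeError as error:
        text_mode_error = error
```

Since 'wb' preserves the data unchanged, whatever is wrong with the saved file is presumably already wrong in the bytes received from the server.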

I thought the problem might also be one of encoding, so I added this as the second line of the script:

# -*- coding: utf8 -*-

But it still does not work. What is the problem, and how can I make the function work for both text and binary files?

An example that does not work:

>>> download_file("http://docs.python.org/3/tutorial/index.html")

When I open the saved file in Gedit, it looks like this:

[Image: the file as displayed in gedit]

Similarly, when opened in Firefox:

[Image: the file as displayed in Firefox]
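One plausible explanation for the unreadable output, offered as an assumption rather than a confirmed diagnosis: the request advertises 'Accept-Encoding: gzip, deflate', and urllib.request does not decompress response bodies on its own, so a server that honours the header would hand back compressed bytes that are then saved as-is. A minimal sketch of undoing that, where decode_body is a hypothetical helper (not part of the original function) and the sample HTML is invented:

```python
import gzip
import zlib


def decode_body(body, content_encoding):
    """Return the payload decompressed according to Content-Encoding.

    Hypothetical helper: 'body' is the bytes from read(), and
    'content_encoding' is the value of the Content-Encoding response
    header (or None when the header is absent).
    """
    encoding = (content_encoding or '').strip().lower()
    if encoding == 'gzip':
        return gzip.decompress(body)
    if encoding == 'deflate':
        # Servers send either a zlib-wrapped or a raw deflate stream.
        try:
            return zlib.decompress(body)
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)
    # No (or unknown) encoding: leave the bytes untouched.
    return body


# Simulate what a server may do after seeing 'Accept-Encoding: gzip'.
html = b'<html><body>Hello</body></html>'
compressed = gzip.compress(html)
restored = decode_body(compressed, 'gzip')
```

Inside download_file() this could be applied as decode_body(response_received.read(), response_received.getheader('Content-Encoding')) before writing the file. A simpler alternative under the same assumption is to drop the 'Accept-Encoding' entry from the request headers, so the server has no reason to compress the body.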
