python - Naming multiple files with Python and Scrapy

Question

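After scraping files from the web with Scrapy, I'm trying to save them to a directory. I extract a date from each file and use it as the file name. The problem I'm running into is that some files have the same date, i.e. there are two files named "June 2, 2009". So what I'm looking for is a way to check whether a file with the same name already exists and, if it does, to name the new one something like "June 2, 2009.1".

Here is the code I'm using: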

def parse_item(self, response):
    self.log('Hi, this is an item page! %s' % response.url) 

    response = response.replace(body=response.body.replace('<br />', '\n'))

    hxs = HtmlXPathSelector(response)

    date = hxs.select("//div[@id='content']").extract()[0]
    dateStrip = re.search(r"([A-Z]*|[A-z][a-z]+)\s\d*\d,\s[0-9]+", date) 
    newDate = dateStrip.group()


    content = hxs.select("//div[@id='content']") 
    content = content.select('string()').extract()[0]

    filename = ("/path/to/a/folder/ %s.txt") % (newDate) 


    with codecs.open(filename, 'w', encoding='utf-8') as output:
        output.write(content)

score 1 · Accepted Answer

You can use os.listdir to get the list of existing files and then assign a file name that won't cause a conflict.

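import os

def get_file_store_name(path, fname):
    # Count the existing entries in `path` whose name contains `fname`
    count = 0
    for f in os.listdir(path):
        if fname in f:
            count += 1
    # Append the count so the new name differs from the existing ones
    return os.path.join(path, fname + str(count))

# Example usage
print(get_file_store_name(".", "README") + ".txt")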
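
For the naming scheme the question asks for ("June 2, 2009.1" and so on), a minimal sketch could probe candidate names until one is free. The helper name `unique_path` and the exact suffix format are assumptions made for illustration; they are not part of the answer above.

```python
import os

def unique_path(folder, stem, ext=".txt"):
    """Return a path in `folder` that does not exist yet (hypothetical helper)."""
    candidate = os.path.join(folder, stem + ext)
    counter = 1
    # Try "stem.txt", then "stem.1.txt", "stem.2.txt", ... until a name is free.
    while os.path.exists(candidate):
        candidate = os.path.join(folder, "%s.%d%s" % (stem, counter, ext))
        counter += 1
    return candidate
```

In `parse_item` this would presumably be called as `filename = unique_path("/path/to/a/folder", newDate)` just before the file is written.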

score 0 · Accepted Answer

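The other answer pointed me in the right direction by having me look into Python's os tools, but I think the approach I found is probably simpler. For more, see "How do I check whether a file exists using Python?".

Here is the code I came up with: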

    # Write to the original name unless a file with that name already exists
    if not os.path.isfile(filename):
        with codecs.open(filename, 'w', encoding='utf-8') as output:
            output.write(content)
    else:
        # Name taken: fall back to a ".1" variant of the same date
        newFilename = "/path/.../.../- %s.1.txt" % newDate
        with codecs.open(newFilename, 'w', encoding='utf-8') as output:
            output.write(content)

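Edited to add:

I didn't like this solution much, and I thought the one in the other answer was probably better, but I couldn't get it to work. The main thing I disliked about my own solution was that it only handles two files with the same name. If three or four files share a name, the original problem comes back. Here is what I came up with instead: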

# The counter starts at -1 so the first candidate tried below ends in " 0.txt"
filename = "/Users/path/ title %s -1.txt" % date

while True:
    # Split off the trailing counter, increment it, and rebuild the name
    newName = filename.replace(".txt", "").split()
    newName[-1] = str(int(newName[-1]) + 1)
    filename = " ".join(newName) + ".txt"
    # Stop at the first name that is not taken yet
    if not os.path.isfile(filename):
        with codecs.open(filename, 'w', encoding='utf-8') as output:
            output.write(texts)
        break

It's probably not the most elegant solution, and it may be a bit of a hack, but it has worked so far and seems to address my problem.
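
The loop above checks whether the name is free and then opens the file as a separate step. A possible variation, sketched here under the assumption that several writers might target the same folder, lets the operating system refuse to create an existing file instead. The helper name `write_unique` and the ".N" suffix format are hypothetical, and `text` is assumed to be a unicode string.

```python
import errno
import os

def write_unique(folder, stem, text, ext=".txt"):
    """Write `text` to the first free name and return that path (hypothetical helper)."""
    counter = 0
    while True:
        suffix = "" if counter == 0 else ".%d" % counter
        path = os.path.join(folder, stem + suffix + ext)
        try:
            # O_EXCL makes creation fail if the file already exists,
            # so the existence check and the creation happen in one step.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
        except OSError as exc:
            if exc.errno != errno.EEXIST:
                raise
            counter += 1  # name taken, try the next suffix
            continue
        with os.fdopen(fd, "wb") as output:
            output.write(text.encode("utf-8"))
        return path
```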

score 0 · Accepted Answer

The usual way to check whether a file exists in the C library is to use a function called stat(). Python provides a thin wrapper around this function in the form of os.stat(). I recommend using it.

http://docs.python.org/library/stat.html

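import os
import stat

def file_exists(fname):
    try:
        stat_info = os.stat(fname)
    except OSError:
        # os.stat raises OSError when the path is missing or inaccessible
        return False
    # S_ISREG lives in the stat module and takes the st_mode field;
    # it is true only for regular files
    return stat.S_ISREG(stat_info.st_mode)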

score 0 · Accepted Answer

Another solution is to append the time to the date when naming the file, like this:

from datetime import datetime

filename = ("/path/to/a/folder/ %s_%s.txt") % (newDate,datetime.now().strftime("%H%M%S"))
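
Two items saved within the same second would still get the same name with `%H%M%S`. One way to make that less likely, sketched here with a hypothetical helper `timestamped_name`, is to include microseconds via `%f`. This reduces collisions but does not rule them out.

```python
from datetime import datetime

def timestamped_name(folder, date_text, now=None):
    """Build '<folder>/<date_text>_<HHMMSSffffff>.txt' (hypothetical helper)."""
    # `now` can be passed in explicitly, which keeps the function testable.
    now = now or datetime.now()
    # %f appends microseconds, so two saves in the same second
    # are much less likely to share a name.
    return "%s/%s_%s.txt" % (folder, date_text, now.strftime("%H%M%S%f"))
```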
