I'm using the following script to grab all the files in a directory and then filter them by their modification date.

dir = '/tmp/whatever'
dir_files = os.listdir(dir)
dir_files.sort(key=lambda x: os.stat(os.path.join(dir, x)).st_mtime)
files = []
for f in dir_files:
    t = os.path.getmtime(dir + '/' + f)
    c = os.path.getctime(dir + '/' + f)
    mod_time = datetime.datetime.fromtimestamp(t)
    created_time = datetime.datetime.fromtimestamp(c)
    if mod_time >= form.cleaned_data['start'].replace(tzinfo=None) and mod_time <= form.cleaned_data['end'].replace(tzinfo=None):
        files.append(f)
return by_hour

I need to go one step further and group the files by the hour in which they were modified. Does anyone know how to do this off the top of their head?

UPDATE: I'd like to have them in a dictionary ({date,hour,files})

UPDATED: Thanks for all your replies! I tried using the response from David, but when I output the result it looks like the following (i.e. it's breaking up the filenames):

defaultdict(<type 'list'>, {datetime.datetime(2013, 1, 9, 15, 0): ['2', '8', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '5', '1', '8', '4', '3', '.', 'a', 'v', 'i', '2', '9', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '5', '2', '0', '2', '4', '.', 'a', 'v', 'i', '3', '0', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '5', '3', '8', '5', '9', '.', 'a', 'v', 'i', '3', '1', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '5', '4', '1', '2', '4', '.', 'a', 'v', 'i', '3', '2', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '5', '5', '3', '1', '0', '.', 'a', 'v', 'i', '3', '3', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '5', '5', '5', '5', '8', '.', 'a', 'v', 'i'], datetime.datetime(2013, 1, 9, 19, 0): ['6', '1', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '9', '0', '1', '1', '8', '.', 'a', 'v', 'i', '6', '2', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '9', '0', '6', '3', '1', '.', 'a', 'v', 'i', '6', '3', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '9', '1', '4', '1', '5', '.', 'a', 'v', 'i', '6', '4', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '9', '2', '2', '3', '3', '.', 'a', 'v', 'i']})

I was hoping to get it to store the complete file names. Also, how would I loop over it and grab the files in each hour along with the hour they belong to?

I managed to sort the above out by just changing it to append. However, the result is not sorted from the oldest hour to the most recent.

Many thanks, Ben

4 Answers

You can truncate a datetime object to the hour with this line:

mod_hour = datetime.datetime(*mod_time.timetuple()[:4])

(This works because mod_time.timetuple()[:4] returns a tuple like (2013, 1, 8, 21).) You can then use a collections.defaultdict to keep a dictionary of lists:

import collections
import datetime
import os

# dir and dir_files are defined as in the question
by_hour = collections.defaultdict(list)
for f in dir_files:
    t = os.path.getmtime(os.path.join(dir, f))
    mod_time = datetime.datetime.fromtimestamp(t)
    # e.g. datetime.datetime(2013, 1, 8, 21, 0)
    mod_hour = datetime.datetime(*mod_time.timetuple()[:4])
    by_hour[mod_hour].append(f)
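
Regarding the follow-up in the question: the split-up filenames are what you would get from `+=` with a string, because that extends the list with each character, whereas `append` adds the whole name. The dictionary itself is not ordered by hour, so sort the keys when you loop over it. Below is a minimal sketch; the `samples` list is made up for illustration (filenames taken from the output in the question, modification times assumed to match the names) and stands in for the real directory scan.

```python
import collections
import datetime

# Hypothetical (mtime, filename) pairs standing in for the directory scan.
samples = [
    (datetime.datetime(2013, 1, 9, 19, 1, 18), "61-20130109190118.avi"),
    (datetime.datetime(2013, 1, 9, 15, 18, 43), "28-20130109151843.avi"),
    (datetime.datetime(2013, 1, 9, 15, 20, 24), "29-20130109152024.avi"),
]

by_hour = collections.defaultdict(list)
for mod_time, name in samples:
    # Truncate the timestamp to the start of its hour.
    mod_hour = mod_time.replace(minute=0, second=0, microsecond=0)
    # append() stores the whole name; `by_hour[mod_hour] += name` would
    # extend the list with the individual characters of the string.
    by_hour[mod_hour].append(name)

# Iterate from the oldest hour to the most recent by sorting the keys.
for hour in sorted(by_hour):
    print(hour.strftime("%Y-%m-%d %H:00"), by_hour[hour])
```

Sorting the keys at iteration time keeps the grouping step simple; if you need the sorted order more than once, `sorted(by_hour.items())` gives you a list of (hour, files) pairs.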
answered 2013-01-09T02:40:22.347

import os, datetime, operator
dir = "Your_dir_path"
# (filename, modification time) pairs, newest first
by_hour = sorted(
    [(f, datetime.datetime.fromtimestamp(os.path.getmtime(os.path.join(dir, f)))) for f in os.listdir(dir)],
    key=operator.itemgetter(1), reverse=True)

The code above sorts the files by their full modification time (year -> month -> day -> hour -> minute -> second), newest first.

answered 2013-01-09T04:24:15.117

Building on David's excellent answer, you can use itertools.groupby to simplify the work a little:

import os, itertools, datetime

dir = '/tmp/whatever'
mtime = lambda f : datetime.datetime.fromtimestamp(os.path.getmtime(dir + '/' + f))
mtime_hour = lambda f: datetime.datetime(*mtime(f).timetuple()[:4])
dir_files = sorted(os.listdir(dir), key=mtime)
dir_files = filter(lambda f: datetime.datetime(2012, 1, 2, 4) < mtime(f) < datetime.datetime(2012, 12, 1, 4), dir_files)
by_hour = dict((k, list(v)) for k, v in itertools.groupby(dir_files, key=mtime_hour))  # Python 2.6
#by_hour = {k: list(v) for k, v in itertools.groupby(dir_files, key=mtime_hour)}  # Python 2.7
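
Note that itertools.groupby only merges consecutive items with the same key, which is why the listing is sorted by modification time before grouping. A small sketch of the difference, using made-up hour labels instead of real files:

```python
import itertools

# Hypothetical hour-of-day labels for five files, in unsorted order.
hours = [15, 19, 15, 15, 19]

# Without sorting, the same hour shows up as several separate groups.
unsorted_groups = [(k, len(list(v))) for k, v in itertools.groupby(hours)]
# After sorting, each hour forms exactly one group.
sorted_groups = [(k, len(list(v))) for k, v in itertools.groupby(sorted(hours))]

print(unsorted_groups)  # [(15, 1), (19, 1), (15, 2), (19, 1)]
print(sorted_groups)    # [(15, 3), (19, 2)]
```

If the unsorted result were fed into dict(), later groups would silently overwrite earlier ones with the same key, so the sort is not optional.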
answered 2013-01-09T04:30:01.703

This creates the entries lazily, uses the UTC time zone, and reads each modification time only once:

#!/usr/bin/env python
import os
from collections import defaultdict
from datetime import datetime

HOUR = 3600 # seconds in an hour
dirpath = "/path/to/dir"
start, end = datetime(...), datetime(...)

# get full paths for all entries in dirpath
entries = (os.path.join(dirpath, name) for name in os.listdir(dirpath))
# add modification time truncated to hour
def date_and_hour(path):
    return datetime.utcfromtimestamp(os.path.getmtime(path) // HOUR * HOUR)
entries = ((date_and_hour(path), path) for path in entries)
# filter by date range: [start, end)
entries = ((mtime, path) for mtime, path in entries if start <= mtime < end)
# group by hour
result = defaultdict(list)
for dt, path in entries:
    result[dt].append(path)

from pprint import pprint
pprint(dict(result))
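
The truncation above relies on integer-dividing the timestamp by 3600 and multiplying back. As a side note, datetime.utcfromtimestamp is deprecated as of Python 3.12; a possible variant, sketched here under the assumption that you are happy to work with timezone-aware datetimes, is shown below. The helper name floor_to_hour is made up for this example.

```python
from datetime import datetime, timezone

HOUR = 3600  # seconds in an hour

def floor_to_hour(timestamp):
    """Return an aware UTC datetime truncated to the start of its hour."""
    # Dropping the remainder of the division removes minutes and seconds.
    return datetime.fromtimestamp(timestamp // HOUR * HOUR, tz=timezone.utc)

# 2013-01-09 15:18:43 UTC expressed as a POSIX timestamp
ts = datetime(2013, 1, 9, 15, 18, 43, tzinfo=timezone.utc).timestamp()
print(floor_to_hour(ts))  # 2013-01-09 15:00:00+00:00
```

With this variant, start and end would also have to be timezone-aware, since Python refuses to compare naive and aware datetimes.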
answered 2013-01-10T07:34:32.660