I have a table of about 30k records that I'm attempting to iterate over and process with Django's ORM. Each record stores several binary blobs, each of which can be several MB in size, and which I need to process and write out to a file.
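Simplified, my processing loop looks roughly like this (the app, model, field names, and output path are placeholders for the real ones):

```python
import os

from myapp.models import MyModel  # placeholder app/model names

OUTPUT_DIR = "/tmp/blob_export"  # illustrative output location
BLOB_FIELDS = ("blob_a", "blob_b", "blob_c")  # placeholder binary field names


def process_record(record):
    """Write each binary field of a single record out to its own file."""
    for field_name in BLOB_FIELDS:
        data = getattr(record, field_name)
        if data is None:
            continue
        path = os.path.join(OUTPUT_DIR, f"{record.pk}_{field_name}.bin")
        with open(path, "wb") as fh:
            fh.write(bytes(data))


def export_all():
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    # Iterate over the whole table; this is where memory usage keeps growing.
    for record in MyModel.objects.all():
        process_record(record)
```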
However, I'm having trouble doing this with Django because of memory constraints. I have 8 GB of memory on my system, but after processing about 5k records the Python process is consuming all 8 GB and gets killed by the Linux kernel. I've tried various tricks for clearing Django's query cache, such as:
- periodically calling `MyModel.objects.update()`
- setting `settings.DEBUG = False`
- periodically invoking Python's garbage collector via `gc.collect()`
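Applied to the loop above, those attempts look roughly like this (the interval of 500 is arbitrary, and `DEBUG = False` is set in `settings.py` rather than in the loop):

```python
import gc

from myapp.models import MyModel  # placeholder names as above

# settings.DEBUG is already False, so django.db.connection.queries
# should not be accumulating logged SQL.


def export_all_with_cache_clearing():
    for i, record in enumerate(MyModel.objects.all()):
        process_record(record)  # blob-writing helper from the sketch above
        if i % 500 == 0:  # the interval of 500 is arbitrary
            MyModel.objects.update()  # the oft-suggested "clear the query cache" trick
            gc.collect()              # force a full garbage-collection pass
```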
However, none of these seem to have any noticeable effect, and my process continues to experience some sort of memory leak until it crashes.
Is there anything else I can do?
Since I only need to process each record once, and never need to access the same record again later in the run, I have no need to keep any model instances around or to load more than one instance at a time. How can I ensure that only one record is loaded at a time, that Django caches nothing, and that all memory is freed immediately after each record is processed?
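To make the access pattern concrete, something along these lines is what I'm aiming for, where each instance is fetched, processed, and then immediately discarded (reusing the hypothetical `process_record` helper from above), if there's a way to guarantee the memory actually gets released:

```python
from myapp.models import MyModel  # placeholder names as above


def export_one_at_a_time():
    # The primary keys alone are tiny compared to the blobs, so fetching all
    # ~30k of them up front should not be a memory problem.
    for pk in MyModel.objects.values_list("pk", flat=True):
        record = MyModel.objects.get(pk=pk)  # exactly one instance in memory
        process_record(record)               # blob-writing helper from above
        del record                           # nothing else should keep it alive
```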