4

I'm working on a video server, and I want to use a database to keep video files. Since I only need to store simple video files with metadata I tried to use MongoDB in Java, via its GridFS mechanism to store the video files and their metadata.

However, there are two major features I need, and that I couldn't manage using MongoDB:

  1. I want to be able to add to a previously saved video, since saving a video might be performed in chunks. I don't want to delete the binary I have so far, just append bytes at the end of an item.
  2. I want to be able to read from a video item while it is being written. "Thread A" will update the video item, adding more and more bytes, while "Thread B" will read from the item, receiving all the bytes written by "Thread A" as soon as they are written/flushed.

I tried writing the straightforward code to do that, but it failed. It seems MongoDB doesn't allow multi-threaded access to the binary (even if one thread is doing all the writing), nor could I find a way to add to a binary file - the Java GridFS API only gives an InputStream from an already existing GridFSDBFile, I cannot get an OutputStream to write to it.

  • Is this possible via MongoDB, and if so how?
  • If not, do you know of any other DB that might allow this (preferably nothing too complex such as a full relational DB)?
  • Would I be better off using MongoDB to keep only the metadata of the video files, and manually handle reading and writing the binary data from the filesystem, so I can implement the above requirements on my own?

Thanks,

Al

4

1 に答える 1

7

I've used mongo gridfs for storing media files for a messaging system we built using Mongo so I can share what we ran into.

So before I get into this for your use case scenario I would recommend not using GridFS and actually using something like Amazon S3 (with excellent rest apis for multipart uploads) and store the metadata in Mongo. This is the approach we settled on in our project after first implementing with GridFS. It's not that GridFS isn't great it's just not that well suited for chunking/appending and rewriting small portions of files. For more info here's a quick rundown on what GridFS is good for and not good for:

http://www.mongodb.org/display/DOCS/When+to+use+GridFS

Now if you are bent on using GridFS you need to understand how the driver and read/write concurrency works.

In mongo (2.2) you have one writer thread per schema/db. So this means when you are writing you are essentially locked from having another thread perform an operation. In real life usage this is super fast because the lock yields when a chunk is written (256k) so your reader thread can get some info back. Please look at this concurrency video/presentation for more details:

http://www.10gen.com/presentations/concurrency-internals-mongodb-2-2

So if you look at my two links essentially we can say quetion 2 is answered. You should also understand a little bit about how Mongo writes large data sets and how page faults provide a way for reader threads to get information.

Now let's tackle your first question. The Mongo driver does not provide a way to append data to GridFS. It is meant to be a fire/forget atomic type operation. However if you understand how the data is stored in chunks and how the checksum is calculated then you can do it manually by using the fs.files and fs.chunks methods as this poster talks about here:

Append data to existing gridfs file

So going through those you can see that it is possible to do what you want but my general recommendation is to use a service (such as Amazon S3) that is designed for this type of interaction instead of trying to do extra work to make Mongo fit your needs. Of course you can go to the filesystem directly as well which would be the poor man's choice but you lose redundancy, sharding, replication etc etc that you get with GridFS or S3.

Hope that helps.

-Prasith

于 2012-10-22T16:00:30.640 に答える