9

I want to create a 2D Binary (Bit) Array in Python in a space and time efficient way as my 2D bitarray would be around 1 Million(rows) * 50000(columns of 0's or 1's) and also I would be performing bitwise operations over these huge elements. My array would look something like:

0 1 0 1
1 1 1 0
1 0 0 0 
...

In C++ most efficient way (space) for me would be to create a kind of array of integers where each element represents 32 bits and then I can use the shift operators coupled with bit-wise operators to carry operations.

Now I know that there is a bitarray module in python. But I am unable to create a 2D structure using list of bit-arrays. How can I do this?

Another way I know in C++ would be to create a map something like map<id, vector<int> > where then I can manipulate the vector as I have mentioned above. Should I use the dictionary equivalent in python?

Even if you suggest me some way out to use bit array for this task it will be great If I can get to know whether I can have multiple threads operate on a splice of bitarray so that I can make it multithreaded. Thanks for the help!!

EDIT:

I can even go on creating my own data structure for this if the need be. However just wanted to check before reinventing the wheel.

4

3 に答える 3

5

私のコメントによると、あなたはセットを使うことができるかもしれません

0 1 0 1
1 1 1 0
1 0 0 0 

として表すことができます

set([(1,0), (3,0), (0,1), (1,1), (2, 1), (0,2)])

また

{(1,0), (3,0), (0,1), (1,1), (2, 1), (0,2)}

ANDは、2セットの共通部分に相当します
。ORは、2セットの和集合です。

于 2012-05-01T10:48:09.773 に答える
4

How about the following:

In [11]: from bitarray import bitarray

In [12]: arr = [bitarray(50) for i in xrange(10)]

This creates a 10x50 bit array, which you can access as follows:

In [15]: arr[0][1] = True

In [16]: arr[0][1]
Out[16]: True

Bear in mind that a 1Mx50K array would require about 6GB of memory (and a 64-bit build of Python on a 64-bit OS).

whether I can have multiple threads operate on a splice of bitarray so that I can make it multithreaded

That shouldn't be a problem, with the usual caveats. Bear in mind that due to the GIL you are unlikely to achieve performance improvements through multithreading.

于 2012-05-01T10:36:40.233 に答える
3

Can you use numpy?

>>> import numpy
>>> A = numpy.zeros((50000, 1000000), dtype=bool)

EDIT: Doesn't seem to be the most space efficient. Uses 50GB (1-byte per bool). Does anyone know if numpy has a way to use packed bools?

于 2012-05-01T10:38:03.507 に答える