私はPythonにかなり慣れているので、この質問を正しく表現できることを願っています.
全体的な問題には、Python からの C ルーチンの呼び出しが含まれます。関連するSOの質問/回答をいくつかまとめると、かなり近づくことができますが、物事を完全にうまくまとめることができないようです。2 つの側面があります。1 つ目はポインターを使用して C ルーチンを呼び出すこと、2 つ目はコールバック関数の使用です。
背景 Rubner は、C で記述された Earth Movers Distance (EMD) ルーチンを提供しています [ EMD C コードの場所] 彼は、EMD ルーチンを呼び出す 2 つのサンプル C プログラムも提供しています。たとえば、EMD ルーチンを呼び出す example2.c の代替として、Python ルーチンを開発しようとしています。(はい、私は EMD の OpenCV 実装に精通しています。)
便宜上、Python から呼び出したい emd.c コードのヘッダー ファイルを次に示します。
/* DEFINITIONS */
#define MAX_SIG_SIZE 100
#define MAX_ITERATIONS 500
#define INFINITY 1e20
#define EPSILON 1e-6
/*****************************************************************************/
/* feature_t SHOULD BE MODIFIED BY THE USER TO REFLECT THE FEATURE TYPE */
typedef int feature_t;
/* typedef struct { int X,Y,Z; } feature_t;*/
/*typedef struct { int X; } feature_t; */
/*****************************************************************************/
typedef struct
{
int n; /* Number of features in the signature */
feature_t *Features; /* Pointer to the features vector */
float *Weights; /* Pointer to the weights of the features */
} signature_t;
typedef struct
{
int from; /* Feature number in signature 1 */
int to; /* Feature number in signature 2 */
float amount; /* Amount of flow from "from" to "to" */
} flow_t;
float emd(signature_t *Signature1, signature_t *Signature2,
float (*func)(feature_t *, feature_t *),
flow_t *Flow, int *FlowSize);
#endif
最後に、これまでにまとめた python コードを示します。私は構造を正しくセットアップしたと思います (確かではありません)。(これは、Rubner emd.c コードで可能な機能構造の単純化されたバージョンであることに注意してください。最終的にはすべてを機能させたいと考えていますが、今のところ単純なものから始めています。)私が抱えている最初の問題呼び出し元の関数の argtypes のどこかにあります。いくつかのバリエーションを試してみましたが、Web で入手できる例はかなり少なく、壁にぶつかりました。
import ctypes
MAX_FEATURE_SIZE = 30
ARRAYFE = ctypes.c_int*MAX_FEATURE_SIZE
ARRAYWE= ctypes.c_float*MAX_FEATURE_SIZE
ARRAYFL = ctypes.c_float*(2*MAX_FEATURE_SIZE-1)
flowSize = ctypes.c_int
emdlib = ctypes.CDLL('emdlib.dylib')
ctypes.CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_float, ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int))
def py_dist_func(f1,f2):
print "dist: ", f1, f2
return(abs(f1-f2))
dist = ctypes.CMPFUNC(py_dist_func)
n = ctypes.c_int
flowSize = ctypes.c_int
class flow_t(ctypes.Structure):
_fields_ = [("from", ctypes.c_int),
("to", ctypes.c_int),
("amount", ctypes.c_float)]
class signature_t(ctypes.Structure):
_fields_ = [("N", n),("feature", ARRAYFE),
("weight", ARRAYWE)]
# emdlib.emd.argtypes = [ctypes.POINTER(signature_t), ctypes.POINTER(signature_t), ctypes.POINTER(ctypes.c_float), ctypes.POINTER(flow_t), ctypes.POINTER(ctypes.c_int)]
# emdlib.emd.argtypes = [ctypes.POINTER(signature_t), ctypes.POINTER(signature_t), ctypes.CMPFUNC(py_dist_func), ctypes.POINTER(flow_t), ctypes.POINTER(ctypes.c_int)]
emdlib.emd.argtypes = [ctypes.POINTER(signature_t), ctypes.POINTER(signature_t), ctypes.c_float, ctypes.POINTER(flow_t), ctypes.POINTER(ctypes.c_int)]
# emd.restype = ctypes.c_float
emdlib.emd.restype = flow_t
signature1=signature_t()
signature2=signature_t()
feature1 = ARRAYFE
feature2 = ARRAYFE
weight1 =ARRAYWE
weight2 = ARRAYWE
feature1 = [0,1,2,3]
feature2 = [0,3]
weight1 = [1,1,1,1]
weight2 = [1,1]
#signature1= [4,feature1, weight1]
#signature2 = [2, feature2, weight2]
# sample: arr = (ctypes.c_int * len(pyarr))(*pyarr)
signature1.N = len(feature1)
signature1.feature = (ctypes.c_int * MAX_FEATURE_SIZE)(*feature1)
signature2.feature = (ctypes.c_int * MAX_FEATURE_SIZE)(*feature2)
signature1.weight = (ctypes.c_float * MAX_FEATURE_SIZE)(*weight1)
signature2.weight = (ctypes.c_float * MAX_FEATURE_SIZE)(*weight2)
e = emdlib.emd(ctypes.byref(signature1), ctypes.byref(signature2), dist, ctypes.POINTER(flow_t), flowSize)
print "EMD= ", e
print "flowSize", flowSize
私が間違っていた場所についての提案は大歓迎です。
私が遭遇すると確信している 2 番目の問題は、返されるポインターの argtypes に関するものです。ここでの提案もいただければ幸いです。
前もって感謝します。
-------------- 更新された (動作中の) コード
import ctypes
import math
import itertools
MAX_FEATURE_SIZE = 25
FEATURE_t = ctypes.c_int
FEATURE_ptr = ctypes.POINTER(FEATURE_t)
WEIGHT_t = ctypes.c_float
WEIGHT_ptr = ctypes.POINTER(WEIGHT_t)
COUNT_t = ctypes.c_int
COUNT_ptr = ctypes.POINTER(COUNT_t)
class FLOW_t(ctypes.Structure):
_fields_ = [("frm", ctypes.c_int),
("to", ctypes.c_int),
("amount", ctypes.c_float)]
# Note that ctypes.POINTER is compatible with a ctypes array declared
# as TYPE * array_len. This is equivalent to the way we can say 'char
# *foo = "ABCDEF"' in C.
class SIGNATURE_t(ctypes.Structure):
_fields_ = [("N", COUNT_t ),
("feature", FEATURE_ptr),
("weight", WEIGHT_ptr)]
FLOW_ARRAY_t = FLOW_t * (2*MAX_FEATURE_SIZE - 1)
CMPFUNC_t = ctypes.CFUNCTYPE(ctypes.c_float, FEATURE_ptr, FEATURE_ptr)
SIGNATURE_ptr = ctypes.POINTER(SIGNATURE_t)
FLOW_ptr = ctypes.POINTER(FLOW_t)
# Convenience function - keeps us from having to remember all the types and parameters later on
def make_signature(features, weights):
sig = SIGNATURE_t()
sig.N = len(features)
sig.feature = (len(features) * FEATURE_t)(*features)
sig.weight = (len(weights) * WEIGHT_t)(*weights)
return sig
# We want to pass into C a custom distance function from Python
def py_dist_func(f1,f2):
# print "f1, f2: %d, %d" % ( f1[0], f2[0] )
d= distance(f1[0],f2[0])
return d
# set this up as a holder for distance function between any two n-D points
def distance(p0,p1):
return(math.fabs(p0-p1))
dist_callback = CMPFUNC_t(py_dist_func)
#print "Importing emdlib"
emdlib = ctypes.CDLL('emdlib.dylib')
#print "Setting argtypes"
emdlib.emd.argtypes = [ SIGNATURE_ptr,
SIGNATURE_ptr,
CMPFUNC_t,
FLOW_ptr,
COUNT_ptr ]
#print "Setting restype"
emdlib.emd.restype = ctypes.c_float
feature1 = [0, 1,2,3,4,5,6,7,8]
feature2 = [0, 1,2,3,4,5,6,7,8]
weight1 = [0.275,0.296,0.002,0.131,0.208,0.048,0.058,0.098,0.455]
weight2 = [0.285,0.421,0.028,0.021,0.240,0.166,0.023,0.054,0.469]
#print "Creating signatures"
signature1 = make_signature(feature1, weight1)
signature2 = make_signature(feature2, weight2)
flow_array = FLOW_ARRAY_t()
flow_size = COUNT_t()
#print "Calling EMD"
e = emdlib.emd(ctypes.byref(signature1),
ctypes.byref(signature2),
dist_callback,
flow_array,
ctypes.byref(flow_size))
print "EMD= ", e
print "Number of FlowS", flow_size.value
print "Flow"
print "from to amount"
totalFlow=0.0
for i in range(0,flow_size.value):
# print "Flow from %d to %d amount :%f" %(flow_array[i].frm, flow_array[i].to, flow_array[i].amount)
print " %d %d %f" %(flow_array[i].frm, flow_array[i].to, flow_array[i].amount)
totalFlow=totalFlow+flow_array[i].amount
#
# now adjust EMD to account for different signature masses and make it a metric
alpha=1.0
mass1=sum(weight1)
mass2=sum(weight2)
fList=[feature1,feature2]
max_distance= 0.0
for p0, p1 in list(itertools.product(*fList)):
# print p0,p1, distance(p0,p1), max_distance
max_distance = max(max_distance, distance(p0, p1))
print "\nMax distance= %f" % max_distance
print "Total Source = %f" % mass1
print "Total Demand = %f" % mass2
print "Total Flow= %f\n " % totalFlow
print "Alpha = %f\n" %alpha
# emdHat = e*totalFlow+math.sqrt((mass1-mass2)*(mass1-mass2))*alpha*max_distance
emdHat = e*totalFlow+math.fabs((mass1-mass2))*alpha*max_distance
print "Corrected Earth Movers Distance \n"
print "emdHat = %f\n" % emdHat;