I'm implementing a set of common yet not so trivial (or error-prone) data structures for C and just came up with an idea that got me thinking.

The question in short is, what is the best way to implement two structures that use similar algorithms but have different interfaces, without having to copy-paste/rewrite the algorithm? By best, I mean most maintainable and debug-able.

I think it is obvious why you wouldn't want to have two copies of the same algorithm.

Motivation

Say you have a structure (call it map) with a set of associated functions (map_*()). Since the map needs to map anything to anything, we would normally implement it taking a void *key and void *data. However, think of a map of int to int. In this case, you would need to store all the keys and data in another array and give their addresses to the map, which is not so convenient.

Now imagine if there was a similar structure (call it mapc, c for "copies") that during initialization takes sizeof(your_key_type) and sizeof(your_data_type) and given void *key and void *data on insert, it would use memcpy to copy the keys and data in the map instead of just keeping the pointers. An example of usage:

int i;
mapc m;
mapc_init(&m, sizeof(int), sizeof(int));
for (i = 0; i < n; ++i)
{
    int j = rand();  /* whatever */
    mapc_insert(&m, &i, &j);
}

which is quite nice, because I don't need to keep another array of is and js.
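
To make the copying behaviour concrete, here is a minimal sketch of what a copying insert could do with the two sizes given to mapc_init. The names entry, entry_create, entry_copy_key and entry_copy_data are hypothetical and not part of any existing library; a real mapc would presumably attach this storage to its tree nodes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry that owns copies of its key and data.
 * The key bytes come first, immediately followed by the data bytes. */
typedef struct entry {
    size_t key_size;
    size_t data_size;
    unsigned char bytes[];   /* key_size + data_size bytes (C99 flexible array) */
} entry;

/* Allocate an entry and copy the caller's key and data into it.
 * Returns NULL if the allocation fails. */
static entry *entry_create(const void *key, size_t key_size,
                           const void *data, size_t data_size)
{
    assert(key != NULL && data != NULL);

    entry *e = malloc(sizeof *e + key_size + data_size);
    if (e == NULL)
        return NULL;

    e->key_size = key_size;
    e->data_size = data_size;
    memcpy(e->bytes, key, key_size);
    memcpy(e->bytes + key_size, data, data_size);
    return e;
}

/* Copy the stored values back out. Going through memcpy avoids any
 * alignment assumptions about where the data bytes start. */
static void entry_copy_key(const entry *e, void *out)
{
    memcpy(out, e->bytes, e->key_size);
}

static void entry_copy_data(const entry *e, void *out)
{
    memcpy(out, e->bytes + e->key_size, e->data_size);
}
```

Because the entry holds its own bytes, the caller's i and j can go out of scope or be overwritten right after the insert.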

My ideas

In the example above, map and mapc are very closely related. If you think about it, map and set structures and functions are also very similar. I have thought of the following ways to implement their algorithm only once and use it for all of them. Neither of them, however, is quite satisfying to me.

  1. Use macros. Write the function code in a header file, leaving the structure dependent stuff as macros. For each structure, define the proper macros and include the file:

    map_generic.h
    
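    /* Paste in two steps so that NAME is expanded before ## is applied */
    #define PASTE_(a, b) a##b
    #define PASTE(a, b) PASTE_(a, b)
    #define INSERT(x) PASTE(x, _insert)
    
    int INSERT(NAME)(NAME *m, PARAMS)
    {
        // create node
        ASSIGN_KEY_AND_DATA(node);
        // get m->root
        // add to tree starting from root
        // rebalance from node to root
        // etc
    }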
    
    map.c
    
    #define NAME map
    #define PARAMS void *key, void *data
    #define ASSIGN_KEY_AND_DATA(node) \
    do {\
        node->key = key;\
        node->data = data;\
    } while (0)
    #include "map_generic.h"
    
    mapc.c
    
    #define NAME mapc
    #define PARAMS void *key, void *data
    #define ASSIGN_KEY_AND_DATA(node) \
    do {\
        memcpy(node->key, key, m->key_size);\
        memcpy(node->data, data, m->data_size);\
    } while (0)
    
    #include "map_generic.h"
    

    This method is not half bad, but it's not so elegant.

  2. Use function pointers. For each part that is dependent on the structure, pass a function pointer.

    map_generic.c
    
    int map_generic_insert(void *m, void *key, void *data,
        void (*assign_key_and_data)(void *, void *, void *, void *),
        void *(*get_root)(void *))  /* returns the root node */
    {
        // create node
        assign_key_and_data(m, node, key, data);
        root = get_root(m);
        // add to tree starting from root
        // rebalance from node to root
        // etc
    }
    
    map.c
    
    static void assign_key_and_data(void *m, void *node, void *key, void *data)
    {
        map_node *n = node;
        n->key = key;
        n->data = data;
    }
    
    /* returns void * so that it matches the get_root parameter type */
    static void *get_root(void *m)
    {
        return ((map *)m)->root;
    }
    
    int map_insert(map *m, void *key, void *data)
    {
        return map_generic_insert(m, key, data, assign_key_and_data, get_root);
    }
    
    mapc.c
    
    static void assign_key_and_data(void *m, void *node, void *key, void *data)
    {
        map_node *n = node;
        mapc *mc = m;
        memcpy(n->key, key, mc->key_size);
        memcpy(n->data, data, mc->data_size);
    }
    
    /* returns void * so that it matches the get_root parameter type */
    static void *get_root(void *m)
    {
        return ((mapc *)m)->root;
    }
    
    int mapc_insert(mapc *m, void *key, void *data)
    {
        return map_generic_insert(m, key, data, assign_key_and_data, get_root);
    }
    

    This method requires writing extra functions that the macro method avoids (as you can see, the code here is longer) and doesn't allow optimizers to inline the functions (as they are not visible to the map_generic.c file).
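
For comparison, here is a small, compilable sketch of the function-pointer approach. It makes several simplifying assumptions: a sorted singly linked list of int keys stands in for the balanced tree, and every name in it (list, node, store_fn, generic_insert and so on) is hypothetical rather than taken from the code above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the tree: a sorted singly linked list. */
typedef struct node {
    struct node *next;
    void *key;            /* borrowed pointer or owned copy, depending on the hook */
} node;

typedef struct list {
    node *head;
    size_t key_size;      /* only used by the copying variant */
} list;

/* Hook that stores the key in a freshly allocated node; returns 0 on success. */
typedef int (*store_fn)(list *l, node *n, void *key);

/* Pointer-keeping variant: remember the caller's pointer. */
static int store_pointer(list *l, node *n, void *key)
{
    (void)l;
    n->key = key;
    return 0;
}

/* Copying variant: allocate key_size bytes and copy the key into them. */
static int store_copy(list *l, node *n, void *key)
{
    n->key = malloc(l->key_size);
    if (n->key == NULL)
        return -1;
    memcpy(n->key, key, l->key_size);
    return 0;
}

/* The shared algorithm, written once: create a node, let the hook store
 * the key, then link the node in at its sorted position. */
static int generic_insert(list *l, void *key, store_fn store,
                          int (*cmp)(const void *, const void *))
{
    assert(l != NULL && key != NULL);

    node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    if (store(l, n, key) != 0) {
        free(n);
        return -1;
    }

    node **p = &l->head;
    while (*p != NULL && cmp((*p)->key, n->key) < 0)
        p = &(*p)->next;
    n->next = *p;
    *p = n;
    return 0;
}

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* The two public interfaces differ only in the hook they pass. */
static int list_insert(list *l, void *key)
{
    return generic_insert(l, key, store_pointer, cmp_int);
}

static int listc_insert(list *l, void *key)
{
    return generic_insert(l, key, store_copy, cmp_int);
}
```

In this sketch everything lives in one translation unit, so a compiler is free to inline the hooks. Split across files as in the layout above, that generally requires link-time optimization.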

So, how would you go about implementing something like this?

Note: I wrote the code directly in the Stack Overflow question form, so excuse me if there are minor errors.

Side question: Does anyone have a better idea for a suffix that says "this structure copies the data instead of the pointer"? I use c for "copies", but there could be a much better word for it in English that I don't know about.


Update:

I have come up with a third solution. In this solution, only one version of the map is written, the one that keeps a copy of data (mapc). This version would use memcpy to copy data. The other map is an interface to this, taking void *key and void *data pointers and sending &key and &data to mapc so that the address they contain would be copied (using memcpy).

This solution has the downside that a normal pointer assignment is done by memcpy, but it completely solves the issue otherwise and is very clean.
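
A minimal sketch of this wrapper idea follows. To keep it short, a growable array stands in for the map, and the names bagc and bag are hypothetical; the point is only that the pointer-keeping interface passes the address of its pointer argument to the copying one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical copying container: a growable array of fixed-size items. */
typedef struct bagc {
    unsigned char *items;
    size_t item_size;
    size_t count;
    size_t capacity;
} bagc;

static void bagc_init(bagc *b, size_t item_size)
{
    assert(item_size > 0);
    b->items = NULL;
    b->item_size = item_size;
    b->count = 0;
    b->capacity = 0;
}

/* Copy item_size bytes from item into the container; returns 0 on success. */
static int bagc_push(bagc *b, const void *item)
{
    if (b->count == b->capacity) {
        size_t cap = b->capacity ? b->capacity * 2 : 4;
        unsigned char *p = realloc(b->items, cap * b->item_size);
        if (p == NULL)
            return -1;
        b->items = p;
        b->capacity = cap;
    }
    memcpy(b->items + b->count * b->item_size, item, b->item_size);
    b->count++;
    return 0;
}

/* Copy the i-th stored item into out. */
static void bagc_get(const bagc *b, size_t i, void *out)
{
    assert(i < b->count);
    memcpy(out, b->items + i * b->item_size, b->item_size);
}

/* Pointer-keeping interface: a thin wrapper whose "item" is the pointer
 * value itself, so the copying container copies sizeof(void *) bytes. */
typedef struct bag {
    bagc impl;
} bag;

static void bag_init(bag *b)
{
    bagc_init(&b->impl, sizeof(void *));
}

static int bag_push(bag *b, void *item)
{
    return bagc_push(&b->impl, &item);   /* note: &item, not item */
}

static void *bag_get(const bag *b, size_t i)
{
    void *p;
    bagc_get(&b->impl, i, &p);
    return p;
}
```

The wrapper adds no algorithm of its own, which is what makes this variant clean; its cost is that every stored pointer passes through memcpy.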

Alternatively, one can implement only the map and pair mapc with an extra vectorc: mapc first copies the data into the vector and then gives the address to a map. This has the side effect that deletion from mapc would either be substantially slower, or leave garbage (or require other structures to reuse the garbage).


Update 2:

I came to the conclusion that careless users might use my library the way they write C++, copy after copy after copy. Therefore, I am abandoning this idea and accepting only pointers.

3 Answers

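You have roughly covered both possible solutions.

Preprocessor macros roughly correspond to C++ templates and have the same advantages and disadvantages:

  • They are hard to read.
  • Complex macros are often hard to use (consider type safety of parameters, etc.).
  • They are just "generators" of more code, so the compiled output still contains a lot of duplication.
  • On the other hand, they allow the compiler to optimize a lot of things.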
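
The "generator" point can be illustrated with a small sketch. DEFINE_MAX_FN is a hypothetical macro invented for this example, not something from the question; each use stamps out a separate, fully typed function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generator macro: each expansion defines an independent
 * function NAME that returns the largest element of a non-empty array
 * of TYPE. */
#define DEFINE_MAX_FN(NAME, TYPE)                        \
    static TYPE NAME(const TYPE *items, size_t count)    \
    {                                                    \
        assert(items != NULL && count > 0);              \
        TYPE best = items[0];                            \
        for (size_t i = 1; i < count; ++i)               \
            if (items[i] > best)                         \
                best = items[i];                         \
        return best;                                     \
    }

/* Two expansions, two separate functions in the compiled output. */
DEFINE_MAX_FN(max_int, int)
DEFINE_MAX_FN(max_double, double)
```

The loop is written once in the source but exists twice in the object code, once per type. In exchange, the compiler sees concrete types in each copy and can optimize them individually.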

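Function pointers roughly correspond to C++ polymorphism and are, IMHO, the cleaner and generally easier-to-use solution, but they come with some cost at runtime (in tight loops, a few extra function calls can be expensive).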

Unless performance is really critical, I generally prefer function calls.

answered 2012-06-14T14:20:43.330

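What you are looking for is polymorphism. C++, C#, or another object-oriented language is better suited to this task, although many people have tried to implement polymorphic behavior in C.

Code Project has some good articles/tutorials on the subject: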

http://www.codeproject.com/Articles/10900/Polymorphism-in-C

http://www.codeproject.com/Articles/108830/Inheritance-and-Polymorphism-in-C

answered 2012-06-14T14:18:00.907

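There is also a third option you haven't considered: you can write an external script (in another language) that generates your code from a set of templates. This is similar to the macro method, but you can use a language such as Perl or Python to generate the code. Because these languages are more powerful than the C preprocessor, you can avoid some of the potential problems inherent in doing templates via macros. I have used this method in cases where I was tempted to use complex macros like those in your example 1. In the end, it turned out to be less error-prone than using the C preprocessor. The downside is that, between writing the generator script and updating the makefiles, it is a little harder to set up initially (but IMO well worth it in the end).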

answered 2012-06-14T15:21:12.397