c - c 配列エラー - 要素が初期化されることを期待していますが、初期化されておらず、セッターが機能していないように見えることがあります。

Question

2次元配列を作成して値を入力し、配列を読み取ってまったく異なる値を取得するのに問題があります。奇妙なことに、2 つの配列を 2 つ保持しており、そのうちの 1 つは値を正しく格納しており、もう 1 つはそうではありません。私も要素を上書きしていないと確信しています。私は、C が苦手でない人には明らかな愚かなエラーを犯していると思います。

私はviterbi アルゴリズムを実装していることに注意してください。しかし、私が理解している一般的なアルゴリズムは Python で実装されています。これは、C の配列が私に悲しみを与えているだけです。

私がやっていること：

1) 2 つの配列を malloc します。これらは 2D 配列として使用されますが、連続したメモリブロックを割り当てます。ビタビアルゴのフォワードステップを通過する際に配列のすべてのエントリを入力する必要があるため、配列を明示的に初期化しません。

double *viterbi_table = malloc(sizeof(double) * n_samples * n_states);
int *best_path_table = malloc(sizeof(int) * n_samples * n_states);

2) ビタビアルゴの前方部分については、観測データをウォークスルーし、各状態の最も可能性の高い状態と確率を計算します。

for (t = 1; t < n_samples; t++) // for each piece of observed data
{
    for (i = 0; i < n_states; i++)
    {
        max_state_index = 0;
        max_p = -DBL_MAX;
        // calculate the max state and probability by looping through all the states
        // yada yada...

        // IMPORTANT PART: We set the array values here
        viterbi_table[t * n_samples + i] = max_p;
        best_path_table[t * n_samples + i] = max_state_index;
        printf("\tbest_path_table[%d][%d] or [%d] = %d => %d\n", 
            i, t, t * n_samples + i, best_path_table[t * n_samples + i], max_state_index);
    }

    // IMPORTANT PART: print out rows of best path table to see if what we thought we inserted is in there
    if (debug)
    {
        printf("best_path, [ ", t);
        for (i = 0; i < n_states; i++)
        {
            printf("[%d], %d ", t * n_samples + i, best_path_table[t * n_samples + i]);
        }
        printf("]\n");
    }
}

3) コードを実行すると、設定した配列要素が、設定したと思っていたものと一致する代わりに、初期化されていない要素のように見える大きな負または正の数値が得られます。何を与える？それらのブロックに値を割り当てました。問題を示す出力の選択された部分を次に示します。

t=36 => sample=X
    best_path_table[0][36] or [1404] = 0 => 0
    best_path_table[1][36] or [1405] = 0 => 0
    best_path_table[2][36] or [1406] = 0 => 0
    best_path_table[3][36] or [1407] = 0 => 0
    ...
best_path, [ [1404], 1399607453 [1405], -1070347604 [1406], 1399607453 [1407], 0 ... ]

対照的に、以下のものが正しいです。

t=37 => sample=X
    best_path_table[0][37] or [1443] = 3 => 3
    best_path_table[1][37] or [1444] = 3 => 3
    best_path_table[2][37] or [1445] = 3 => 3
    ...
best_path, [ [1443], 3 [1444], 3 [1445], ... ]

12 個未満の観測など、短いデータに対してコードを実行すると、このような問題は発生しません。より長いデータに対して実行すると、ほとんどの最適パステーブルが正しく入力されていません。パターンは次のようになります。

observation#
1) correct
2-3) garbage
4) correct
4-5) garbage
and so on

コード

この要点を参照してください。サードパーティのライブラリには依存しません。

編集：

ビタビテーブルの最初の行は、アルゴリズムの前方部分の前のステップで初期化されます。

for (i = 0; i < n_states; i++)
{
    state_i = states[i];
    sample_t = samples[0];
    viterbi_table[i*n_samples]
        = prior(state_i, 0, true) + emission(sample_t, state_i, true);
}

EDIT2：

以前のバージョンのコードでは、より標準的な 2D 配列の初期化 (非連続ブロック内) と対応する配列アクセスを行っていました。これによりbus error、より大きな入力データに対して一貫して対応できました。これは完全に理にかなっています。

double **viterbi_table = malloc(sizeof * viterbi_table * n_states);
int **best_path_table = malloc(sizeof * best_path_table * n_states);
...
viterbi_table[j][t - 1] = ...

EDIT3、ソリューションに関するコメント:

これはばかげた添え字の間違いであることがわかりました。viterbi および最適パス配列のサイズは n_samples * n_states、つまり 17 * 39 = 663 です。これにより、私の例のように 1404 のインデックスは除外されます。

具体的な問題は、n_states の代わりに n_samples を誤って使用したため、配列のインデックス付けが混乱していたことです。特定の観測インデックス t (30) と特定の状態インデックス i (14) の場合、計算は次のようになります。

// Original, wrong
t * n_samples + i = 30 * 39 + 14 = 1184

// New, correct
t * n_states + i = 30 * 17 + 14 = 524

変数tは、現在のサンプル数を既にエンコードしているため、それに状態数を掛けるだけです。

EDIT4、修正コード: 修正コードはここにあります。また、ユースケースの放出確率と遷移確率も調整しました。

score 1 · Accepted Answer

を使用して、添え字が範囲内にあることを確認するマクロ CHECK_SUBSCRIPT を追加しましたassert。発火しました — 下付き文字が制御不能です。

ソース

コメントは削除されました。

#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHECK_SUBSCRIPT(x, n_samples)   assert((x) >= 0 && (x) < (n_samples) * n_states)

typedef enum { false, true } bool;
typedef double (*prob_function_def)(char, char, bool);

int n_states = 17;
int n_observations = 17;
char states[17] =
    { 'X', '0', '1', '2', '3', '4', '5', '6', '7',
    '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
char observations[17] =
    { 'X', '0', '1', '2', '3', '4', '5', '6', '7',
    '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};

void viterbi_log (
                char *samples,
                int n_samples,
                char *best_path,
                prob_function_def prior,
                prob_function_def transition,
                prob_function_def emission,
                bool debug
             )
{
    printf("\nviterbi...\n");

    int i, j, t, max_state_index;
    char state_i, state_j, sample_t;
    double trans_p, max_p;

    double *viterbi_table = malloc(sizeof(double) * n_samples * n_states);
    int *best_path_table = malloc(sizeof(int) * n_samples * n_states);

    for (int n33 = 0; n33 < n_samples * n_states; n33++)
    {
        CHECK_SUBSCRIPT(n33, n_samples);
        viterbi_table[n33] = 3.14159;
        best_path_table[n33] = 314159;
    }

    if (debug) printf("\nInitialization:\n");
    for (i = 0; i < n_states; i++)
    {
        state_i = states[i];
        sample_t = samples[0];
        CHECK_SUBSCRIPT(i*n_samples, n_samples);
        viterbi_table[i*n_samples]
            = prior(state_i, 0, true) + emission(sample_t, state_i, true);
        if (debug)
        {
            printf("\t");
            printf("log(prior[%c]) + log(emission[%c][%c]) = %e\n",
                state_i, sample_t, state_i, viterbi_table[i*n_samples]);
        }
    }

    if (debug) printf("\nForward:\n");
    for (t = 1; t < n_samples; t++)
    {
        sample_t = samples[t];
        if (debug) printf("t=%d => sample=%c\n", t, sample_t);
        for (i = 0; i < n_states; i++)
        {
            state_i = states[i];
            max_state_index = 0;
            max_p = -DBL_MAX;

            for (j = 0; j < n_states; j++)
            {
                state_j = states[j];
                CHECK_SUBSCRIPT(((t-1)*n_samples)+j, n_samples);
                trans_p = viterbi_table[((t - 1) * n_samples) + j]
                    + transition(state_i, state_j, true)
                    + emission(sample_t, state_j, true);
                if (trans_p > max_p)
                {
                    max_state_index = j;
                    max_p = trans_p;
                }
            }

            CHECK_SUBSCRIPT(t*n_samples+i, n_samples);
            viterbi_table[t * n_samples + i] = max_p;
            best_path_table[t * n_samples + i] = max_state_index;
            printf("\tbest_path_table[%d][%d] or [%d] = %d => %d\n",
                    i, t, t * n_samples + i, best_path_table[t * n_samples + i], max_state_index);
        }

        if (debug)
        {
            printf("best_path, [ ");
            for (i = 0; i < n_states; i++)
            {
                CHECK_SUBSCRIPT(t*n_samples+i, n_samples);
                printf("[%d], %d ", t * n_samples + i, best_path_table[t * n_samples + i]);
            }
            printf("]\n");
        }
    }

    if (debug)
    {
        printf("\nbest path table:\n");
        for (t = n_samples - 1; t > 0; t--)
        {
            printf("t=%d, [ ", t);
            for (i = 0; i < n_states; i++)
            {
                CHECK_SUBSCRIPT(t*n_samples+i, n_samples);
                printf("[%d], %d ", t * n_samples + i, best_path_table[t * n_samples + i]);
            }
            printf("]\n");
        }
    }

    free(viterbi_table);
    free(best_path_table);
}

double prior_prob (char state_i, char state_j, bool log_prob)
{
    if (!log_prob)
        return 0.25;
    else
        return -1.3862943611198906;
}

double transition_prob (char state_i, char state_j, bool log_prob)
{
    if (!log_prob)
    {
        if (state_i == 0 && state_j == 0)
            return 0.9;
        else if (state_i == 0 || state_j == 0)
            return 0.1;
        else if (state_i == state_j)
            return 0.9;
        else
            return 0.1;
    }
    else
    {
        if (state_i == 0 && state_j == 0)
            return -0.10536051565782628;
        else if (state_i == 0 || state_j == 0)
            return -2.3025850929940455;
        else if (state_i == state_j)
            return -0.10536051565782628;
        else
            return -2.3025850929940455;
    }
}

double emission_prob (char observation, char state, bool log_prob)
{
    if (!log_prob)
    {
        if (state == observation)
            return 0.8;
        else
            return 0.2;
    }
    else
    {
        if (state == observation)
            return -0.2231435513142097;
        else
            return -1.6094379124341003;
    }
}

void dtmf_example()
{
    bool debug = true;
    char *samples = "XXXXXXXXXXXXXX6X66XXXXXXXXXXXXXXXXXXXXX";
    int n_samples = strlen(samples);
    char *best_path = malloc(sizeof(int) * n_samples);

    viterbi_log(samples, n_samples, best_path,
        &prior_prob, &transition_prob, &emission_prob, debug);

    printf("\nbest path = { ");
    for (int t = 0; t < n_samples; t++)
        printf("%d ", best_path[t]);
    printf("}\n");
}

int main(void)
{
    dtmf_example();
    return 0;
}

発生したアサーション:

best_path_table[13][16] or [637] = 0 => 0
best_path_table[14][16] or [638] = 0 => 0
best_path_table[15][16] or [639] = 0 => 0
best_path_table[16][16] or [640] = 0 => 0
best_path, [ [624], 0 [625], 0 [626], 0 [627], 0 [628], 0 [629], 0 [630], 0 [631], 7 [632], 0 [633], 0 [634], 0 [635], 0 [636], 0 [637], 0 [638], 0 [639], 0 [640], 0 ]
t=17 => sample=6
Assertion failed: ((t*n_samples+i) >= 0 && (t*n_samples+i) < (n_samples) * n_states), function viterbi_log, file viterbi3.c, line 89.
Abort trap: 6

それが次の CHECK_SUBSCRIPT です。

        CHECK_SUBSCRIPT(t*n_samples+i, n_samples);
        viterbi_table[t * n_samples + i] = max_p;
        best_path_table[t * n_samples + i] = max_state_index;

score 0 · Accepted Answer

tで始まるため1、最初の行は決して初期化されないように見えます。各行は前の行に依存していると推測しているため、ジャンク値が伝播します。1 つの特定のケースが機能したという事実は、純粋に偶然です (初期化されていないデータがたまたま有効でした)。

c - c 配列エラー - 要素が初期化されることを期待していますが、初期化されておらず、セッターが機能していないように見えることがあります。

2 に答える 2

ソース

Related

Reference