c++ - Quickly label some elements according to the most frequently occurring label

Question

Suppose I have a vector of components (each component is a vector of floats):

vector<vector<float> > components

And I have a vector of data (each datum is a vector of floats with the same size as a component):

vector< vector<float> > data

and the labels associated with this data:

vector<string> labels

(where labels[i] is the label of data[i]). I also have a distance function that returns the distance between two vectors:

float distance(vector<float> v1, vector<float> v2);

I want to label each component according to the label that occurs most often among the data associated with that component. That is, something like:

for each data d from data
{
   let c be the nearest component to d according to distance.
   associate the label of d to c.
}

for each component c
{
   definitively give c the label that occurs the most among the labels associated with it
   // example: if labels {l1,l2,l1,l2,l1,l1,l1,l8,l1} were associated with c, then its label should be l1
}

The final result to return is a vector of labeled components (<component,label> pairs), written as:

vector< pair< vector<float>, string > > labeledComponents.

What is a simple and fast way to do that in C++?

score 1 · Accepted Answer

Because this task is complex, there is no really simple and quick way to do it in C++, but here is what I came up with:

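#include <cfloat>    // FLT_MAX
#include <iterator>  // std::distance
#include <map>
#include <string>
#include <utility>   // make_pair
#include <vector>
using namespace std;

typedef vector<float> componenttype;
typedef vector<float> datatype;
typedef map<string, int> possiblenames;
typedef vector<pair<componenttype, string>> resulttype;

// Placeholder only: substitute your real distance function here.
float vecdistance(const datatype& v1, const componenttype& v2) {return 1.0f;}

resulttype user995434(const vector<datatype>& data, const vector<string>& labels, const vector<componenttype>& components) {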
    map<componenttype, possiblenames> maybenames;
    resulttype resultnames;

    //for each data d from data
    for(auto d=data.begin(); d!=data.end(); ++d) {
        //let c the nearest component from d according to distance.
        auto closest=components.begin();
        float closedistance = FLT_MAX;
        for(auto it=components.begin(); it!=components.end(); ++it) {
            float dist = vecdistance(*d, *it);
            if (dist < closedistance) {
                closedistance = dist;
                closest = it;
            }
        }
        //associate the label of d to c.
        int offset = std::distance(data.begin(), d);
        maybenames[*closest][labels[offset]]++;
    }
    //for each component c
    for(auto c=components.begin(); c!=components.end(); ++c) {
        //let mostname be the name with the most matches.
        auto posnames = maybenames[*c];
        posnames[""]=0; //guarantee each component has _something_
        auto mostname = posnames.begin();
        for(auto it=posnames.begin(); it!=posnames.end(); ++it) {
            if (it->second > mostname->second)
                mostname = it;
        }
        //associate mostname with c
        resultnames.push_back(make_pair(*c, mostname->first));
    }
    return resultnames;
}

Proof that it compiles and runs is here, though I have not verified its correctness at all.

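Note that this algorithm will not be "quick" in any language: it computes a distance for every (datum, component) pair, and you never mentioned the data being sorted in some way or anything else that could be used as a shortcut.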

score 0 · Accepted Answer

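You almost have the algorithm for finding c's label from the data elements. You need to iterate over each element of components and check whether its distance is smaller than the current minimum. If it is, remember the current component and set the minimum distance to the current distance.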
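As a minimal sketch of that loop (not the asker's actual code): `squaredDistance` is a hypothetical stand-in for the question's `distance`, and `nearestComponent` is a name made up for this example. It returns an index rather than an iterator, and on a tie the first component found wins.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for the question's distance(): squared Euclidean distance.
float squaredDistance(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
}

// Returns the index of the component closest to d (the first one wins on ties).
std::size_t nearestComponent(const std::vector<float>& d,
                             const std::vector<std::vector<float>>& components) {
    assert(!components.empty());
    std::size_t best = 0;
    float bestDist = std::numeric_limits<float>::max();
    for (std::size_t i = 0; i < components.size(); ++i) {
        const float dist = squaredDistance(d, components[i]);
        if (dist < bestDist) {  // strictly smaller: remember this component
            bestDist = dist;
            best = i;
        }
    }
    return best;
}
```

Comparing squared distances picks the same nearest component as comparing Euclidean distances, so the square root can be skipped if that is the metric you use.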

Note: don't forget that several components may lie at the same minimum distance, so keep a collection of minimum-distance components rather than just one.
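A sketch of that tie-aware variant, assuming that exact equality of the returned floats is the right notion of a tie (for computed distances you may prefer a tolerance). `nearestComponents` and its `dist` parameter are names invented for this example; any callable that takes two vectors and returns a float should work.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Returns the indices of ALL components that share the minimum distance to d.
template <typename Distance>
std::vector<std::size_t> nearestComponents(const std::vector<float>& d,
                                           const std::vector<std::vector<float>>& components,
                                           Distance dist) {
    assert(!components.empty());
    std::vector<std::size_t> nearest;
    float bestDist = std::numeric_limits<float>::max();
    for (std::size_t i = 0; i < components.size(); ++i) {
        const float current = dist(d, components[i]);
        if (current < bestDist) {
            // New strict minimum: discard earlier candidates.
            bestDist = current;
            nearest.assign(1, i);
        } else if (current == bestDist) {
            // Same minimum distance: keep this component as well.
            nearest.push_back(i);
        }
    }
    return nearest;
}
```

The datum's label can then be counted for every returned component, or for just one of them, depending on how you want ties to be treated.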

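You can build a map (see std::map<>) from each component to (a map from label to occurrence count), and then pick, for each component, the label with the highest occurrence count from this mapping.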
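One possible shape for that mapping, as a sketch under a few assumptions: components are identified by their index, so the outer map becomes a plain vector, and `assignments[i]` already holds the index of the component nearest to `data[i]`. `LabelCounts`, `mostFrequentLabel` and `labelComponents` are hypothetical names, not part of the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using LabelCounts = std::map<std::string, int>;  // label -> occurrence count

// Returns the most frequent label in counts, or "" when nothing was tallied.
std::string mostFrequentLabel(const LabelCounts& counts) {
    if (counts.empty()) return "";
    auto best = std::max_element(
        counts.begin(), counts.end(),
        [](const LabelCounts::value_type& a, const LabelCounts::value_type& b) {
            return a.second < b.second;  // compare by count only
        });
    return best->first;
}

// assignments[i] is assumed to be the index of the component nearest to data[i].
std::vector<std::string> labelComponents(std::size_t componentCount,
                                         const std::vector<std::size_t>& assignments,
                                         const std::vector<std::string>& labels) {
    assert(assignments.size() == labels.size());
    std::vector<LabelCounts> tallies(componentCount);
    for (std::size_t i = 0; i < assignments.size(); ++i) {
        assert(assignments[i] < componentCount);
        ++tallies[assignments[i]][labels[i]];  // count this label for its component
    }
    std::vector<std::string> result;
    result.reserve(componentCount);
    for (const LabelCounts& counts : tallies) {
        result.push_back(mostFrequentLabel(counts));
    }
    return result;
}
```

Because std::map iterates in key order and std::max_element returns the first maximum, labels with equal counts resolve to the lexicographically smallest one in this sketch.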
