c++ - Is C++11 std::function slower than virtual calls?

Question

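I am creating a mechanism that lets users build arbitrarily complex functions from basic building blocks using the decorator pattern. Functionally this works fine, but I don't like the fact that it involves a lot of virtual calls, particularly when the nesting depth becomes large. This worries me because the complex function may be called often (>100,000 times).

To work around this problem, I tried to turn the decorator scheme into a std::function once it is finished (see to_function() in the SSCCE). All internal function calls are wired up during construction of the std::function. I figured this version would evaluate faster than the original decorator scheme because no virtual lookups need to be performed.

Alas, benchmarks prove me wrong: the decorator scheme is in fact faster than the std::function I built from it. So now I am left wondering why. Maybe my test setup is faulty, since I only use two trivial basic functions, which means the vtable lookups may be cached?

The code I used is included below; unfortunately it is quite long.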

SSCCE

// sscce.cpp
#include <iostream>
#include <vector>
#include <memory>
#include <functional>
#include <random>

/**
 * Base class for Pipeline scheme (implemented via decorators)
 */
class Pipeline {
protected:
    std::unique_ptr<Pipeline> wrappee;
    Pipeline(std::unique_ptr<Pipeline> wrap)
    :wrappee(std::move(wrap)){}
    Pipeline():wrappee(nullptr){}

public:
    typedef std::function<double(double)> FnSig;
    double operator()(double input) const{
        if(wrappee.get()) input=wrappee->operator()(input);
        return process(input);
    }

    virtual double process(double input) const=0;
    virtual ~Pipeline(){}

    // Returns a std::function which contains the entire Pipeline stack.
    virtual FnSig to_function() const=0;
};

/**
 * CRTP for to_function().
 */
template <class Derived>
class Pipeline_CRTP : public Pipeline{
protected:
    Pipeline_CRTP(const Pipeline_CRTP<Derived> &o):Pipeline(o){}
    Pipeline_CRTP(std::unique_ptr<Pipeline> wrappee)
    :Pipeline(std::move(wrappee)){}
    Pipeline_CRTP():Pipeline(){};
public:
    typedef typename Pipeline::FnSig FnSig;

    FnSig to_function() const override{
        if(Pipeline::wrappee.get()!=nullptr){

            FnSig wrapfun = Pipeline::wrappee->to_function();
            FnSig processfun = std::bind(&Derived::process,
                static_cast<const Derived*>(this),
                std::placeholders::_1);
            FnSig fun = [=](double input){
                return processfun(wrapfun(input));
            };
            return std::move(fun);

        }else{

            FnSig processfun = std::bind(&Derived::process,
                static_cast<const Derived*>(this),
                std::placeholders::_1);
            FnSig fun = [=](double input){
                return processfun(input);
            };
            return std::move(fun);
        }

    }

    virtual ~Pipeline_CRTP(){}
};

/**
 * First concrete derived class: simple scaling.
 */
class Scale: public Pipeline_CRTP<Scale>{
private:
    double scale_;
public:
    Scale(std::unique_ptr<Pipeline> wrap, double scale) // todo move
:Pipeline_CRTP<Scale>(std::move(wrap)),scale_(scale){}
    Scale(double scale):Pipeline_CRTP<Scale>(),scale_(scale){}

    double process(double input) const override{
        return input*scale_;
    }
};

/**
 * Second concrete derived class: offset.
 */
class Offset: public Pipeline_CRTP<Offset>{
private:
    double offset_;
public:
    Offset(std::unique_ptr<Pipeline> wrap, double offset) // todo move
:Pipeline_CRTP<Offset>(std::move(wrap)),offset_(offset){}
    Offset(double offset):Pipeline_CRTP<Offset>(),offset_(offset){}

    double process(double input) const override{
        return input+offset_;
    }
};

int main(){

    // used to make a random function / arguments
    // to prevent gcc from being overly clever
    std::default_random_engine generator;
    auto randint = std::bind(std::uniform_int_distribution<int>(0,1),std::ref(generator));
    auto randdouble = std::bind(std::normal_distribution<double>(0.0,1.0),std::ref(generator));

    // make a complex Pipeline
    std::unique_ptr<Pipeline> pipe(new Scale(randdouble()));
    for(unsigned i=0;i<100;++i){
        if(randint()) pipe=std::move(std::unique_ptr<Pipeline>(new Scale(std::move(pipe),randdouble())));
        else pipe=std::move(std::unique_ptr<Pipeline>(new Offset(std::move(pipe),randdouble())));
    }

    // make a std::function from pipe
    Pipeline::FnSig fun(pipe->to_function());   

    double bla=0.0;
    for(unsigned i=0; i<100000; ++i){
#ifdef USE_FUNCTION
        // takes 110 ms on average
        bla+=fun(bla);
#else
        // takes 60 ms on average
        bla+=pipe->operator()(bla);
#endif
    }   
    std::cout << bla << std::endl;
}

Benchmark

Using pipe:

g++ -std=gnu++11 sscce.cpp -march=native -O3
sudo nice -3 /usr/bin/time ./a.out
-> 60 ms

Using fun:

g++ -DUSE_FUNCTION -std=gnu++11 sscce.cpp -march=native -O3
sudo nice -3 /usr/bin/time ./a.out
-> 110 ms

score 25 · Accepted Answer

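You have std::functions binding lambdas that call std::functions that bind lambdas that call std::functions that ...

Look at your to_function. It creates a lambda that calls two std::functions and returns that lambda bound into another std::function. The compiler won't resolve any of these calls statically.

So in the end you have just as many indirect calls as the virtual function solution, and that is only if you get rid of the bound processfun and call process directly in the lambda. Otherwise you have twice as many.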
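As a minimal sketch of what dropping the bound processfun could look like: ScaleStage below is a hypothetical, stripped-down stand-in for the question's Scale class (it is not part of the original code), and scale_twice is an illustrative helper. The lambda captures this and calls process() directly, so each stage contributes one std::function call for the wrapped stage instead of two.

```cpp
#include <functional>
#include <memory>
#include <utility>

// Hypothetical, simplified stand-in for the question's pipeline stages.
class ScaleStage {
    std::unique_ptr<ScaleStage> wrappee_;
    double scale_;
public:
    typedef std::function<double(double)> FnSig;

    explicit ScaleStage(double scale) : wrappee_(), scale_(scale) {}
    ScaleStage(std::unique_ptr<ScaleStage> wrap, double scale)
        : wrappee_(std::move(wrap)), scale_(scale) {}

    double process(double input) const { return input * scale_; }

    // The lambda calls process() directly instead of going through a second
    // std::function built with std::bind. The captured `this` must outlive
    // the returned function object.
    FnSig to_function() const {
        if (wrappee_) {
            FnSig inner = wrappee_->to_function();
            return [this, inner](double input) { return process(inner(input)); };
        }
        return [this](double input) { return process(input); };
    }
};

// Builds a two-stage pipeline (x * 2, then * 3) and evaluates it at x.
double scale_twice(double x) {
    std::unique_ptr<ScaleStage> first(new ScaleStage(2.0));
    ScaleStage second(std::move(first), 3.0);
    ScaleStage::FnSig f = second.to_function();
    return f(x);
}
```

Each nesting level still goes through one type-erased call (inner), which matches the answer's point that this only brings the count down to that of the virtual version.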

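If you want a speedup, you will have to build the entire pipeline in a way that can be resolved statically, and that means a lot more templates before you finally erase the type into a single std::function.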
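One way this could look, as a rough sketch rather than a drop-in replacement: the names ScaleOp, OffsetOp, Compose, compose and make_pipeline are all hypothetical. The stages are plain structs combined through a template, so the complete pipeline type is known to the compiler, and the type is erased only once at the very end.

```cpp
#include <functional>
#include <utility>

// Hypothetical stage types: plain structs, no virtual functions.
struct ScaleOp {
    double scale;
    double operator()(double x) const { return x * scale; }
};

struct OffsetOp {
    double offset;
    double operator()(double x) const { return x + offset; }
};

// Composition whose full type is known at compile time, so the compiler is
// free to inline inner(x) and outer(...) into a single function body.
template <class Inner, class Outer>
struct Compose {
    Inner inner;
    Outer outer;
    double operator()(double x) const { return outer(inner(x)); }
};

template <class Inner, class Outer>
Compose<Inner, Outer> compose(Inner inner, Outer outer) {
    return Compose<Inner, Outer>{std::move(inner), std::move(outer)};
}

// Builds (x * 2 + 1) * 3 statically and erases the type only once.
std::function<double(double)> make_pipeline() {
    auto pipeline = compose(compose(ScaleOp{2.0}, OffsetOp{1.0}), ScaleOp{3.0});
    return pipeline;
}
```

The trade-off is that the shape of the pipeline must be fixed at compile time, which is not the case for the randomly assembled pipeline in the question's main().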

score 9 · Accepted Answer

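std::function is notoriously slow; type erasure and the resulting allocation play a part in this. In addition, with gcc, calls through it are inlined and optimized poorly. For this reason there is a plethora of C++ "delegates" with which people try to solve this problem. I ported one to Code Review: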

https://codereview.stackexchange.com/questions/14730/impossively-fast-delegate-in-c11

But you can find plenty of others with Google, or write your own.

EDIT:

These days, look here for a fast delegate.
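For illustration, a hand-written delegate can be as small as the sketch below. It assumes a fixed signature double(double) and a non-owning reference to the target object; Delegate, from_method, Scaler and delegate_example are hypothetical names, and this is not the code from the linked Code Review post.

```cpp
// Hypothetical minimal delegate for the signature double(double).
// It stores an untyped object pointer plus a plain function pointer to a
// stub that knows the real type, so it never allocates.
class Delegate {
    typedef double (*Stub)(const void*, double);
    const void* object_;
    Stub stub_;

    // One stub is generated per (type, member function) pair.
    template <class T, double (T::*Method)(double) const>
    static double method_stub(const void* object, double arg) {
        return (static_cast<const T*>(object)->*Method)(arg);
    }

    Delegate(const void* object, Stub stub) : object_(object), stub_(stub) {}

public:
    // Does not take ownership: `object` must outlive the delegate.
    template <class T, double (T::*Method)(double) const>
    static Delegate from_method(const T* object) {
        return Delegate(object, &method_stub<T, Method>);
    }

    double operator()(double arg) const { return stub_(object_, arg); }
};

// Hypothetical target type used only for the example.
struct Scaler {
    double factor;
    double apply(double x) const { return x * factor; }
};

double delegate_example(double x) {
    Scaler s{4.0};
    Delegate d = Delegate::from_method<Scaler, &Scaler::apply>(&s);
    return d(x);
}
```

Because the member function is a template argument, the stub can call it directly; the only indirection left is the single call through stub_.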

score 6 · Accepted Answer

The libstdc++ implementation of std::function works roughly like this:

template<typename Signature>
struct Function
{
    Ptr functor;
    Ptr functor_manager;

    template<class Functor>
    Function(const Functor& f)
    {
        functor_manager = &FunctorManager<Functor>::manage;
        functor = new Functor(f);
    }

    Function(const Function& that)
    {
        functor_manager = that.functor_manager;
        functor = functor_manager(CLONE, that.functor);
    }

    R operator()(args) // Signature
    {
        return functor_manager(INVOKE, functor, args);
    }

    ~Function()
    {
        functor_manager(DESTROY, functor);
    }
}

template<class Functor>
struct FunctorManager
{
     static manage(int operation, Functor& f)
     {
         switch (operation)
         {
         case CLONE: call Functor copy constructor;
         case INVOKE: call Functor::operator();
         case DESTROY: call Functor destructor;
         }
     }
}

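So std::function does not know the exact type of the Functor object, but it dispatches the important operations through the functor_manager function pointer, which is a static function of a template instance that does know the Functor type.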
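The pseudocode above can be turned into a small compilable sketch. This is not the actual libstdc++ source: SimpleFunction, Operation and manager_example are hypothetical names, the signature is fixed to double(double), and the functor is always heap-allocated for simplicity.

```cpp
enum Operation { CLONE, INVOKE, DESTROY };

// One manage function is instantiated per concrete Functor type; it is the
// only code that knows the real type behind the void*.
template <class Functor>
struct FunctorManager {
    // CLONE returns a new heap copy; INVOKE reads the argument from *io and
    // writes the result back to it; DESTROY deletes the functor.
    static void* manage(Operation op, void* functor, double* io) {
        Functor* f = static_cast<Functor*>(functor);
        switch (op) {
        case CLONE:   return new Functor(*f);
        case INVOKE:  *io = (*f)(*io); return nullptr;
        case DESTROY: delete f; return nullptr;
        }
        return nullptr;
    }
};

class SimpleFunction {
    typedef void* (*Manager)(Operation, void*, double*);
    void* functor_;
    Manager manager_;
public:
    template <class Functor>
    SimpleFunction(const Functor& f)
        : functor_(new Functor(f)), manager_(&FunctorManager<Functor>::manage) {}

    SimpleFunction(const SimpleFunction& that)
        : functor_(that.manager_(CLONE, that.functor_, nullptr)),
          manager_(that.manager_) {}

    SimpleFunction& operator=(const SimpleFunction&) = delete;

    double operator()(double x) const {
        manager_(INVOKE, functor_, &x);  // indirect call through the pointer
        return x;
    }

    ~SimpleFunction() { manager_(DESTROY, functor_, nullptr); }
};

double manager_example(double x) {
    double offset = 0.5;
    SimpleFunction add([offset](double v) { return v + offset; });
    SimpleFunction copy(add);  // copying goes through CLONE
    return copy(x);
}
```

Every call, copy and destruction goes through the manager_ pointer, which is the indirection the compiler generally cannot see through.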

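Each std::function instance allocates its own copy of the functor object it owns on the heap (unless the functor is no larger than a pointer, such as a function pointer, in which case it simply holds the pointer as a sub-object).

The important takeaway is that copying a std::function is expensive if the underlying functor object has an expensive copy constructor and/or requires a lot of space (for example to hold bound parameters).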
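A small experiment can make the copy cost visible. The sketch below uses a hypothetical CountingFunctor with a deliberately large payload and counts how often it is copy-constructed while a std::function holding it is copied; the names and the payload size are illustrative assumptions.

```cpp
#include <array>
#include <functional>

// Hypothetical functor that counts its own copy constructions. The payload
// makes it much larger than a pointer.
struct CountingFunctor {
    static int copies;
    std::array<double, 64> payload;

    CountingFunctor() : payload() {}
    CountingFunctor(const CountingFunctor& other) : payload(other.payload) {
        ++copies;
    }
    // Moves are deliberately not counted.
    CountingFunctor(CountingFunctor&& other) : payload(other.payload) {}

    double operator()(double x) const { return x + payload[0]; }
};

int CountingFunctor::copies = 0;

// Returns how many functor copies a single std::function copy triggers.
int copies_per_function_copy() {
    std::function<double(double)> original = CountingFunctor();
    int before = CountingFunctor::copies;
    std::function<double(double)> duplicate = original;  // copies the target
    (void)duplicate;
    return CountingFunctor::copies - before;
}
```

Copying the std::function copies the whole 64-element payload along with it, so passing such objects by value in a hot path can add up quickly.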
