
Windows XP SP3, Core 2 Duo 2.0 GHz. I've found boost::lexical_cast to be very slow and wanted to find a way to speed up my code. Using Visual C++ 2008 with /O2 optimization, and comparing with Java 1.6 and Python 2.6.2, I see the following results.

Integer cast:

c++: 
std::string s ;
for(int i = 0; i < 10000000; ++i)
{
    s = boost::lexical_cast<std::string>(i);
}

java:
String s = new String();
for(int i = 0; i < 10000000; ++i)
{
    s = new Integer(i).toString();
}

python:
for i in xrange(1,10000000):
    s = str(i)

The times I see are:

C++: 6700 ms

Java: 1178 ms

Python: 6702 ms

So C++ is about as slow as Python, and 6 times slower than Java.

Double cast:

c++:
std::string s ;
for(int i = 0; i < 10000000; ++i)
{
    double d = i*1.0;
    s = boost::lexical_cast<std::string>(d);
}

java:
String s = new String();
for(int i = 0; i < 10000000; ++i)
{
    double d = i*1.0;
    s = new Double(d).toString();
}

python:
for i in xrange(1,10000000):
    d = i*1.0
    s = str(d)

The times I see are:

C++: 56129 ms

Java: 2852 ms

Python: 30780 ms

So for doubles, C++ is actually half the speed of Python, and 20 times slower than the Java solution. Any ideas on improving the performance of boost::lexical_cast? Does this stem from a poor stringstream implementation, or can one expect a general 10x performance hit from using the Boost libraries?


9 Answers


Edit 2012-04-11

rve quite rightly commented about lexical_cast's performance, providing a link:

http://www.boost.org/doc/libs/1_49_0/doc/html/boost_lexical_cast/performance.html

I don't have access right now to boost 1.49, but I do remember making my code faster on an older version. So I guess:

  1. the following answer is still valid (if only for learning purposes)
  2. there was probably an optimization introduced somewhere between the two versions (I'll search that)
  3. which means that boost is still getting better and better

Original answer

Just to add info on Barry's and Motti's excellent answers:

Some background

Please remember Boost is written by the best C++ developers on this planet, and reviewed by the same best developers. If lexical_cast was so wrong, someone would have hacked the library either with criticism or with code.

I guess you missed the point of lexical_cast's real value...

Comparing apples and oranges.

In Java, you are casting an integer into a Java String. You'll note I'm not talking about an array of characters, or a user defined string. You'll note, too, I'm not talking about your user-defined integer. I'm talking about strict Java Integer and strict Java String.

In Python, you are more or less doing the same.

As said by other posts, you are, in essence, using the Java and Python equivalents of sprintf (or the less standard itoa).

In C++, you are using a very powerful cast. Not powerful in the sense of raw speed performance (if you want speed, perhaps sprintf would be better suited), but powerful in the sense of extensibility.

Comparing apples.

If you want to compare a Java Integer.toString method, then you should compare it with either C sprintf or C++ ostream facilities.

The C++ solution below, writing into a std::string, would be 6 times faster (on my g++) than lexical_cast, and much less extensible:

inline void toString(const int value, std::string & output)
{
   // The largest 32-bit integer is 4294967295, that is 10 chars
   // On the safe side, add 1 for sign, and 1 for trailing zero
   char buffer[12] ;
   sprintf(buffer, "%i", value) ;
   output = buffer ;
}

The C sprintf solution would be 8 times faster (on my g++) than lexical_cast but a lot less safe:

inline void toString(const int value, char * output)
{
   sprintf(output, "%i", value) ;
}

Both solutions are at least as fast as your Java solution (according to your data).

Comparing oranges.

If you want to compare a C++ lexical_cast, then you should compare it with this Java pseudo code:

Source s ;
Target t = Target.fromString(Source(s).toString()) ;

Source and Target being of whatever type you want, including built-in types like boolean or int, which is possible in C++ because of templates.

Extensibility? Is that a dirty word?

No, but it has a well known cost: When written by the same coder, general solutions to specific problems are usually slower than specific solutions written for their specific problems.

In the current case, in a naive viewpoint, lexical_cast will use the stream facilities to convert from a type A into a string stream, and then from this string stream into a type B.

This means that as long as your object can be output into a stream, and input from a stream, you'll be able to use lexical_cast on it, without touching any single line of code.
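As a sketch (this is not Boost's actual source, just the naive viewpoint described above), the stream round-trip idea can be written in a few lines, which also shows why it works for any streamable type:

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Minimal sketch of the lexical_cast concept: stream the source value
// into a temporary stream, then stream it back out as the target type.
// Any type with operator<< can go in; any type with operator>> can come out.
template <typename Target, typename Source>
Target naive_lexical_cast(const Source& arg)
{
    std::stringstream interpreter;
    Target result;
    if (!(interpreter << arg) || !(interpreter >> result))
        throw std::runtime_error("bad lexical cast");
    return result;
}
```

For example, `naive_lexical_cast<std::string>(42)` yields `"42"`, and `naive_lexical_cast<int>(std::string("123"))` yields `123`, without any type-specific conversion code.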

So, what are the uses of lexical_cast?

The main uses of lexical casting are:

  1. Ease of use (hey, a C++ cast that works for everything being a value!)
  2. Combining it with template heavy code, where your types are parametrized, and as such you don't want to deal with specifics, and you don't want to know the types.
  3. Still potentially relatively efficient, if you have basic template knowledge, as I will demonstrate below

The point 2 is very very important here, because it means we have one and only one interface/function to cast a value of a type into an equal or similar value of another type.

This is the real point you missed, and this is the point that costs in performance terms.

But it's so slooooooowwww!

If you want raw speed performance, remember you're dealing with C++, and that you have a lot of facilities to handle conversion efficiently, and still, keep the lexical_cast ease-of-use feature.

It took me some minutes to look at the lexical_cast source and come up with a viable solution. Add the following code to your C++ code:

#ifdef SPECIALIZE_BOOST_LEXICAL_CAST_FOR_STRING_AND_INT

#include <cstdio> // for sprintf

namespace boost
{
   template<>
   std::string lexical_cast<std::string, int>(const int &arg)
   {
      // The largest 32-bit integer is 4294967295, that is 10 chars
      // On the safe side, add 1 for sign, and 1 for trailing zero
      char buffer[12] ;
      sprintf(buffer, "%i", arg) ;
      return buffer ;
   }
}

#endif

By enabling this specialization of lexical_cast for strings and ints (by defining the macro SPECIALIZE_BOOST_LEXICAL_CAST_FOR_STRING_AND_INT), my code went 5 times faster on my g++ compiler, which means, according to your data, its performance should be similar to Java's.

And it took me 10 minutes of looking at the boost code to write a reasonably efficient and correct 32-bit version. With some work, it could probably be made faster and safer (for example, if we had direct write access to the std::string internal buffer, we could avoid the temporary external buffer).
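The "direct write access to the std::string internal buffer" idea became expressible in later standards. A sketch using C++17's std::to_chars (a facility that did not exist when this answer was written):

```cpp
#include <charconv>
#include <string>

// Sketch: format digits directly into the string's own buffer,
// avoiding the temporary char array. Requires C++17 (std::to_chars
// and non-const std::string::data()).
inline std::string int_to_string(int value)
{
    std::string output(12, '\0');  // 12 chars covers any 32-bit int with sign
    auto [ptr, ec] = std::to_chars(output.data(),
                                   output.data() + output.size(), value);
    output.resize(ptr - output.data());  // trim to the digits actually written
    return output;
}
```

Unlike sprintf, std::to_chars is bounds-checked and locale-independent, so it keeps the safety of the buffer-based version while removing the copy.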

answered 2009-08-09T09:50:55.250

You could specialize lexical_cast for int and double types, using strtol and strtod in your specializations.

#include <cstdio>
#include <cstdlib>
#include <string>
#include <boost/lexical_cast.hpp>

namespace boost {
template<>
inline int lexical_cast(const std::string& arg)
{
    char* stop;
    int res = strtol( arg.c_str(), &stop, 10 );
    if ( *stop != 0 ) throw_exception(bad_lexical_cast(typeid(std::string), typeid(int)));
    return res;
}
template<>
inline std::string lexical_cast(const int& arg)
{
    char buffer[12]; // enough for a 32-bit int with sign and terminator
    sprintf( buffer, "%d", arg ); // ltoa is non-standard; sprintf is portable
    return std::string( buffer ); // RVO will take place here
}
}//namespace boost

int main()
{
    std::string str = "22"; // SOME STRING
    int int_str = boost::lexical_cast<int>( str );
    std::string str2 = boost::lexical_cast<std::string>( int_str );

    return 0;
}

This variant will be faster than the default implementation, because the default implementation constructs heavy stream objects. And it should be a little faster than printf, because printf has to parse the format string.

answered 2009-08-09T09:43:29.600

lexical_cast is more generic than the specific code you're using in Java and Python. It's not surprising that a general approach that works in many scenarios (lexical cast is little more than streaming to, and then back from, a temporary stream) ends up slower than specific routines.

(Incidentally, you may get better performance out of Java by using the static version, Integer.toString(int).) [1]

Finally, string parsing and deparsing is usually not that performance-sensitive, unless one is writing a compiler, in which case lexical_cast is probably too general-purpose, and integers etc. will be computed as each digit is scanned.
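The "computed as each digit is scanned" approach can be sketched as follows (a hypothetical helper, not from any library, shown without overflow or error handling):

```cpp
#include <cstddef>
#include <string>

// Sketch of digit-at-a-time parsing: the integer value is accumulated
// while the characters are scanned, roughly the way strtol works,
// instead of being round-tripped through an iostream.
inline int parse_int(const std::string& s)
{
    std::size_t pos = 0;
    bool negative = false;
    if (pos < s.size() && (s[pos] == '-' || s[pos] == '+')) {
        negative = (s[pos] == '-');
        ++pos;
    }
    int value = 0;
    for (; pos < s.size(); ++pos)
        value = value * 10 + (s[pos] - '0');  // accumulate per digit
    return negative ? -value : value;
}
```

A production version would additionally reject non-digit characters and check for overflow, which is exactly what strtol's `stop` pointer and errno reporting give you.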

[1] Commenter "stepancheg" doubted my hint that the static version may give better performance. Here's the source I used:

public class Test
{
    static int instanceCall(int i)
    {
        String s = new Integer(i).toString();
        return s == null ? 0 : 1;
    }

    static int staticCall(int i)
    {
        String s = Integer.toString(i);
        return s == null ? 0 : 1;
    }

    public static void main(String[] args)
    {
        // count used to avoid dead code elimination
        int count = 0;

        // *** instance

        // Warmup calls
        for (int i = 0; i < 100; ++i)
            count += instanceCall(i);

        long start = System.currentTimeMillis();
        for (int i = 0; i < 10000000; ++i)
            count += instanceCall(i);
        long finish = System.currentTimeMillis();
        System.out.printf("10MM Time taken: %d ms\n", finish - start);


        // *** static

        // Warmup calls
        for (int i = 0; i < 100; ++i)
            count += staticCall(i);

        start = System.currentTimeMillis();
        for (int i = 0; i < 10000000; ++i)
            count += staticCall(i);
        finish = System.currentTimeMillis();
        System.out.printf("10MM Time taken: %d ms\n", finish - start);
        if (count == 42)
            System.out.println("bad result"); // prevent elimination of count
    }
}

Runtimes, using JDK 1.6.0-14, server VM:

10MM Time taken: 688 ms
10MM Time taken: 547 ms

And in the client VM:

10MM Time taken: 687 ms
10MM Time taken: 610 ms

In theory, escape analysis may permit stack allocation, and inlining may pull all the code (including the copying) into the local method, permitting elimination of redundant copies. But such analysis takes quite a lot of time and occupies quite a bit of code cache, which has other costs that don't justify themselves in real code, as opposed to microbenchmarks like the one seen here.

answered 2009-08-09T06:23:06.323

What lexical cast is doing in your code can be simplified to this:

string Cast( int i ) {
    ostringstream os;
    os << i;
    return os.str();
}

There is unfortunately a lot going on every time you call Cast():

  • a string stream is created possibly allocating memory
  • operator << for integer i is called
  • the result is stored in the stream, possibly allocating memory
  • a string copy is taken from the stream
  • a copy of the string is (possibly) created to be returned.
  • memory is deallocated

Then in your own code:

 s = Cast( i );

the assignment involves further allocations and deallocations. You may be able to reduce this slightly by using:

string s = Cast( i );

instead.

However, if performance is really important to you, you should consider using a different mechanism. You could write your own version of Cast() which (for example) creates a static stringstream. Such a version would not be thread safe, but that might not matter for your specific needs.
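The static-stringstream variant suggested above might look like this sketch (as noted, not thread safe with a plain `static`; C++11's `thread_local` would fix that at some cost):

```cpp
#include <sstream>
#include <string>

// Sketch of the reuse idea: one stream object is created once and reset
// between calls, instead of being constructed and destroyed on every cast.
inline std::string Cast(int i)
{
    static std::ostringstream os;  // not thread safe; thread_local would be
    os.str("");   // drop the previous contents
    os.clear();   // reset any error flags
    os << i;
    return os.str();
}
```

Both the str("") reset and the clear() call are needed: the first empties the buffer, the second resets stream state left over from a previous conversion.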

To summarise, lexical_cast is a convenient and useful feature, but such convenience comes (as it always must) with trade-offs in other areas.

answered 2009-08-09T09:59:31.473

Unfortunately I don't have enough rep yet to comment...

lexical_cast is not primarily slow because it's generic (template lookups happen at compile-time, so virtual function calls or other lookups/dereferences aren't necessary). lexical_cast is, in my opinion, slow, because it builds on C++ iostreams, which are primarily intended for streaming operations and not single conversions, and because lexical_cast must check for and convert iostream error signals. Thus:

  • a stream object has to be created and destroyed
  • in the string output case above, note that C++ compilers have a hard time avoiding buffer copies (an alternative is to format directly to the output buffer, like sprintf does, though sprintf won't safely handle buffer overruns)
  • lexical_cast has to check for stringstream errors (ss.fail()) in order to throw exceptions on conversion failures

lexical_cast is nice because (IMO) exceptions allow trapping all errors without extra effort and because it has a uniform prototype. I don't personally see why either of these properties necessitate slow operation (when no conversion errors occur), though I don't know of such C++ functions which are fast (possibly Spirit or boost::xpressive?).

Edit: I just found a message mentioning the use of BOOST_LEXICAL_CAST_ASSUME_C_LOCALE to enable an "itoa" optimisation: http://old.nabble.com/lexical_cast-optimization-td20817583.html. There's also a linked article with a bit more detail.

answered 2010-06-24T16:06:35.497

lexical_cast may or may not be as slow relative to Java and Python as your benchmarks indicate, because your benchmark measurements may have a subtle problem. Any workspace allocations and deallocations done by lexical_cast, or by the iostream methods it uses, are measured by your benchmarks, because C++ does not defer these operations. In the case of Java and Python, however, the associated deallocations may in fact simply have been deferred to a future garbage-collection cycle and missed by the benchmark measurements. (Unless a GC cycle happens by chance while the benchmark is in progress, in which case you would be measuring too much.) So it is hard to know for certain, without examining the specifics of the Java and Python implementations, how much "cost" should be attributed to the deferred GC burden that may (or may not) eventually be imposed.

This kind of issue can obviously apply to many other benchmarks of C++ versus garbage-collected languages.

answered 2011-09-09T18:14:44.460

If speed is a concern, or you are just interested in how fast such casts can be in C++, there is an interesting thread regarding it.

Boost.Spirit 2.1 (to be released with Boost 1.40) is said to be very fast, even faster than the C equivalents (strtol(), atoi(), etc.).

answered 2009-08-09T10:30:55.700