c++ - Restoring a Unicode string at runtime

Question

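I'm building an application that receives runtime strings containing encoded Unicode over TCP. I have the following, but unfortunately this approach could only ever work at compile time, and it fails with: "incomplete universal character name \u", because the compiler expects four hexadecimal digits after \u.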

QString restoreUnicode(QString strText)
   {
      QRegExp rx("\\\\u([0-9a-z]){4}");
      return strText.replace(rx, QString::fromUtf8("\u\\1"));
   }

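I'm looking for a runtime solution. I can foresee splitting these strings, converting the hexadecimal digits after each "\u" delimiter to base 10, and passing the result to the QChar constructor, but I'm quite concerned about the time complexity such an approach would incur. I'm not an expert, so I'm looking for a better way if one exists.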

Does anyone have a solution or any tips?

score 1 · Accepted Answer

#include <assert.h>
#include <iostream>
#include <string>
#include <sstream>
#include <locale>
#include <codecvt>          // C++11
using namespace std;

int main()
{
    char const  data[]  = "\\u7cfb\\u8eca\\u4e21\\uff1a\\u6771\\u5317";

    istringstream   stream( data );

    wstring     ws;
    int         code;
    char        slashCh, uCh;
    while( stream >> slashCh >> uCh >> hex >> code )
    {
        assert( slashCh == '\\' && uCh == 'u' );
        ws += wchar_t( code );
    }

    cout << "Unicode code points:" << endl;
    for( auto it = ws.begin();  it != ws.end();  ++it )
    {
        cout << hex << 0 + *it << endl;
    }
    cout << endl;

    // The following is C++11 specific; std::wstring_convert and <codecvt> are deprecated since C++17.
    cout << "UTF-8 encoding:" << endl;
    wstring_convert< codecvt_utf8< wchar_t > >  converter;
    string const bytes = converter.to_bytes( ws );
    for( auto it = bytes.begin();  it != bytes.end();  ++it )
    {
        cout << hex << 0 + (unsigned char)*it << ' ';
    }
    cout << endl;
}

score 1 · Accepted Answer

You need to decode the string yourself. Find each Unicode entry (rx.indexIn(strText)), parse it (int result; std::istringstream iss(s); if (!(iss>>std::hex>>result).fail()) ...), and replace the original \\uXXXX text with (wchar_t)result.
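
The parsing step described above could be sketched roughly as follows, in standard C++ without Qt. The helper name parseEscape and its signature are made up for illustration and are not part of any library. The digits are validated before extraction because operator>> with std::hex also accepts a sign or a "0x" prefix, which is presumably not wanted here.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not a Qt or standard library function):
// tries to read one "\uXXXX" escape starting at s[pos].
// Returns true and stores the 16-bit value in `out` on success.
bool parseEscape(const std::string& s, std::size_t pos, unsigned& out)
{
    // Need a backslash, a 'u' and exactly four more characters.
    if (pos + 6 > s.size() || s[pos] != '\\' || s[pos + 1] != 'u')
        return false;

    // Check the four digits first, so that input such as "\u0x1f"
    // or "\u-123" is rejected instead of being partially parsed.
    for (std::size_t i = pos + 2; i < pos + 6; ++i)
        if (!std::isxdigit(static_cast<unsigned char>(s[i])))
            return false;

    // Same extraction as in the answer: hex text to integer.
    std::istringstream iss(s.substr(pos + 2, 4));
    return !(iss >> std::hex >> out).fail();
}
```

The value stored in out can then be cast to wchar_t (or passed to QChar) and used to replace the six characters of the escape.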

score 1 · Accepted Answer

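For closure, and for anyone who comes across this thread in the future, here is my initial solution, before optimising the scope of these variables. I'm not a fan of it, but it works, given the unpredictable mix of Unicode and/or ASCII in a stream I have no control over (client only).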

QString restoreUnicode(QString strText)
{
    QRegExp rxUnicode("\\\\u([0-9a-fA-F]){4}"); // hex digits only, either case

    bool bSuccessFlag;
    int iSafetyOffset = 0;
    int iNeedle = strText.indexOf(rxUnicode, iSafetyOffset);

    while (iNeedle != -1)
    {
        QChar cCodePoint(strText.mid(iNeedle + 2, 4).toInt(&bSuccessFlag, 16));

        if ( bSuccessFlag )
            strText = strText.replace(strText.mid(iNeedle, 6), QString(cCodePoint));
        else
            iSafetyOffset = iNeedle + 1; // hop over non code point to avoid lock

        iNeedle = strText.indexOf(rxUnicode, iSafetyOffset);
    }

    return strText;
}
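
On the time-complexity concern from the question: each indexOf/replace round in the loop above can rescan and shift the string, so an input with many escapes may cost noticeably more than linear time. A single pass that copies ordinary characters and decodes escapes as it goes avoids that. Below is a minimal sketch of this idea in standard C++ (no Qt), producing UTF-8. All names (hexValue, readEscape, appendUtf8, restoreUnicodeUtf8) are hypothetical, and combining surrogate pairs and replacing lone surrogates with U+FFFD are my own assumptions about the desired behaviour, not something the original solution does.

```cpp
#include <cstddef>
#include <string>

// Value of one hex digit, or -1 if c is not a hex digit.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Reads "\uXXXX" at s[pos]; returns its 16-bit value, or -1 if
// there is no well-formed escape at that position.
static long readEscape(const std::string& s, std::size_t pos)
{
    if (pos + 6 > s.size() || s[pos] != '\\' || s[pos + 1] != 'u')
        return -1;
    long value = 0;
    for (std::size_t i = pos + 2; i < pos + 6; ++i) {
        const int d = hexValue(s[i]);
        if (d < 0) return -1;
        value = value * 16 + d;
    }
    return value;
}

// Appends the UTF-8 encoding of code point cp (<= 0x10FFFF) to out.
static void appendUtf8(std::string& out, unsigned long cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Decodes every well-formed "\uXXXX" in `in` to UTF-8 in one pass.
// Anything that is not a well-formed escape is copied unchanged.
std::string restoreUnicodeUtf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        const long unit = readEscape(in, i);
        if (unit < 0) {              // ordinary character
            out += in[i++];
            continue;
        }
        i += 6;
        unsigned long cp = static_cast<unsigned long>(unit);
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with a following low surrogate.
            const long low = readEscape(in, i);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10)
                   + (static_cast<unsigned long>(low) - 0xDC00);
                i += 6;
            }
        }
        if (cp >= 0xD800 && cp <= 0xDFFF)
            cp = 0xFFFD;             // lone surrogate (assumption)
        appendUtf8(out, cp);
    }
    return out;
}
```

In a Qt program the result could be handed to QString::fromUtf8; the same loop could also be written directly against QString and QChar.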
