c++ - Converting UTF-8 to UCS-2 with the ICU library

Question

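I am currently working on a problem converting a UTF-8 string to a UCS-2 string with the ICU library. The library offers several ways to do this, but so far none of them seems to work. Given how popular the library is, I assume I am doing something wrong.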

First, the common code. In every case I create a string and pass it to the object, but nothing operates on it until the conversion step is reached.

The UTF-8 string I am currently using is simply "ĩ".

For simplicity, uniString stands for the string being used, as in this code:

UErrorCode resultCode = U_ZERO_ERROR;

UConverter* m_pConv = ucnv_open("ISO-8859-1", &resultCode);

// Change the callback to error out instead of the default            
const void* oldContext;
UConverterFromUCallback oldFromAction;
UConverterToUCallback oldToAction;
ucnv_setFromUCallBack(m_pConv, UCNV_FROM_U_CALLBACK_STOP, NULL, &oldFromAction, &oldContext, &resultCode);
ucnv_setToUCallBack(m_pConv, UCNV_TO_U_CALLBACK_STOP, NULL, &oldToAction, &oldContext, &resultCode);

int32_t outputLength = 0;
int bodySize = uniString.length();
int targetSize = bodySize * 4;
char* target = new char[targetSize];                       

printf("Body: %s\n", uniString.c_str());
if (U_SUCCESS(resultCode))
{
    // outputLength = ucnv_convert("ISO-8859-1", "UTF-8", target, targetSize, uniString.c_str(), bodySize, &resultCode);
    outputLength = ucnv_fromAlgorithmic(m_pConv, UCNV_UTF8, target, targetSize, uniString.c_str(),
        uniString.length(), &resultCode);
    ucnv_close(m_pConv);
}
printf("ISO-8859-1 DGF just tried to convert '%s' to '%s' with error '%i' and length '%i'", uniString.c_str(), 
    outputLength ? target : "invalid_char", resultCode, outputLength);

if (resultCode == U_INVALID_CHAR_FOUND || resultCode == U_ILLEGAL_CHAR_FOUND || resultCode == U_TRUNCATED_CHAR_FOUND)
{
    if (resultCode == U_INVALID_CHAR_FOUND)
    {
        printf("Unmapped input character, cannot be converted to Latin1");                    

        m_pConv = ucnv_open("UCS-2", &resultCode);
        if (U_SUCCESS(resultCode))
        {
            // outputLength = ucnv_convert("UCS-2", "UTF-8", target, targetSize, uniString.c_str(), bodySize, &resultCode);
            outputLength = ucnv_fromAlgorithmic(m_pConv, UCNV_UTF8, target, targetSize, uniString.c_str(),
                uniString.length(), &resultCode);
            ucnv_close(m_pConv);
        }

        printf("UCS-2 DGF just tried to convert '%s' to '%s' with error '%i' and length '%i'", uniString.c_str(), 
            outputLength ? target : "invalid_char", resultCode, outputLength);

        if (U_SUCCESS(resultCode))
        {
            pdus = SegmentText(target, pText, SEGMENT_SIZE_UNICODE_MAX, true);
        }
    }
    else
    {
        printf("DecodeText(): Text contents does not appear to be valid UTF-8");
    }
}
else
{
    printf("DecodeText(): Text successfully converted to Latin1");
    std::string newBody(target, outputLength);
    pdus = SegmentText(newBody, pPdu, SEGMENT_SIZE_MAX);
}

The problem is that ucnv_fromAlgorithmic reports U_INVALID_CHAR_FOUND for the UCS-2 conversion. That makes sense for the ISO-8859-1 attempt, but not for UCS-2.
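To check the reasoning about the character itself: "ĩ" is U+0129 (LATIN SMALL LETTER I WITH TILDE), encoded in UTF-8 as the two bytes 0xC4 0xA9. Its code point is above 0xFF, so ISO-8859-1 cannot represent it, but it lies in the Basic Multilingual Plane, so UCS-2 can hold it as the single 16-bit unit 0x0129. Below is a minimal, ICU-free sketch that decodes UTF-8 into UCS-2 code units and rejects malformed input and anything outside the BMP. The helper names utf8ToUcs2 and fitsLatin1 are hypothetical, not ICU functions. The sketch only illustrates the mapping; in real code ICU's converters remain the better choice.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper (not ICU): decode UTF-8 into UCS-2 code units.
// Returns std::nullopt for malformed input, overlong forms, surrogates,
// and any code point outside the BMP, since UCS-2 cannot represent those.
std::optional<std::u16string> utf8ToUcs2(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else return std::nullopt;  // 4-byte lead (U+10000 and up) or invalid byte
        if (i + len > in.size()) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800))
            return std::nullopt;  // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return std::nullopt;  // surrogates are not characters
        out.push_back(static_cast<char16_t>(cp));
        i += len;
    }
    return out;
}

// Hypothetical helper: true if every unit would fit in ISO-8859-1.
bool fitsLatin1(const std::u16string& s) {
    for (char16_t c : s)
        if (c > 0xFF) return false;
    return true;
}
```

For example, utf8ToUcs2("\xC4\xA9") yields a one-element string holding 0x0129, and fitsLatin1 on that result is false. That matches the ISO-8859-1 failure while showing that UCS-2 itself has no reason to reject the character.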

Another attempt was to use ucnv_convert (the commented-out lines). That function did perform the conversion, but it did not fail for ISO-8859-1.

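So my question is: has anyone with experience using these functions spotted something I am doing wrong, or is there something wrong with my assumptions about converting this character?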

score 0 · Accepted Answer

You need to reset resultCode to U_ZERO_ERROR before calling ucnv_open. Quoting the manual:

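"ICU functions that take a reference (C++) or a pointer (C) to a UErrorCode first test if(U_FAILURE(errorCode)) { return immediately; } so that in a chain of such functions the first one that sets an error code causes the following ones to not perform any operations."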
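In the question's code the first ucnv_fromAlgorithmic call legitimately leaves resultCode set to U_INVALID_CHAR_FOUND. That value is still in resultCode when ucnv_open("UCS-2", &resultCode) runs, so ucnv_open returns without opening a converter, the U_SUCCESS check skips the UCS-2 conversion, and the printout shows the error left over from the ISO-8859-1 attempt. The fix described above is the single line resultCode = U_ZERO_ERROR; placed just before that second ucnv_open. The sketch below models the convention with hypothetical stand-ins (FakeError, fakeConvert and latin1ThenUcs2 are illustrative names, not ICU API), so the effect can be seen without linking ICU.

```cpp
#include <cassert>

// Hypothetical stand-ins (NOT ICU API) that follow the UErrorCode convention
// quoted above: a function handed an error code that already signals failure
// returns immediately without doing anything.
enum class FakeError { ZeroError, InvalidCharFound };

bool fakeFailure(FakeError e) { return e != FakeError::ZeroError; }

// Pretend conversion step: "writes" one unit if the target charset can
// represent the character, otherwise records InvalidCharFound, roughly like
// a converter using the STOP callback.
int fakeConvert(bool targetCanRepresent, FakeError* err) {
    assert(err != nullptr);
    if (fakeFailure(*err)) return 0;  // earlier failure: skip entirely
    if (!targetCanRepresent) {
        *err = FakeError::InvalidCharFound;
        return 0;
    }
    return 1;  // one unit written
}

// Models the question's fallback flow for "ĩ": try Latin-1, then UCS-2.
// resetBeforeRetry stands for "resultCode = U_ZERO_ERROR;" before ucnv_open.
int latin1ThenUcs2(bool resetBeforeRetry, FakeError* err) {
    int written = fakeConvert(/* Latin-1 holds U+0129? */ false, err);
    if (*err == FakeError::InvalidCharFound) {
        if (resetBeforeRetry) *err = FakeError::ZeroError;
        written = fakeConvert(/* UCS-2 holds U+0129? */ true, err);
    }
    return written;
}
```

Without the reset, latin1ThenUcs2 returns 0 and the error stays InvalidCharFound, mirroring the question's output; with the reset, the UCS-2 step runs and the error code ends up clear.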
