c++ - Converting ASCII to Unicode char codes (FreeType2)

Question

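I'm using FreeType2 in one of my projects. To render a character, I need to provide a Unicode two-byte character code. However, the character codes my program reads are in one-byte ASCII format. That's fine for char codes below 128 (the character codes are the same), but the remaining 128 don't match. For example: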

'a' in ASCII is 0x61 and 'a' in Unicode is 0x0061 - this is fine

I was trying to use a WinAPI function for this, but I must be doing something wrong. Here is a sample:

unsigned char szTest1[] = "ąółź"; //ASCII format
wchar_t* wszTest2;
int size = MultiByteToWideChar(CP_UTF8, 0, (char*)szTest1, 4, NULL, 0);
printf("size = %d\n", size);
wszTest2 = new wchar_t[size];
MultiByteToWideChar(CP_UTF8, 0, (char*)szTest1, 4, wszTest2, size);
printf("HEX: %x\n", wszTest2[0]);
delete[] wszTest2;

I'm expecting a new wide string to be created, without a NULL at the end. However, the size variable is always 0. Any idea what I'm doing wrong? Or is there an easier way to solve this problem?

score 6 · Accepted Answer

The CodePage parameter to MultiByteToWideChar is wrong. UTF-8 is not the same as ASCII. You should use CP_ACP, which denotes the current system code page (which is not the same as ASCII either; see "Unicode, UTF, ASCII, ANSI format differences").

The size is most likely zero because your test string is not a valid UTF-8 string.

With almost all Win32 functions, you can call GetLastError() after the function fails to get a detailed error code, so calling it would give you more details as well.

score 6 · Accepted Answer

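The "pure" ASCII character set is limited to the range 0-127 (7 bits). 8-bit characters with the most significant bit set (i.e. characters in the range 128-255) are not uniquely defined: their definition depends on the code page. So the character ą (LATIN SMALL LETTER A WITH OGONEK) is represented by the value 0xB9 in a specific code page, presumably Windows-1250. In other code pages, the value 0xB9 is associated with a different character (for example, in the Windows-1252 code page, 0xB9 is associated with the character ¹, i.e. superscript one).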

To convert characters from a specific code page to Unicode UTF-16 using the Windows Win32 API, you can use MultiByteToWideChar, specifying the correct code page (which is not CP_UTF8, as written in the code in the question; in fact, CP_UTF8 identifies Unicode UTF-8). Try specifying 1250 (ANSI Central European; Central European (Windows)) as the proper code page identifier.

If you have access to ATL in your code, you can use convenient ATL string conversion helper classes such as CA2W, which wraps calls to MultiByteToWideChar(). For example:

#include <atlconv.h> // ATL String Conversion Helpers
// 'test' is a Unicode UTF-16 string.
// Conversion is done from code-page 1250
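// (ANSI Central European; Central European (Windows)).
// Assumes the narrow literal below is stored as Windows-1250 bytes
// (this depends on the source file and the compiler's execution character set).
CA2W test("ąółź", 1250);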

You can now use the test string with Unicode APIs.

If you don't have access to ATL, or if you want a C++ STL-based solution, you may want to consider code like this:

///////////////////////////////////////////////////////////////////////////////
//
// Modern STL-based C++ wrapper to Win32's MultiByteToWideChar() C API.
//
// (based on http://code.msdn.microsoft.com/windowsdesktop/C-UTF-8-Conversion-Helpers-22c0a664)
//
///////////////////////////////////////////////////////////////////////////////

#include <exception>    // for std::exception
#include <iostream>     // for std::cout and std::cerr
#include <ostream>      // for std::endl
#include <stdexcept>    // for std::runtime_error
#include <string>       // for std::string and std::wstring
#include <Windows.h>    // Win32 Platform SDK

//-----------------------------------------------------------------------------
// Define an exception class for string conversion error.
//-----------------------------------------------------------------------------
class StringConversionException 
    : public std::runtime_error
{
public:
    // Creates exception with error message and error code.
    StringConversionException(const char* message, DWORD error)
        : std::runtime_error(message)
        , m_error(error)
    {}

    // Creates exception with error message and error code.
    StringConversionException(const std::string& message, DWORD error)
        : std::runtime_error(message)
        , m_error(error)
    {}

    // Windows error code.
    DWORD Error() const
    {
        return m_error;
    }

private:
    DWORD m_error;
};

//-----------------------------------------------------------------------------
// Converts an ANSI/MBCS string to Unicode UTF-16.
// Wraps MultiByteToWideChar() using modern C++ and STL.
// Throws a StringConversionException on error.
//-----------------------------------------------------------------------------
std::wstring ConvertToUTF16(const std::string & source, const UINT codePage)
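{
    // MultiByteToWideChar() fails when the input length is 0, so handle it here
    if (source.empty())
    {
        return std::wstring();
    }

    // Fail if an invalid input character is encountered
    static const DWORD conversionFlags = MB_ERR_INVALID_CHARS;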

    // Require size for destination string
    const int utf16Length = ::MultiByteToWideChar(
        codePage,           // code page for the conversion
        conversionFlags,    // flags
        source.c_str(),     // source string
        static_cast<int>(source.length()), // length (in chars) of source string
        NULL,               // unused - no conversion done in this step
        0                   // request size of destination buffer, in wchar_t's
        );
    if (utf16Length == 0) 
    {
        const DWORD error = ::GetLastError();
        throw StringConversionException(
            "MultiByteToWideChar() failed: Can't get length of destination UTF-16 string.",
            error);
    }

    // Allocate room for destination string
    std::wstring utf16Text;
    utf16Text.resize(utf16Length);

    // Convert to Unicode UTF-16
    if ( ! ::MultiByteToWideChar(
        codePage,           // code page for conversion
        0,                  // validation was done in previous call
        source.c_str(),     // source string
        static_cast<int>(source.length()), // length (in chars) of source string
        &utf16Text[0],      // destination buffer
        static_cast<int>(utf16Text.length()) // size of destination buffer, in wchar_t's
        )) 
    {
        const DWORD error = ::GetLastError();
        throw StringConversionException(
            "MultiByteToWideChar() failed: Can't convert to UTF-16 string.",
            error);
    }

    return utf16Text;
}

//-----------------------------------------------------------------------------
// Test.
//-----------------------------------------------------------------------------
int main()
{
    // Error codes
    static const int exitOk = 0;
    static const int exitError = 1;

    try 
    {
        // Test input string:
        //
        // ą - LATIN SMALL LETTER A WITH OGONEK
        std::string inText("x - LATIN SMALL LETTER A WITH OGONEK");
        inText[0] = 0xB9;

        // ANSI Central European; Central European (Windows) code page
        static const UINT codePage = 1250;

        // Convert to Unicode UTF-16
        const std::wstring utf16Text = ConvertToUTF16(inText, codePage);

        // Verify conversion.
        //  ą - LATIN SMALL LETTER A WITH OGONEK
        //  --> Unicode UTF-16 0x0105
        // http://www.fileformat.info/info/unicode/char/105/index.htm
        if (utf16Text[0] != 0x0105) 
        {
            throw std::runtime_error("Wrong conversion.");
        }
        std::cout << "All right." << std::endl;
    }
    catch (const StringConversionException& e)
    {
        std::cerr << "*** ERROR:\n";
        std::cerr << e.what() << "\n";
        std::cerr << "Error code = " << e.Error();
        std::cerr << std::endl;
        return exitError;
    }
    catch (const std::exception& e)
    {
        std::cerr << "*** ERROR:\n";
        std::cerr << e.what();
        std::cerr << std::endl;
        return exitError;
    }
    return exitOk;
}

///////////////////////////////////////////////////////////////////////////////
