c++ - Reading a Unicode UTF-8 file into a wstring

Question

How can I read a Unicode (UTF-8) file into a wstring (or wstrings) on the Windows platform?

score 40 · Accepted Answer

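With C++11 support, you can use the std::codecvt_utf8 facet, which encapsulates conversion between a UTF-8 encoded byte string and a UCS-2 or UCS-4 character string, and which can be used to read and write UTF-8 files, both text and binary. On Windows, where wchar_t is 16 bits wide, codecvt_utf8<wchar_t> means UCS-2, so characters outside the Basic Multilingual Plane cannot be represented. Note also that the <codecvt> header was deprecated in C++17.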

To use a facet, you usually create a locale object, which encapsulates culture-specific information as a set of facets that collectively define a specific localized environment. Once you have a locale object, you can imbue your stream buffer with it:

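#include <codecvt>
#include <fstream>
#include <locale>
#include <sstream>
#include <string>

std::wstring readFile(const char* filename)
{
    std::wifstream wif(filename);
    // Swap in a UTF-8 codecvt facet; the locale takes ownership of the facet.
    // std::locale() is the portable base (std::locale::empty() is MSVC-only).
    wif.imbue(std::locale(std::locale(), new std::codecvt_utf8<wchar_t>));
    std::wstringstream wss;
    wss << wif.rdbuf();   // copy the whole decoded file into the string stream
    return wss.str();
}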

which can be used like this:

std::wstring wstr = readFile("a.txt");
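A variant worth considering on Windows, sketched under the assumption that you want UTF-16 rather than UCS-2 and want a leading byte-order mark dropped: std::codecvt_utf8_utf16 converts to UTF-16 code units, and the std::consume_header mode flag tells the facet to discard a UTF-8 BOM. The name readUtf8File is hypothetical, and these facilities share the C++17 deprecation of <codecvt>. The stream is imbued before open() so the facet is in place before any byte is read.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: reads a UTF-8 file into a wstring as UTF-16 code units,
// skipping a leading UTF-8 BOM if present (deprecated facilities since C++17).
std::wstring readUtf8File(const char* filename)
{
    // 0x10FFFF is the default Maxcode; consume_header makes the facet drop a BOM.
    typedef std::codecvt_utf8_utf16<wchar_t, 0x10FFFF, std::consume_header> facet_type;
    std::wifstream wif;
    wif.imbue(std::locale(std::locale(), new facet_type)); // locale owns the facet
    wif.open(filename);
    std::wstringstream wss;
    wss << wif.rdbuf();   // copy the decoded contents
    return wss.str();
}
```

It is called exactly like readFile: `std::wstring wstr = readUtf8File("a.txt");`.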

Alternatively, you can set the global C++ locale before you create your streams. This causes all future calls to the std::locale default constructor to return a copy of the global C++ locale, so you don't need to imbue stream buffers explicitly:

std::locale::global(std::locale(std::locale(), new std::codecvt_utf8<wchar_t>));
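A minimal sketch of the global-locale route, using two hypothetical helpers: installUtf8GlobalLocale() installs the UTF-8 facet globally, and readFirstLine() opens a wifstream without ever calling imbue(). Only streams constructed after std::locale::global() pick up the new locale; streams that already exist keep theirs, and the change affects the whole program.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: makes UTF-8 the wide-stream conversion for every
// stream constructed afterwards. Streams that already exist are unaffected.
void installUtf8GlobalLocale()
{
    std::locale::global(std::locale(std::locale(), new std::codecvt_utf8<wchar_t>));
}

// Hypothetical helper: reads the first line. There is no imbue() call; the
// stream's locale is a copy of the global one taken at construction time.
std::wstring readFirstLine(const char* filename)
{
    std::wifstream wif(filename);
    std::wstring line;
    std::getline(wif, line);
    return line;
}
```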

score 15 · Accepted Answer

Following @Hans Passant's comment, the simplest approach is to use _wfopen_s and open the file with the mode rt, ccs=UTF-8.

Here is another pure C++ solution that works at least with VC++ 2010:

#include <locale>
#include <codecvt>
#include <string>
#include <fstream>
#include <cstdlib>

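int main() {
    // MSVC extension: a base locale with no facets.
    const std::locale empty_locale = std::locale::empty();
    typedef std::codecvt_utf8<wchar_t> converter_type;
    // The locale takes ownership of the facet and deletes it when unused.
    const converter_type* converter = new converter_type;
    const std::locale utf8_locale = std::locale(empty_locale, converter);
    // MSVC extension: constructor overload taking a wchar_t* file name.
    std::wifstream stream(L"test.txt");
    stream.imbue(utf8_locale);
    std::wstring line;
    std::getline(stream, line);   // first line, decoded from UTF-8
    std::system("pause");         // keeps the console window open (Windows)
}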

Apart from locale::empty() (locale::global() might also work here) and the wchar_t* overload of the basic_ifstream constructor, this should be fairly standard-compliant ("standard" meaning C++0x, of course, which became C++11).
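For comparison, a sketch of the same idea with the non-standard parts swapped out, assuming a C++11 standard library that still ships <codecvt>: the default-constructed std::locale() serves as the base locale in place of locale::empty(), the file name is a narrow std::string, and the loop collects every line instead of only the first. readUtf8Lines is a hypothetical name. On Windows a narrow path is interpreted in the active code page, so non-ASCII file names may still call for the wide overload.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper: returns every line of a UTF-8 file, decoded to wstring.
std::vector<std::wstring> readUtf8Lines(const std::string& path)
{
    std::wifstream stream;
    // Portable base locale plus a UTF-8 facet, installed before the file opens.
    stream.imbue(std::locale(std::locale(), new std::codecvt_utf8<wchar_t>));
    stream.open(path);
    std::vector<std::wstring> lines;
    std::wstring line;
    while (std::getline(stream, line))   // also yields a last line without '\n'
        lines.push_back(line);
    return lines;
}
```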

score 7 · Accepted Answer

Here is a Windows-only, platform-specific function:

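#include <cstdio>
#include <string>
#include <sys/types.h>
#include <sys/stat.h>

// Returns the file size in bytes, or 0 if _wstat fails.
size_t GetSizeOfFile(const std::wstring& path)
{
    struct _stat fileinfo;
    if (_wstat(path.c_str(), &fileinfo) != 0)
        return 0;
    return fileinfo.st_size;
}

std::wstring LoadUtf8FileToString(const std::wstring& filename)
{
    std::wstring buffer;            // stores file contents
    // "ccs=UTF-8" makes the CRT decode UTF-8 into wchar_t; "S" optimizes for sequential access.
    FILE* f = _wfopen(filename.c_str(), L"rtS, ccs=UTF-8");

    // Failed to open file
    if (f == NULL)
    {
        // ...handle some error...
        return buffer;
    }

    // The byte count is an upper bound on the number of wchar_t units produced.
    size_t filesize = GetSizeOfFile(filename);

    // Read entire file contents into memory
    if (filesize > 0)
    {
        buffer.resize(filesize);
        size_t wchars_read = fread(&(buffer.front()), sizeof(wchar_t), filesize, f);
        buffer.resize(wchars_read);   // trim to what was actually read
        buffer.shrink_to_fit();
    }

    fclose(f);

    return buffer;
}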

Use it like this:

std::wstring mytext = LoadUtf8FileToString(L"C:\\MyUtf8File.txt");

The entire file is read into memory, so don't use this for very large files.
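If the whole file is going into memory anyway, a portable alternative to the _wfopen/ccs route is to read the raw bytes in binary mode and convert them in one call. This sketch assumes std::wstring_convert and std::codecvt_utf8_utf16, both deprecated since C++17 but still shipped by the major standard libraries; loadUtf8FileToWstring is a hypothetical name. Because conversion happens after reading, no file-size query is needed; an optional UTF-8 BOM is stripped by hand, and binary mode keeps Windows CRLF line endings as they are.

```cpp
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: slurps the raw bytes, drops an optional UTF-8 BOM,
// then converts to UTF-16 code units in one step.
// Throws std::range_error on malformed UTF-8.
std::wstring loadUtf8FileToWstring(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::string bytes((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());
    if (bytes.compare(0, 3, "\xEF\xBB\xBF") == 0)
        bytes.erase(0, 3);   // skip the byte-order mark
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> conv;
    return conv.from_bytes(bytes);
}
```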

score 0 · Accepted Answer

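This question was addressed in "Confused about C++'s std::wstring, UTF-16, UTF-8 and displaying strings in a Windows GUI". In short, on Windows a wstring holds 16-bit wchar_t units, a design rooted in UCS-2, the predecessor of UTF-16. UCS-2 is a strictly two-byte encoding; UTF-16, which Windows uses today, needs two 16-bit units (a surrogate pair) for characters above U+FFFF. Arabic lies within the Basic Multilingual Plane, so it is covered.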
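The difference between UCS-2 and UTF-16 only shows up above U+FFFF. This sketch encodes a single code point as UTF-16; toUtf16 is a hypothetical helper, and it uses char16_t so the result does not depend on the platform's wchar_t width. An Arabic letter such as U+0627 takes one unit, while U+1F600 needs the surrogate pair 0xD83D 0xDE00.

```cpp
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-16.
// Precondition: cp <= 0x10FFFF and cp is not itself a surrogate.
std::u16string toUtf16(char32_t cp)
{
    if (cp <= 0xFFFF)   // Basic Multilingual Plane: a single unit (UCS-2 compatible)
        return std::u16string(1, static_cast<char16_t>(cp));
    cp -= 0x10000;      // 20 bits left, split 10/10 into high and low surrogates
    return { static_cast<char16_t>(0xD800 + (cp >> 10)),
             static_cast<char16_t>(0xDC00 + (cp & 0x3FF)) };
}
```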

score -6 · Accepted Answer

This is a bit crude, but how about reading the file as plain old bytes and casting the byte buffer to wchar_t*?

Something like this:

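#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Reinterprets the raw bytes as wchar_t units. This does NOT decode UTF-8:
// the result is only meaningful if the file is already stored in the
// platform's wide encoding (UTF-16LE on Windows).
std::wstring ReadFileIntoWstring(const std::wstring& filepath)
{
    std::ifstream file(filepath.c_str(), std::ios::in | std::ios::binary | std::ios::ate);
    if (!file)
        return std::wstring();
    const size_t size = static_cast<size_t>(file.tellg());
    file.seekg(0, std::ios::beg);
    std::vector<char> buffer(size);
    file.read(buffer.data(), size);
    // Copy whole wchar_t units only; a plain (wchar_t*) cast would read past
    // the end because the buffer is not null-terminated.
    std::wstring wstr(size / sizeof(wchar_t), L'\0');
    std::memcpy(&wstr[0], buffer.data(), wstr.size() * sizeof(wchar_t));
    return wstr;
}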
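Reinterpreting bytes as wchar_t only works when the file is already in the platform's wide encoding; UTF-8 bytes have to be decoded. As a sketch of what that decoding involves, here is a minimal hand-written UTF-8 decoder (decodeUtf8 is a hypothetical name) that avoids the deprecated <codecvt> machinery. It emits surrogate pairs when wchar_t is 16 bits and one unit per code point when it is 32 bits, replaces malformed or truncated sequences with U+FFFD, and does not reject overlong encodings or encoded surrogates, so treat it as illustrative rather than a validating decoder.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into a wstring. Emits UTF-16
// surrogate pairs when wchar_t is 16 bits (Windows) and one unit per code
// point when it is 32 bits. Malformed or truncated sequences become U+FFFD;
// overlong forms and encoded surrogates are NOT rejected.
std::wstring decodeUtf8(const std::string& in)
{
    const wchar_t replacement = static_cast<wchar_t>(0xFFFD);
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // continuation bytes expected after the lead byte
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(replacement); ++i; continue; }

        // Every continuation byte must exist and have the form 10xxxxxx.
        bool ok = i + extra < in.size();
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (b & 0x3F);
        }
        if (!ok || cp > 0x10FFFF) { out.push_back(replacement); ++i; continue; }
        i += extra + 1;

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            cp -= 0x10000;  // split into a high and a low surrogate
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<wchar_t>(cp));
        }
    }
    return out;
}
```

The bytes can come from any binary-mode read, for example the std::ifstream read above.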
