c++ - Confused by Unicode, Boost, C++, and codecvts

Question

In C++, I want to use Unicode to do things. After falling down the Unicode rabbit hole, I've managed to end up in a train wreck of confusion, headaches, and locales.

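With Boost, though, I've hit the unfortunate problem of trying to use Unicode file paths and trying to use the Boost program options library with Unicode input. I've read whatever I could find on the subjects of locales, codecvts, Unicode encodings, and Boost.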

My current attempt to get things working is to have a codecvt that takes a UTF-8 string and converts it to the platform's encoding (UTF-8 on POSIX, UTF-16 on Windows), since I've been trying to avoid wchar_t.

The closest I've actually gotten is trying to do this with Boost.Locale, converting from a UTF-8 string to a UTF-32 string on output.

#include <iostream>
#include <locale>
#include <string>
#include <boost/locale.hpp>

int main(void)
{
  std::string data("Testing, 㤹");

  std::locale fromLoc = boost::locale::generator().generate("en_US.UTF-8");
  std::locale toLoc   = boost::locale::generator().generate("en_US.UTF-32");

  typedef std::codecvt<wchar_t, char, mbstate_t> cvtType;
  cvtType const* toCvt = &std::use_facet<cvtType>(toLoc);

  std::locale convLoc = std::locale(fromLoc, toCvt);

  std::cout.imbue(convLoc);
  std::cout << data << std::endl;

  // Output is unconverted -- what?

  return 0;
}

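I think I had some other kind of conversion working using wide characters, but I really don't know what I'm doing. At this point I don't know what the right tool for the job is. Help?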

score 11 · Accepted Answer

Okay, after a few long months I've figured it out, and I'd like to help people in the future.

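First of all, the codecvt thing was the wrong way of doing it. Boost.Locale provides a simple way of converting between character sets in its boost::locale::conv namespace. Here's one example (there are others that aren't based on locales).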

#include <boost/locale.hpp>
#include <iostream>
#include <string>
namespace loc = boost::locale;

int main(void)
{
  loc::generator gen;
  std::locale blah = gen.generate("en_US.utf-32");

  std::string UTF8String = "Tésting!";
  // from_utf will also work with wide strings as it uses the character size
  // to detect the encoding.
  std::string converted = loc::conv::from_utf(UTF8String, blah);

  // Outputs a UTF-32 string.
  std::cout << converted << std::endl;

  return 0;
}

As you can see, if you replace "en_US.utf-32" with "", it will output in the user's locale.

I still don't know how to make std::cout do this all the time, but Boost.Locale's translate() function outputs in the user's locale.

As for using UTF-8 strings with the filesystem cross-platform, it seems that it's possible. Here's a link to how to do it:

score 3 · Accepted Answer

  std::cout.imbue(convLoc);
  std::cout << data << std::endl;

~~This won't do any conversion, since it uses codecvt<char, char, mbstate_t>, which is a no-op.~~ The only standard streams that use codecvt are file streams. std::cout is not required to perform any conversion at all.

To force Boost.Filesystem to interpret narrow strings as UTF-8 on Windows, use boost::filesystem::imbue with a locale that has a UTF-8 ↔ UTF-16 codecvt facet. Boost.Locale has an implementation of the latter.

score 3 · Accepted Answer

The Boost filesystem iostream replacement classes work fine with UTF-16 when used with Visual C++.

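However, they don't work (in the sense of supporting arbitrary filenames) when used with g++ on Windows, at least as of Boost version 1.47. There is a code comment explaining this. Essentially, the Visual C++ standard library provides non-standard wchar_t-based constructors that the Boost filesystem classes make use of, but g++ doesn't support these extensions.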

A workaround is to use 8.3 short filenames, but this solution is a bit brittle, since in old Windows versions the user can turn off automatic generation of short filenames.

Example code for using Boost filesystem on Windows:

#include "CmdLineArgs.h"        // CmdLineArgs
#include "throwx.h"             // throwX, hopefully
#include "string_conversions.h" // ansiWithFillersFrom( wstring )

#include <boost/filesystem/fstream.hpp>     // boost::filesystem::ifstream
#include <iostream>             // std::cout, std::cerr, std::endl
#include <stdexcept>            // std::runtime_error, std::exception
#include <string>               // std::string
#include <stdlib.h>             // EXIT_SUCCESS, EXIT_FAILURE
using namespace std;
namespace bfs = boost::filesystem;

inline string ansi( wstring const& ws ) { return ansiWithFillersFrom( ws ); }

int main()
{
    try
    {
        CmdLineArgs const   args;
        wstring const       programPath     = args.at( 0 );

        hopefully( args.nArgs() == 2 )
            || throwX( "Usage: " + ansi( programPath ) + " FILENAME" );

        wstring const       filePath        = args.at( 1 );
        bfs::ifstream       stream( filePath );     // Nice Boost ifstream subclass.
        hopefully( !stream.fail() )
            || throwX( "Failed to open file '" + ansi( filePath ) + "'" );

        string line;
        while( getline( stream, line ) )
        {
            cout << line << endl;
        }
        hopefully( stream.eof() )
            || throwX( "Failed to list contents of file '" + ansi( filePath ) + "'" );

        return EXIT_SUCCESS;
    }
    catch( exception const& x )
    {
        cerr << "!" << x.what() << endl;
    }
    return EXIT_FAILURE;
}
