c++ - Using escaped_list_separator with boost split

Question

I've been playing around with the Boost string algorithms library and just came across the surprising simplicity of the split method:

  string delimiters = ",";
  string str = "string, with, comma, delimited, tokens, \"and delimiters, inside a quote\"";
  // If we didn't care about delimiter characters within a quoted section, we could use:
  vector<string> tokens;  
  boost::split(tokens, str, boost::is_any_of(delimiters));
  // gives the wrong result: tokens = {"string", " with", " comma", " delimited", " tokens", " \"and delimiters", " inside a quote\""}

This is nice and concise... but it doesn't seem to work with quotes, so instead I have to do something like the following:

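#include <string>
#include <vector>
#include <boost/tokenizer.hpp>  // boost::tokenizer, boost::escaped_list_separator
using namespace std;
using namespace boost;

string delimiters = ",";
string str = "string, with, comma, delimited, tokens, \"and delimiters, inside a quote\"";
vector<string> tokens;
// arguments: escape character(s), field separator(s), quote character(s)
escaped_list_separator<char> separator("\\", delimiters, "\"");
typedef tokenizer<escaped_list_separator<char> > Tokeniser;
Tokeniser t(str, separator);
for (Tokeniser::iterator it = t.begin(); it != t.end(); ++it)
    tokens.push_back(*it);
// gives the correct result (the quote characters themselves are dropped):
// tokens = {"string", " with", " comma", " delimited", " tokens", " and delimiters, inside a quote"}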

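My question is: can split, or another standard algorithm, be used when the delimiters are quoted? Thanks, purpledog, but I already have a non-deprecated way of achieving the desired result; I'm not looking for just yet another way.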

Edit: Updated the code to show the results and to clarify the question.

score 5 · Accepted Answer

There doesn't seem to be any simple way to do this with the boost::split method. The shortest code I could find to do it is:

vector<string> tokens;
tokenizer<escaped_list_separator<char> > t(str, escaped_list_separator<char>("\\", ",", "\""));
BOOST_FOREACH(string s, t)   // BOOST_FOREACH comes from <boost/foreach.hpp>
    tokens.push_back(s);

which is only marginally more verbose than the original snippet:

vector<string> tokens;  
boost::split(tokens, str, boost::is_any_of(","));

score 2 · Accepted Answer

This gives the same result as Jamie Cook's answer, without the explicit loop:

tokenizer<escaped_list_separator<char> >tok(str);
vector<string> tokens( tok.begin(), tok.end() );

The second parameter of the tokenizer constructor isn't needed, because it defaults to escaped_list_separator<char>("\\", ",", "\""). You only need to pass it if your comma or quote requirements are different.

score 1 · Accepted Answer

I'm not sure about the boost::string library, but with Boost's regex_token_iterator you can express your delimiters as a regular expression. So yes, you can split on quoted delimiters, and on far more complex patterns too.

Note that this used to be done with regex_split, which is now deprecated.

Here is an example taken from the Boost documentation:

#include <iostream>
#include <boost/regex.hpp>

using namespace std;

int main(int argc, char**)
{
   string s;
   do{
      if(argc == 1)
      {
         cout << "Enter text to split (or \"quit\" to exit): ";
         getline(cin, s);
         if(s == "quit") break;
      }
      else
         s = "This is a string of tokens";

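      boost::regex re("\\s+");                                     // delimiter: one or more whitespace characters
      boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);  // -1: yield the text between matches
      boost::sregex_token_iterator j;                              // default-constructed: end of sequence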

      unsigned count = 0;
      while(i != j)
      {
         cout << *i++ << endl;
         count++;
      }
      cout << "There were " << count << " tokens found." << endl;

   }while(argc == 1);
   return 0;
}

If you run the program without arguments and enter hello world at the prompt, the output is:

hello
world
There were 2 tokens found.

Change boost::regex re("\\s+"); to boost::regex re("\",\""); to split on a quoted delimiter instead. Running the program and entering hello","world at the prompt then gives:

hello
world
There were 2 tokens found.

But I suspect you'll want to handle input like "hello", "world", in which case one solution is:

1. Split on commas only.
2. Then remove the "" quotes (perhaps using boost/algorithm/string/trim.hpp or the regex library).

Edit: added the program output.
