c++ - Splitting a string by regex in VC++

Question

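I'm using VC++ 10 in a project. I'm new to C/C++, and from a quick Google search it looks like standard C++ has no regex support, while VC++ 10 seems to have it. But how do I do a regex split? Do I need Boost just for that?

Searching the web, I found that many people recommend Boost for many things: tokenizing/splitting strings, parsing (PEG), and even regex (which should be built in...). Can I conclude that Boost is a must-have? That is 180 MB for trivial things that many languages support natively.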

score 7 · Accepted Answer

The C++11 standard includes std::regex. It is also included in TR1 for Visual Studio 2010. In fact, TR1 has been available since VS2008, tucked away under the std::tr1 namespace. So you don't need Boost.Regex for VS2008 or later.

Splitting can be done with regex_token_iterator:

#include <iostream>
#include <string>
#include <regex>

int main()
{
   const std::string s("The-meaning-of-life-and-everything");
   const std::tr1::regex separator("-");
   const std::tr1::sregex_token_iterator endOfSequence;

   // -1 selects the text between separator matches, i.e. the tokens.
   std::tr1::sregex_token_iterator token(s.begin(), s.end(), separator, -1);
   while(token != endOfSequence)
   {
      std::cout << *token++ << std::endl;
   }
}
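The same idea can be wrapped in a small reusable function. This is a minimal sketch, assuming a C++11 compiler where these types live directly in std rather than std::tr1; the name split_regex is hypothetical and not part of any library.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on every match of `pattern`.
std::vector<std::string> split_regex(const std::string& text, const std::string& pattern)
{
    const std::regex separator(pattern);
    // -1 selects the parts of the input that lie between separator matches.
    std::sregex_token_iterator token(text.begin(), text.end(), separator, -1);
    const std::sregex_token_iterator endOfSequence;
    // Each sub_match converts to std::string when the vector is built.
    return std::vector<std::string>(token, endOfSequence);
}
```

For example, split_regex("The-meaning-of-life", "-") yields the four words as separate strings. On VS2008/VS2010 the std:: types above could be swapped for their std::tr1:: counterparts.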

If you also need the separator itself, you can get it from the sub_match object that token points to. It is a pair holding the begin and end iterators of the token.

while(token != endOfSequence) 
{
   const std::tr1::sregex_token_iterator::value_type& subMatch = *token;
   if(subMatch.first != s.begin())
   {
      const char sep = *(subMatch.first - 1);
      std::cout << "Separator: " << sep << std::endl;
   }

   std::cout << *token++ << std::endl;
}

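This sample covers the case where the separator is a single character. If the separator itself can be an arbitrary substring, you need to do more involved iterator work and possibly store the previous token's sub_match object.

Alternatively, you can use regex groups, putting the separator in the first group and the actual token in the second: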
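One way to handle separators of arbitrary length is to iterate over the separator matches themselves and take whatever lies between them as tokens. The following is only a sketch under a few assumptions: it uses std::regex from C++11 rather than std::tr1, the helper name split_with_separators is hypothetical, and the separator pattern is assumed never to match the empty string.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (token, separator that followed it) pairs.
// The last token is paired with an empty separator.
std::vector<std::pair<std::string, std::string>>
split_with_separators(const std::string& text, const std::regex& separator)
{
    std::vector<std::pair<std::string, std::string>> parts;
    std::string::const_iterator tokenStart = text.begin();

    const std::sregex_iterator end;
    for (std::sregex_iterator m(text.begin(), text.end(), separator); m != end; ++m)
    {
        // (*m)[0] is the whole separator match; the token is the text before it.
        parts.emplace_back(std::string(tokenStart, (*m)[0].first), m->str());
        tokenStart = (*m)[0].second;
    }

    // Whatever remains after the last separator is the final token.
    parts.emplace_back(std::string(tokenStart, text.end()), std::string());
    return parts;
}
```

Remembering where the previous separator ended plays the role of "storing the previous sub_match" mentioned above.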

const std::string s("The-meaning-of-life-and-everything");
const std::tr1::regex separatorAndStr("(-*)([^-]*)");
const std::tr1::sregex_token_iterator endOfSequence;

// Separators will be the 0th, 2nd, 4th... tokens
// Real tokens will be the 1st, 3rd, 5th... tokens
int subMatches[] = { 1, 2 };
std::tr1::sregex_token_iterator token(s.begin(), s.end(), separatorAndStr, subMatches);
while(token != endOfSequence) 
{
   std::cout << *token++ << std::endl;
}

I'm not sure this is 100% correct; it is only meant to illustrate the idea.
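One possible adjustment, offered as a sketch rather than a verified fix: a pattern in which both groups are optional can also match the empty string, which may produce a trailing pair of empty tokens. Requiring at least one character in the token group avoids that, at the cost of dropping separators that trail the last token. The function name separators_and_tokens is hypothetical, and std::regex (C++11) is assumed in place of std::tr1.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns separators and tokens interleaved,
// starting with the (possibly empty) separator before the first token.
std::vector<std::string> separators_and_tokens(const std::string& text)
{
    // Group 1: zero or more '-' separators. Group 2: one or more non-'-' characters.
    const std::regex separatorAndStr("(-*)([^-]+)");
    const int subMatches[] = { 1, 2 };

    std::sregex_token_iterator token(text.begin(), text.end(), separatorAndStr, subMatches);
    const std::sregex_token_iterator endOfSequence;
    return std::vector<std::string>(token, endOfSequence);
}
```

Even indices of the result hold separators and odd indices hold the real tokens, matching the comments in the snippet above.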

score 0 · Answer

Here is an example from this blog.

All the matches end up in res:

std::tr1::cmatch res;
std::string str = "<h2>Egg prices</h2>";
std::tr1::regex rx("<h(.)>([^<]+)");
std::tr1::regex_search(str.c_str(), res, rx);
std::cout << res[1] << ". " << res[2] << "\n";
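The same search can be sketched with std::smatch, which works directly on std::string and lets the caller check whether anything matched before reading the groups. This assumes a C++11 compiler (std:: instead of std::tr1::); the helper name parse_heading is hypothetical.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: returns (heading level, heading text),
// or a pair of empty strings when the input contains no heading.
std::pair<std::string, std::string> parse_heading(const std::string& html)
{
    static const std::regex rx("<h(.)>([^<]+)");
    std::smatch res;
    if (!std::regex_search(html, res, rx))
    {
        return std::make_pair(std::string(), std::string());
    }
    // res[0] is the whole match; res[1] and res[2] are the capture groups.
    return std::make_pair(res[1].str(), res[2].str());
}
```

Checking the return value of regex_search first avoids reading capture groups from a failed match.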
