c++ - Tokenizing a string and including the delimiters in C++

Question

I'm using the following tokenizer, but I can't figure out how to include the delimiters in the tokens.

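#include <string>
#include <vector>

using namespace std;

// Splits str on any character in delimiters; the delimiters themselves are dropped.
void Tokenize(const string& str, vector<string>& tokens, const string& delimiters)
{
    // find_first_of returns string::size_type, so int cannot hold string::npos reliably
    string::size_type startpos = 0;
    string::size_type pos = str.find_first_of(delimiters, startpos);

    while (string::npos != pos || string::npos != startpos)
    {
        // Text from startpos up to, but not including, the next delimiter
        tokens.push_back(str.substr(startpos, pos - startpos));

        // Skip the run of delimiters, then look for the end of the next token
        startpos = str.find_first_not_of(delimiters, pos);
        pos = str.find_first_of(delimiters, startpos);
    }
}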

score 17 · Accepted Answer

The C++ String Toolkit Library (StrTk) has the following solution:

std::string str = "abc,123 xyz";
std::vector<std::string> token_list;
strtk::split(";., ",
             str,
             strtk::range_to_type_back_inserter(token_list),
             strtk::include_delimiters);

As a result, token_list should contain the following elements:

Token₀ = "abc,"
Token₁ = "123 "
Token₂ = "xyz"

More examples can be found here.
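For readers who cannot pull in StrTk, the same idea (each token keeps the delimiter character that ends it) can be sketched with the standard library alone. This is only an approximation of the behavior described above, not StrTk's implementation; the name split_keep_delims is a hypothetical helper introduced for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of StrTk): every token keeps the single
// delimiter character that terminates it; the last token may have none.
std::vector<std::string> split_keep_delims(const std::string& text,
                                           const std::string& delims)
{
    std::vector<std::string> tokens;
    std::string::size_type start = 0;

    while (start < text.size()) {
        std::string::size_type pos = text.find_first_of(delims, start);
        if (pos == std::string::npos) {
            // No further delimiter: the remainder is the final token
            tokens.push_back(text.substr(start));
            break;
        }
        // Take the text plus the delimiter at pos
        tokens.push_back(text.substr(start, pos - start + 1));
        start = pos + 1;
    }
    return tokens;
}
```

Called as split_keep_delims("abc,123 xyz", ";., "), this sketch yields "abc,", "123 " and "xyz". Consecutive delimiters each produce a one-character token, which may or may not match what StrTk does in that case.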

score 4 · Accepted Answer

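This is a little sloppy right now, but it is what I ended up with. I didn't want to use boost, since this is a school assignment and my instructor wanted me to accomplish this with find_first_of.

Thanks, everyone, for your help.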

#include <string>
#include <vector>

using namespace std;

vector<string> Tokenize(const string& strInput, const string& strDelims)
{
    vector<string> vS;

    // find_first_of returns string::size_type; int cannot represent string::npos
    string::size_type startpos = 0;
    string::size_type pos = strInput.find_first_of(strDelims, startpos);

    // Only look at strInput[pos] while pos is a real index
    while (string::npos != pos)
    {
        // Text before the delimiter, if any
        if (pos > startpos)
            vS.push_back(strInput.substr(startpos, pos - startpos));

        // if the delimiter is a new line (\n) then add an escaped "\n" token
        if (strInput[pos] == '\n')
            vS.push_back("\\n");
        // else if the delimiter is not a space, keep it as its own token
        else if (strInput[pos] != ' ')
            vS.push_back(strInput.substr(pos, 1));

        // Stop if only delimiters remain (further trailing delimiters are not
        // emitted); otherwise continue after this one
        if (string::npos == strInput.find_first_not_of(strDelims, pos))
            startpos = string::npos;
        else
            startpos = pos + 1;

        pos = strInput.find_first_of(strDelims, startpos);
    }

    // Text after the last delimiter
    if (startpos < strInput.length())
        vS.push_back(strInput.substr(startpos));

    return vS;
}

score 2 · Accepted Answer


If your delimiters are single characters rather than strings, you can use strtok.
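A minimal sketch of that suggestion, assuming a C++11 compiler. Note that std::strtok modifies its input and overwrites each delimiter with a null character, so the delimiters are not part of the returned tokens; this only shows the basic splitting. The name strtok_tokens is a hypothetical helper, not something from the original answer.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits text on any character in delims via std::strtok.
// strtok writes '\0' over the delimiters, so they are dropped from the tokens.
std::vector<std::string> strtok_tokens(const std::string& text, const char* delims)
{
    // strtok needs a writable, null-terminated buffer, so work on a copy
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delims);
         tok != nullptr;
         tok = std::strtok(nullptr, delims))
    {
        tokens.push_back(tok);
    }
    return tokens;
}
```

Because std::strtok keeps its position in hidden static state, it is not reentrant, and it cannot report which delimiter ended a token. If the delimiters must be kept, a find_first_of-based loop is likely a better fit.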

answered 2009-10-02T20:17:16.843

score 2 · Accepted Answer

I can't really follow your code. Could you post a working program?

Anyway, here is a simple tokenizer; I haven't tested the edge cases.

#include <iostream>
#include <string>
#include <vector>

using namespace std;

// Splits text on the multi-character delimiter del; each token keeps the
// delimiter that ends it. del must not be empty.
void tokenize(vector<string>& tokens, const string& text, const string& del)
{
    string::size_type startpos = 0,
        currentpos = text.find(del, startpos);

    // A while loop avoids doing arithmetic on npos when del never occurs
    while (currentpos != string::npos)
    {
        tokens.push_back(text.substr(startpos, currentpos - startpos + del.size()));

        startpos = currentpos + del.size();
        currentpos = text.find(del, startpos);
    }

    // Remaining text after the last delimiter, if any
    if (startpos < text.size())
        tokens.push_back(text.substr(startpos));
}

Example input, with delimiter = $$:

Hello$$Stack$$Over$$$Flow$$$$!

Tokens:

Hello$$
Stack$$
Over$$
$Flow$$
$$
!

Note: I would never use a tokenizer that I wrote without testing it. Use boost::tokenizer!

score 0 · Accepted Answer

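It depends on whether you want the preceding delimiters, the following delimiters, or both, and on what you want to do with strings at the beginning and end of the input that may not have delimiters before or after them.

I'm going to assume you want each word together with its preceding and following delimiters, but not any strings made up of delimiters alone (e.g., if there's a delimiter after the last string).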

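#include <string>

template <class iter>
void tokenize(std::string const &str, std::string const &delims, iter out) {
    // size_type, not int, so that comparisons with npos are well defined
    std::string::size_type pos = 0;
    do {
        std::string::size_type beg_word = str.find_first_not_of(delims, pos);
        if (beg_word == std::string::npos)
            break; // only delimiters remain
        std::string::size_type end_word = str.find_first_of(delims, beg_word);
        std::string::size_type beg_next_word = str.find_first_not_of(delims, end_word);
        // Token runs from the preceding delimiters through the following ones
        *out++ = std::string(str, pos, beg_next_word - pos);
        pos = end_word;
    } while (pos != std::string::npos);
}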

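For now I've written it more like an STL algorithm, taking an iterator for its output rather than assuming it always pushes onto a collection. Since it depends (for the moment) on the input being a string, it doesn't use iterators for the input.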
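As a rough illustration of the output-iterator design, the sketch below feeds the tokens into a std::vector through std::back_inserter. The template is repeated under a different name (tokenize_with_delims) only so the block compiles by itself; that name and the collect_tokens helper are assumptions made for this example.

```cpp
#include <iterator>
#include <string>
#include <vector>

// Same approach as the tokenize template in this answer, restated so the
// sketch is self-contained.
template <class OutIt>
void tokenize_with_delims(std::string const &str, std::string const &delims, OutIt out) {
    std::string::size_type pos = 0;
    do {
        std::string::size_type beg_word = str.find_first_not_of(delims, pos);
        if (beg_word == std::string::npos)
            break; // nothing but delimiters left
        std::string::size_type end_word = str.find_first_of(delims, beg_word);
        std::string::size_type beg_next_word = str.find_first_not_of(delims, end_word);
        *out++ = str.substr(pos, beg_next_word - pos);
        pos = end_word;
    } while (pos != std::string::npos);
}

// Hypothetical helper: gather the tokens into a vector.
std::vector<std::string> collect_tokens(std::string const &str, std::string const &delims) {
    std::vector<std::string> tokens;
    tokenize_with_delims(str, delims, std::back_inserter(tokens));
    return tokens;
}
```

With the input "abc,123 xyz" and delimiters ";., ", this produces "abc,", ",123 " and " xyz": each delimiter shows up both at the end of one token and at the start of the next. Any other output iterator should work the same way, for example a std::ostream_iterator<std::string> writing to std::cout.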
