c++ - Split a string by a character

Question

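I know this is a fairly easy problem, but I want to solve it for myself once and for all.

I would simply like to split a string into an array, using a character as the delimiter (much like C#'s well-known .Split() function). I can of course apply a brute-force approach, but I wonder whether there is anything better.

So far, the closest solution I have found is strtok(), but I don't like using it because of its inconvenience (converting the string to a char array, and so on). Is there an easier way to implement this?

Note: I wanted to emphasize this because people might ask "Why doesn't brute force work?". My brute-force solution was to write a loop and call substr() inside it. However, substr() needs a starting point and a length, so it fails when I want to split a date. The user might enter 7/12/2012 or 07/3/2011, so I cannot know the length of a field before I have located the next '/' delimiter.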

score 125 · Accepted Answer

Use a vector, a string, and a stringstream. It is a tad cumbersome, but it does the trick.

#include <string>
#include <vector>
#include <sstream>

std::stringstream test("this_is_a_test_string");
std::string segment;
std::vector<std::string> seglist;

while(std::getline(test, segment, '_'))
{
   seglist.push_back(segment);
}

which results in a vector with the same contents as

std::vector<std::string> seglist{ "this", "is", "a", "test", "string" };
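If you need this in more than one place, the loop above can be wrapped in a small function. The following is only a sketch: the name split_by_char and its signature are my own choices, not part of the answer.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (the name split_by_char is illustrative, not from the answer).
// Splits `input` on every occurrence of `delim` using std::getline.
std::vector<std::string> split_by_char(const std::string& input, char delim)
{
    std::vector<std::string> parts;
    std::istringstream stream(input);
    std::string segment;

    // getline reads up to (and discards) the next delimiter on each call.
    while (std::getline(stream, segment, delim))
    {
        parts.push_back(segment);
    }
    return parts;
}
```

Calling split_by_char("07/3/2011", '/') gives the three fields "07", "3" and "2011". Note that consecutive delimiters produce empty strings, while a single trailing delimiter does not produce a trailing empty string.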

score 24 · Accepted Answer

Boost has the split() you are looking for in algorithm/string.hpp:

#include <boost/algorithm/string.hpp>

std::string sample = "07/3/2011";
std::vector<std::string> strs;
boost::split(strs, sample, boost::is_any_of("/"));

score 16 · Accepted Answer

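Another way (C++11/Boost) for people who like RegEx. Personally, I'm a big fan of RegEx for this kind of data. IMO it is far more powerful than simply splitting strings on a delimiter, since you can choose to be a lot smarter about what constitutes "valid" data if you wish.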

#include <string>
#include <algorithm>    // copy
#include <iterator>     // back_inserter
#include <regex>        // regex, sregex_token_iterator
#include <vector>

int main()
{
    std::string str = "08/04/2012";
    std::vector<std::string> tokens;
    std::regex re("\\d+");

    //start/end points of tokens in str
    std::sregex_token_iterator
        begin(str.begin(), str.end(), re),
        end;

    std::copy(begin, end, std::back_inserter(tokens));
}
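To illustrate the point about being smarter about "valid" data, here is a minimal sketch that accepts a date only if it has the expected shape. The Date struct, the parse_date name, and the month/day/year field order are assumptions made for the example, not something stated in the question.

```cpp
#include <regex>
#include <string>

// Hypothetical types and names, chosen only for this sketch.
struct Date { int month; int day; int year; };

// Returns true and fills `out` only when `text` looks like M/D/YYYY,
// with one or two digits for month and day and exactly four for the year.
bool parse_date(const std::string& text, Date& out)
{
    static const std::regex pattern("(\\d{1,2})/(\\d{1,2})/(\\d{4})");
    std::smatch match;

    // regex_match requires the whole string to match, not just a part of it.
    if (!std::regex_match(text, match, pattern))
    {
        return false;
    }
    out.month = std::stoi(match[1].str());
    out.day   = std::stoi(match[2].str());
    out.year  = std::stoi(match[3].str());
    return true;
}
```

This only checks the shape of the input. It does not check that the month or day is in range, so a real validator would need more than this.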

score 4 · Accepted Answer

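Another possibility is to imbue a stream with a locale that uses a special ctype facet. A stream uses the ctype facet to determine what counts as "whitespace", which it treats as a separator. With a ctype facet that classifies your separator characters as whitespace, the reading becomes pretty trivial. Here's one way to implement the facet: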

#include <locale>
#include <vector>

struct field_reader: std::ctype<char> {

    field_reader(): std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table() {
        static std::vector<std::ctype_base::mask> 
            rc(table_size, std::ctype_base::mask());

        // we'll assume dates are either a/b/c or a-b-c:
        rc['/'] = std::ctype_base::space;
        rc['-'] = std::ctype_base::space;
        return &rc[0];
    }
};

We use it by calling imbue to tell a stream to use a locale that includes the facet, and then read the data from that stream:

std::istringstream in("07/3/2011");
in.imbue(std::locale(std::locale(), new field_reader));

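With that in place, splitting becomes almost trivial: just initialize a vector from a pair of istream_iterators to read the pieces from the string (which is embedded in the istringstream):

// The vector name 'fields' is only illustrative. The extra parentheses around
// the first argument keep this from being parsed as a function declaration.
std::vector<std::string> fields((std::istream_iterator<std::string>(in)),
                                std::istream_iterator<std::string>());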

Obviously this tends toward overkill if you only use it in one place. If you use it a lot, however, it can go a long way toward keeping the rest of the code quite clean.
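Putting the pieces of this answer together, a self-contained version might look like the sketch below. The wrapper function split_date_fields is a name I made up for the example; the facet itself follows the answer.

```cpp
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

struct field_reader : std::ctype<char> {
    field_reader() : std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table() {
        // Every character starts with an empty classification;
        // only the separators below are then marked as whitespace.
        static std::vector<std::ctype_base::mask>
            rc(table_size, std::ctype_base::mask());

        rc['/'] = std::ctype_base::space;
        rc['-'] = std::ctype_base::space;
        return &rc[0];
    }
};

// Hypothetical wrapper (name chosen for this sketch only).
std::vector<std::string> split_date_fields(const std::string& text)
{
    std::istringstream in(text);
    // The locale takes ownership of the facet and deletes it when done.
    in.imbue(std::locale(std::locale(), new field_reader));

    return std::vector<std::string>(std::istream_iterator<std::string>(in),
                                    std::istream_iterator<std::string>());
}
```

One consequence of building the table from scratch is that an ordinary space is no longer treated as a separator by this stream, only '/' and '-' are. Runs of separators are skipped like ordinary whitespace, so no empty fields are produced.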

score 4 · Accepted Answer

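I inherently dislike stringstream, although I'm not sure why. Today I wrote this function to split a std::string into a vector by any arbitrary character or string. I know this question is old, but I wanted to share an alternative way of splitting a std::string.

This code omits the part of the string we are splitting by from the results altogether, but it could easily be modified to include it.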

#include <string>
#include <vector>

void split(std::string str, std::string splitBy, std::vector<std::string>& tokens)
{
    /* Store the original string in the array, so we can loop the rest
     * of the algorithm. */
    tokens.push_back(str);

    // Store the split index in a 'size_t' (unsigned integer) type.
    size_t splitAt;
    // Store the size of what we're splicing out.
    size_t splitLen = splitBy.size();
    // Create a string for temporarily storing the fragment we're processing.
    std::string frag;
    // Loop infinitely - break is internal.
    while(true)
    {
        /* Store the last string in the vector, which is the only logical
         * candidate for processing. */
        frag = tokens.back();
        /* The index where the split is. */
        splitAt = frag.find(splitBy);
        // If we didn't find a new split point...
        if(splitAt == std::string::npos)
        {
            // Break the loop and (implicitly) return.
            break;
        }
        /* Put everything from the left side of the split where the string
         * being processed used to be. */
        tokens.back() = frag.substr(0, splitAt);
        /* Push everything from the right side of the split to the next empty
         * index in the vector. */
        tokens.push_back(frag.substr(splitAt+splitLen, frag.size()-(splitAt+splitLen)));
    }
}

To use it, just call it like so...

std::string foo = "This is some string I want to split by spaces.";
std::vector<std::string> results;
split(foo, " ", results);

You can now access all the results in the vector at will. Simple as that: no stringstream, no third-party libraries, no dropping back to C!

score 2 · Accepted Answer

Take a look at boost::tokenizer.

If you'd like to roll your own method, you can use std::string::find() to determine the splitting points.
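A roll-your-own version based on std::string::find() could look roughly like this. It is a sketch, and the function name split_with_find is just illustrative. It also addresses the concern in the question: the length passed to substr() is computed from the position that find() returns, so fields of varying width are not a problem.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (the name is illustrative).
// Splits `text` on `delim`, keeping empty fields.
std::vector<std::string> split_with_find(const std::string& text, char delim)
{
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    std::string::size_type pos;

    // Each iteration finds the next delimiter at or after `start`.
    while ((pos = text.find(delim, start)) != std::string::npos)
    {
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1; // continue just past the delimiter
    }
    // Whatever remains after the last delimiter is the final field.
    parts.push_back(text.substr(start));
    return parts;
}
```

Unlike the getline approach, this variant always returns at least one element, and a trailing delimiter yields a trailing empty string. Whether that is what you want depends on your input.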

score 0 · Accepted Answer

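For those who don't have (or want, or need) C++20, this C++11 solution might be an option.

It is templated on an output iterator, so you can supply your own destination to which the split items are appended, and it lets you choose how to handle multiple consecutive separator characters.

Yes, it uses std::regex, but if you're already in the happy land of C++11, why not use it?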

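#include <algorithm>    // copy
#include <iterator>     // begin, end
#include <regex>        // regex, sregex_token_iterator
#include <string>

////////////////////////////////////////////////////////////////////////////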
//
// Split string "s" into substrings delimited by the character "sep"
// skip_empty indicates what to do with multiple consecutive separation
// characters:
//
// Given s="aap,,noot,,,mies"
//       sep=','
//
// then output gets the following written into it:
//      skip_empty=true  => "aap" "noot" "mies"
//      skip_empty=false => "aap" "" "noot" "" "" "mies"
//
////////////////////////////////////////////////////////////////////////////
template <typename OutputIterator>
void string_split(std::string const& s, char sep, OutputIterator output, bool skip_empty=true) {
    std::regex  rxSplit( std::string("\\")+sep+(skip_empty ? "+" : "") );

    std::copy(std::sregex_token_iterator(std::begin(s), std::end(s), rxSplit, -1),
              std::sregex_token_iterator(), output);
}
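As a usage sketch, the function can be pointed at a vector through std::back_inserter. string_split is repeated here only so that the example compiles on its own; split_to_vector is a hypothetical convenience wrapper that I added.

```cpp
#include <algorithm>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Reproduced from the answer so that this sketch is self-contained.
template <typename OutputIterator>
void string_split(std::string const& s, char sep, OutputIterator output, bool skip_empty = true) {
    std::regex rxSplit(std::string("\\") + sep + (skip_empty ? "+" : ""));

    // Submatch index -1 selects the text between the matches.
    std::copy(std::sregex_token_iterator(std::begin(s), std::end(s), rxSplit, -1),
              std::sregex_token_iterator(), output);
}

// Hypothetical wrapper (not part of the answer): collects the pieces in a vector.
std::vector<std::string> split_to_vector(std::string const& s, char sep, bool skip_empty = true) {
    std::vector<std::string> items;
    string_split(s, sep, std::back_inserter(items), skip_empty);
    return items;
}
```

One thing to watch for: because the separator is prefixed with a backslash, a letter or digit used as the separator may turn into a regex escape (for example, 'd' becomes \d, which matches digits). Punctuation separators such as ',' or '/' do not have that problem.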

score 0 · Accepted Answer

Is there any reason you don't want to convert the string to a character array (char*)? Calling .c_str() is rather easy. You can also use a loop and the .find() function.

string class
string .find()
string .c_str()
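If you do go the character-array route, keep in mind that .c_str() returns a pointer to const data, while strtok() modifies the buffer it scans, so a writable copy is needed. A minimal sketch, with a function name of my own choosing:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (the name is illustrative).
// `delims` is a C string listing every character that separates tokens.
std::vector<std::string> split_with_strtok(const std::string& text, const char* delims)
{
    // strtok writes '\0' into its input, so work on a mutable,
    // null-terminated copy instead of the result of c_str().
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::vector<std::string> parts;
    for (char* token = std::strtok(buffer.data(), delims);
         token != nullptr;
         token = std::strtok(nullptr, delims))
    {
        parts.push_back(token);
    }
    return parts;
}
```

strtok() skips empty fields, so "a//b" yields only "a" and "b", and it keeps hidden static state between calls, which makes it unsuitable for interleaved or multithreaded use.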
