c++ - シングルパスの検索と置換

Question

テキスト内でシングルパス検索と置換を行う方法を知っている人はいますか? 私は、すべてのマイクロ最適化が重要な高性能プログラムに取り組んでいます。以下は、私が現在行っていることを示す例です。

#include <iostream>
#include <string>

/*!
\brief Replaces all common character escape sequences with text representations
\note Some character seqences messes up the trace output and must be replaces
* with text represantions, ie. the newline character will the replaced with "\n"
* etc.
\returns The formatted string
*/
std::wstring ReplaceAll(std::wstring &str)
{
    SearchAndReplace(str, L"\a", L"\\a");   // control-g, C-g
    SearchAndReplace(str, L"\b", L"\\b");   // backspace, <BS>, C-h
    SearchAndReplace(str, L"\t", L"\\t");   // tab, <TAB>, C-i
    SearchAndReplace(str, L"\n", L"\\n");   // newline, C-j
    SearchAndReplace(str, L"\v", L"\\v");   // vertical tab, C-k
    SearchAndReplace(str, L"\f", L"\\f");   // formfeed character, C-l
    SearchAndReplace(str, L"\r", L"\\r");   // carriage return, <RET>, C-m
    return str;
}

/*!
\brief Wide string search and replace
\param str [in, out] String to search & replace
\param oldStr [in] Old string
\param newStr [in] New string
*/
std::wstring SearchAndReplace(std::wstring &str, const wchar_t *oldStr, const wchar_t *newStr) const
{
    size_t oldStrLen = wcslen(oldStr);
    size_t newStrLen = wcslen(newStr);
    size_t pos = 0;

    while((pos = str.find(oldStr, pos)) != string::npos)
    {
        str.replace(pos, oldStrLen, newStr);
        pos += newStrLen;
    }

    return str;
}

int main()
{
    std::wstring myStr(L"\tThe quick brown fox jumps over the lazy dog.\n\tThe quick brown fox jumps over the lazy dog\n\n");
    std::wcout << L"Before replace: " << myStr;
    std::wcout << L"After replace: " << ReplaceAll(myStr);
    return 0;
}

上記のコードは、同じ文字列を複数回通過する必要があるため、明らかに非効率的です。単一パスの検索および置換機能は、置換する文字のさまざまな配列を処理できるように非常に柔軟でなければなりません (つまり、にリストされているエスケープ文字だけではありませんReplaceAll())。

score 2 · Accepted Answer

For the task at hand you don't need any complicated algorithm! First of all, the "strings" you are searching for to be replaced are actually characters and distinct ones that (the more complicated algorithms mentioned in another reply are to deal with list of strings matched against a sequence). Moreover, your main problem is that you keep resizing your sequence all the time. You can't do the replacements in-place anyway as the string will grow with each replacement. A fairly simple approach should have a lot better performance than your current approach and, as far as I can tell, you are very far away from starting micro optimizations - you need to get your code do things roughly the right way first. For example, I would try something a long these lines:

struct match_first
{
    wchar_t d_c;
    match_first(wchar_t c): d_c(c) {}
    template <typename P>
    bool operator()(P const& p) const { return p.first == this->d_c; }
};

void Replace(std::wstring& value)
{
    std::wstring result;
    result.reserve(value.size());
    std::wstring special(L"\a\b\f\n\r\t\v");
    std::pair<wchar_t, std::wstring> const replacements[] = {
        std::pair<wchar_t, std::wstring>(L'\a', L"\\a"),
        std::pair<wchar_t, std::wstring>(L'\b', L"\\b"),
        std::pair<wchar_t, std::wstring>(L'\f', L"\\f"),
        std::pair<wchar_t, std::wstring>(L'\n', L"\\n"),
        std::pair<wchar_t, std::wstring>(L'\r', L"\\r"),
        std::pair<wchar_t, std::wstring>(L'\t', L"\\t"),
        std::pair<wchar_t, std::wstring>(L'\v', L"\\v")
    };

    std::wstring::size_type cur(0);
    for (std::wstring::size_type found(cur);
         std::wstring::npos != (found = value.find_first_of(special, cur));
         cur = found + 1) {
        result.insert(result.end(),
                      value.begin() + cur, value.begin() + found);

        std::pair<wchar_t, std::wstring> const* replacement
            = std::find_if(std::begin(replacements), std::end(replacements),
                           match_first(value[found]));
        result.insert(result.end(),
                      replacement->second.begin(), replacement->second.end());
    }
    result.insert(result.end(), value.begin() + cur, value.end());

    value.swap(result);
}

The idea of the algorithm is to have a single pass over the source string, finding all strings which need to be replaced and if one is found copying the section of not to be replaced characters and the replacement string to a new string being build up. There are few things which can be made a bit faster with some effort but this one moves each character just once rather than the original code which keeps shuffling the tail of characters not being looked at one character forward with each found character to be replaced.

c++ - シングルパスの検索と置換

5 に答える 5

Related

Reference