c++ - Boost::Regex throws an error when a long expression does not match

Question

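I have two regular expressions. One matches Python-style comments and the other matches file paths.

When I check whether a comment matches the file path expression, an error is thrown if the comment string is longer than 15 characters. Otherwise it behaves as expected.

How can I change the regular expression so that this problem does not occur?

Sample code: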

#include <string>
#include "boost/regex.hpp"

using namespace std;
using namespace boost;

int main(int argc, char** argv)
{
    boost::regex re_comment("\\s*#[^\\r\\n]*");
    boost::regex re_path("\"?([A-Za-z]:)?[\\\\/]?(([^(\\\\/:*?\"<>|\\r\\n)]+[\\\\/]?)+)?\\.[\\w]+\"?");

    string shortComment = " #comment ";
    string longComment  = "#123456789012345678901234567890";
    string myPath       = "C:/this/is.a/path.doc";

    regex_match(shortComment,re_comment);    //evaluates to true
    regex_match(longComment,re_comment);     //evaluates to true

    regex_match(myPath, re_path);             //evaluates to true
    regex_match(shortComment, re_path);       //evaluates to false
    regex_match(longComment, re_path);        //throws error
}

This is the error that is thrown:

terminate called after throwing an instance of
    'boost::exception_detail::clone_impl<boost::exception_detail
            ::error_info_injector<std::runtime_error> >'
what():  The complexity of matching the regular expression exceeded predefined
    bounds.  Try refactoring the regular expression to make each choice made by the
    state machine unambiguous.  This exception is thrown to prevent "eternal" matches
    that take  an indefinite period time to locate.

score 1 · Accepted Answer

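I know it is always tempting to write one giant regular expression that solves all of the world's problems, and there may even be performance reasons for doing so. That said, I suggest breaking the problem into manageable pieces.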
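As a rough illustration of why the single pattern struggles: the group `([^...]+[\\/]?)+` in the question's `re_path` can divide a run of characters that contains no separator into chunks in many different ways, and when the trailing `\.[\w]+` cannot match, a backtracking matcher may have to try them all before giving up. The sketch below only counts those divisions; `count_splits` is a made-up name, and the count is an upper-bound style estimate, not a measurement of what Boost.Regex actually does internally.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: counts the ways a run of n characters can be cut
// into one or more non-empty chunks, which is how a group shaped like
// (X+)+ can divide that run. Only meaningful for n <= 64 (64-bit result).
std::uint64_t count_splits(unsigned n)
{
    std::vector<std::uint64_t> ways(n + 1, 0);
    ways[0] = 1; // nothing left to split: exactly one way
    for (unsigned len = 1; len <= n; ++len)
    {
        // Choose the length of the first chunk, then split the remainder.
        for (unsigned first = 1; first <= len; ++first)
        {
            ways[len] += ways[len - first];
        }
    }
    return ways[n];
}
```

For the 31-character `longComment` this gives 2^30 (about a billion) candidate divisions, which fits the error message's advice to make each choice unambiguous.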

Basically: handle the quotes, split the string on the directory separators, and run a regex over each part of the path.

#include <algorithm>   // std::replace
#include <iostream>    // cout, endl
#include <string>
#include <vector>
#include "boost/regex.hpp"
#include "boost/algorithm/string.hpp"
using namespace std;
using namespace boost;


bool my_path_match(std::string line)
{
    bool ret = true;
    string drive = "([a-zA-Z]\\:)?";
    string pathElem = "(\\w|\\.|\\s)+";
    boost::regex re_pathElem(pathElem);
    boost::regex re_drive("(" + drive + "|" + pathElem + ")");

    vector<string> split_line;
    vector<string>::iterator it;

    if ((line.size() >= 2) && (line.front() == '"') && (line.back() == '"'))
    {
        line.erase(0, 1); // erase the first character
        line.erase(line.size() - 1); // erase the last character
    }

    split(split_line, line, is_any_of("/\\"));

    if (regex_match(split_line[0], re_drive) == false)
    {
        ret = false;
    }
    else
    {
        for (it = (split_line.begin() + 1); it != split_line.end(); it++)
        {
            if (regex_match(*it, re_pathElem) == false)
            {
                ret = false;
                break;
            }
        }
    }
    return ret;
}

int main(int argc, char** argv)
{
    boost::regex re_comment("^.*#.*$");

    string shortComment = " #comment ";
    string longComment  = "#123456789012345678901234567890";
    vector<string> testpaths;
    vector<string> paths;
    vector<string>::iterator it;
    testpaths.push_back("C:/this/is.a/path.doc");
    testpaths.push_back("C:/this/is also .a/path.doc");
    testpaths.push_back("/this/is also .a/path.doc");
    testpaths.push_back("./this/is also .a/path.doc");
    testpaths.push_back("this/is also .a/path.doc");
    testpaths.push_back("this/is 1 /path.doc");

    bool ret;
    ret = regex_match(shortComment, re_comment);    //evaluates to true
    cout<<"should be true = "<<ret<<endl;
    ret = regex_match(longComment, re_comment);     //evaluates to true
    cout<<"should be true = "<<ret<<endl;

    string quotes;
    for (it = testpaths.begin(); it != testpaths.end(); it++)
    {
        paths.push_back(*it);
        quotes = "\"" + *it + "\""; // test quoted paths
        paths.push_back(quotes);
        std::replace(it->begin(), it->end(), '/', '\\'); // test backslash paths
        std::replace(quotes.begin(), quotes.end(), '/', '\\'); // test backslash quoted paths
        paths.push_back(*it);
        paths.push_back(quotes);
    }

    for (it = paths.begin(); it != paths.end(); it++)
    {
        ret = my_path_match(*it);             //evaluates to true
        cout<<"should be true = "<<ret<<"\t"<<*it<<endl;
    }

    ret = my_path_match(shortComment);       //evaluates to false
    cout<<"should be false = "<<ret<<endl;
    ret = my_path_match(longComment);        //evaluates to false
    cout<<"should be false = "<<ret<<endl;
}

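Yes, it will (probably) be slower than a single regex, but it works. It does not throw an error on Python comment lines, and if something turns out to be wrong you can fix it (that is, it is maintainable).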
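If you would still rather keep a single expression, one possible rewrite is to require every directory element to end with a separator and to match the final file name on its own, so there is only one way to divide the string into elements. The sketch below is an assumption-laden illustration, not tested against Boost: it uses `std::regex` so that it compiles without Boost, `looks_like_path` is a made-up name, and the character class is copied from the question (including its exclusion of `(` and `)`). The same pattern string should be usable with `boost::regex`, but I have not verified that here.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of the original posts.
bool looks_like_path(const std::string& text)
{
    // One character of a path element; same class as the question's pattern.
    static const std::string elem_char = "[^(\\\\/:*?\"<>|\\r\\n)]";
    // A single directory separator, either '\' or '/'.
    static const std::string sep = "[\\\\/]";

    // optional quote, optional drive, optional leading separator,
    // zero or more "element + separator" pairs, then "name.ext", optional quote
    static const std::regex re_path(
        "\"?([A-Za-z]:)?" + sep + "?" +
        "(" + elem_char + "+" + sep + ")*" +
        elem_char + "*\\.\\w+\"?");

    return std::regex_match(text, re_path);
}
```

With this shape, `looks_like_path("C:/this/is.a/path.doc")` is accepted and both comment strings from the question are rejected without the matcher having to try many alternative divisions.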
