c++ - Parsing quoted strings with boost::spirit

Question

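I would like to parse a sentence where some strings may be unquoted, 'quoted' or "quoted". The code below almost works, but it fails to match closing quotes. I'm guessing this is because of the qq reference. A modification is commented out in the code. With it enabled, "quoted' or 'quoted" (mismatched quotes) also parse, which helps show that the original problem lies with the closing quote. The code also describes the exact grammar.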

To be completely clear: unquoted strings parse. A quoted string like 'hello' parses the opening quote ', all the characters hello, but then fails to parse the final quote '.

I made another attempt, similar to the begin/end tag matching in the Boost tutorial, without success.

template <typename Iterator>
struct test_parser : qi::grammar<Iterator, dectest::Test(), ascii::space_type>
{
    test_parser()
        :
    test_parser::base_type(test, "test")
    {
        using qi::fail;
        using qi::on_error;
        using qi::lit;
        using qi::lexeme;
        using ascii::char_;
        using qi::repeat;
        using namespace qi::labels;
        using boost::phoenix::construct;
        using boost::phoenix::at_c;
        using boost::phoenix::push_back;
        using boost::phoenix::val;
        using boost::phoenix::ref;
        using qi::space;

        char qq;          

        arrow = lit("->");

        open_quote = (char_('\'') | char_('"')) [ref(qq) = _1];  // Remember what the opening quote was
        close_quote = lit(val(qq));  // Close must match the open
        // close_quote = (char_('\'') | char_('"')); // Enable this line to get code 'almost' working

        quoted_string = 
            open_quote
            >> +ascii::alnum        
            >> close_quote; 

        unquoted_string %= +ascii::alnum;
        any_string %= (quoted_string | unquoted_string);

        test = 
            unquoted_string             [at_c<0>(_val) = _1] 
            > unquoted_string           [at_c<1>(_val) = _1]   
            > repeat(1,3)[any_string]   [at_c<2>(_val) = _1]
            > arrow
            > any_string                [at_c<3>(_val) = _1] 
            ;

        // .. <snip>set rule names
        on_error<fail>(/* <snip> */);
        // debug rules
    }

    qi::rule<Iterator> arrow;
    qi::rule<Iterator> open_quote;
    qi::rule<Iterator> close_quote;

    qi::rule<Iterator, std::string()> quoted_string;
    qi::rule<Iterator, std::string()> unquoted_string;
    qi::rule<Iterator, std::string()> any_string;     // A quoted or unquoted string

    qi::rule<Iterator, dectest::Test(), ascii::space_type> test;

};


// main()
// This example should fail at the very end
// (i.e. not parse "str3' because of the mismatched quote).
// However, it fails to parse the closing quote of str1.
typedef boost::tuple<std::string, std::string, std::vector<std::string>, std::string> DataT;
DataT data;
std::string str("addx001 add 'str1'   \"str2\"       ->  \"str3'");
std::string::const_iterator iter = str.begin();
const std::string::const_iterator end = str.end();
bool r = phrase_parse(iter, end, grammar, boost::spirit::ascii::space, data);

Bonus credit: a solution that avoids local data members (such as char qq in the example above) would be preferred, but from a practical point of view I'll use anything that works!

score 12 · Accepted Answer

The reference to qq becomes dangling once the constructor returns, so that is indeed a problem.

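qi::locals is the canonical way to keep local state inside parser expressions. Another option would be to extend the lifetime of qq (for example, by making it a member of the grammar class). Lastly, you might also be interested in inherited attributes: this mechanism gives you a way to call a rule or grammar with "parameters" (passing the local state around).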

NOTE: there are caveats with using the Kleene plus operator +: it is greedy, and parsing fails if the string is not terminated with the expected quote.

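For a more complete example that handles arbitrary content inside (optionally or partially) quoted strings, and also allows escaping quotes inside quoted strings, see another answer I wrote:

How can I make my split work only on one real line and be able to skip quoted parts of the string?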

I've reduced the grammar to the relevant bits and included a few test cases:

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/fusion/adapted.hpp>
#include <iostream>   // std::cout, used in main()
#include <string>

namespace qi = boost::spirit::qi;

template <typename Iterator>
struct test_parser : qi::grammar<Iterator, std::string(), qi::space_type, qi::locals<char> >
{
    test_parser() : test_parser::base_type(any_string, "test")
    {
        using namespace qi;

        quoted_string = 
               omit    [ char_("'\"") [_a =_1] ]             
            >> no_skip [ *(char_ - char_(_a))  ]
            >> lit(_a)
        ; 

        // lexeme[] keeps the space skipper from merging "ab cd" into "abcd"
        any_string = quoted_string | lexeme[ +qi::alnum ];
    }

    qi::rule<Iterator, std::string(), qi::space_type, qi::locals<char> > quoted_string, any_string;
};

int main()
{
    test_parser<std::string::const_iterator> grammar;
    const char* strs[] = { "\"str1\"", 
                           "'str2'",
                           "'str3' trailing ok",
                           "'st\"r4' embedded also ok",
                           "str5",
                           "str6'",
                           NULL };

    for (const char** it = strs; *it; ++it)
    {
        const std::string str(*it);
        std::string::const_iterator iter = str.begin();
        std::string::const_iterator end  = str.end();

        std::string data;
        bool r = phrase_parse(iter, end, grammar, qi::space, data);

        if (r)
            std::cout << "Parsed:    " << str << " --> " << data << "\n";
        if (iter!=end)
            std::cout << "Remaining: " << std::string(iter,end) << "\n";
    }
}

Output:

Parsed:    "str1" --> str1
Parsed:    'str2' --> str2
Parsed:    'str3' trailing ok --> str3
Remaining: trailing ok
Parsed:    'st"r4' embedded also ok --> st"r4
Remaining: embedded also ok
Parsed:    str5 --> str5
Parsed:    str6' --> str6
Remaining: '
