compiler-construction - How to use indentation as block delimiters with bison and flex

Question

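I'd like to know how to implement indentation as a block delimiter with bison + flex, just like in Python. I'm writing my own programming language (mostly for fun, but I intend to use it together with a game engine).

I have already written a compiler (actually a "langToy" to Nasm translator) in C, but it failed. For some reason it could only handle one string in the whole source file (well, I had been awake for more than 48 hours, so... you know, brain meltdown).

I don't know whether curly braces and/or begin -> end are easier to implement (I have no problem doing that) or whether it's just my brain locking up.

Thanks in advance!

Update: Okay, I have no clue how to do it with flex. I have problems returning multiple DEDENTs to the parser. Flex/Bison are relatively new to me.

Update 2: This is the flex file I've come up with so far; it doesn't quite get it right: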

%x t
%option noyywrap

%{
  int lineno = 0, ntab = 0, ltab = 0, dedent = 0;
%}

%%

<*>\n  { ntab = 0; BEGIN(t); }
<t>\t  { ++ntab; }
<t>.   { int i; /* my compiler complains not c99 if i use for( int i=0... */
         if( ntab > ltab )
           printf("> indent >\n");
         else if( ntab < ltab )
           for( i = 0; i < ltab - ntab; i++ )
             printf("< dedent <\n");
         else
           printf("=        =\n");

         ltab = ntab; ntab = 0;
         BEGIN(INITIAL);
         /* move to next rule */
         REJECT;}
.    /* ignore everything else for now */

%%

main()
{
  yyin = fopen( "test", "r" );
  yylex();
}

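You can play around with it; maybe you'll see what I'm missing. Returning multiple dedents would be easy in Haxe ( return t_dedent( num ); ).

This code doesn't always match the indents/dedents correctly.

Update 3: I think I will give up hope on flex and do it my own way. If anyone knows how to do it in flex, I would be happy to hear it anyway.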

score 15 · Accepted Answer

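What you need to do is have flex count the amount of whitespace at the beginning of every line and insert the appropriate number of INDENT/UNINDENT tokens, which the parser then uses to group statements into blocks. One question is what you want to do about tabs versus spaces: do you just want them to be equivalent, with fixed tab stops, or do you want to require indentation to be consistent (so that if one line begins with a tab and the next with a space, you signal an error, which is probably a little harder)?

Assuming you want fixed 8-column tab stops, you can use something like: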

%{
/* globals to track current indentation */
int current_line_indent = 0;   /* indentation of the current line */
int indent_level = 0;          /* indentation level passed to the parser */
%}

%x indent /* start state for parsing the indentation */
%s normal /* normal start state for everything else */

%%
<indent>" "      { current_line_indent++; }
<indent>"\t"     { current_line_indent = (current_line_indent + 8) & ~7; }
<indent>"\n"     { current_line_indent = 0; /*ignoring blank line */ }
<indent>.        {
                   unput(*yytext);
                   if (current_line_indent > indent_level) {
                       indent_level++;
                       return INDENT;
                   } else if (current_line_indent < indent_level) {
                       indent_level--;
                       return UNINDENT;
                   } else {
                       BEGIN normal;
                   }
                 }

<normal>"\n"     { current_line_indent = 0; BEGIN indent; }
... other flex rules ...

You need to make sure the scanner starts in indent mode (so that indentation on the first line is picked up).
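For the stricter option mentioned above (signalling an error when one line is indented with a tab and the next with a space), a check could look roughly like the following plain C. This is only a sketch under assumptions: the helper name `indent_is_consistent` is made up for illustration, and the rule it applies (the leading whitespace of one line must be a prefix of the other's) is just one possible definition of "consistent". It is not part of the flex rules above.

```c
#include <stddef.h>

/* Length of the run of spaces and tabs at the start of a line. */
static size_t leading_ws_len(const char *line)
{
    size_t n = 0;
    while (line[n] == ' ' || line[n] == '\t')
        n++;
    return n;
}

/* Hypothetical helper: returns 1 if the leading whitespace of one line is a
 * prefix of the leading whitespace of the other, 0 otherwise. Under this
 * rule, "\t" followed by "\t\t" is fine, but "\t" followed by "    " is
 * rejected. */
int indent_is_consistent(const char *prev, const char *cur)
{
    size_t np = leading_ws_len(prev);
    size_t nc = leading_ws_len(cur);
    size_t common = np < nc ? np : nc;
    size_t i;

    for (i = 0; i < common; i++) {
        if (prev[i] != cur[i])
            return 0;
    }
    return 1;
}
```

A scanner would have to remember the previous non-blank line's leading whitespace to call this, which is the extra bookkeeping that makes this variant a little harder.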

score 6 · Accepted Answer

Chris's answer goes a long way towards a usable solution, thanks a lot for that! Unfortunately, it is missing a few important aspects that I needed:

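- Multiple outdents (unindents) at once. Consider the following code, which should emit two outdents after the call to baz:
```
def foo():
  if bar:
    baz()
```
- Emit outdents when the end of the file is reached while still inside some indentation level.
- Indentation levels of different sizes. Chris's current code only works correctly for 1-space indents.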
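The three requirements above come down to keeping a stack of indentation widths. The following plain-C sketch shows that bookkeeping on its own, independent of flex. The names (`indent_stack`, `indent_stack_feed`, `INDENT_ERROR`, `MAX_DEPTH`) are hypothetical, and treating end of file as a line of width 0 is an assumption of this sketch.

```c
#include <limits.h>
#include <stddef.h>

#define MAX_DEPTH 64          /* arbitrary nesting limit for this sketch */
#define INDENT_ERROR INT_MIN  /* width matches no open level, or stack full */

/* Stack of indentation widths; widths[0] is always 0 (top level). */
struct indent_stack {
    int widths[MAX_DEPTH];
    size_t depth;             /* entries in use, always at least 1 */
};

void indent_stack_init(struct indent_stack *s)
{
    s->widths[0] = 0;
    s->depth = 1;
}

/* Feed the indentation width of the next non-blank line (or 0 at EOF).
 * Returns +1 if one indent token is due, -n if n outdent tokens are due,
 * 0 if the level is unchanged, or INDENT_ERROR. The stack may already
 * have been popped when INDENT_ERROR is returned. */
int indent_stack_feed(struct indent_stack *s, int width)
{
    int outdents = 0;

    if (width > s->widths[s->depth - 1]) {
        if (s->depth == MAX_DEPTH)
            return INDENT_ERROR;
        /* A deeper line opens exactly one level, whatever its size. */
        s->widths[s->depth++] = width;
        return 1;
    }
    /* Close every level that is deeper than the new line. */
    while (s->depth > 1 && width < s->widths[s->depth - 1]) {
        s->depth--;
        outdents++;
    }
    if (width != s->widths[s->depth - 1])
        return INDENT_ERROR;
    return -outdents;
}
```

For the foo/bar/baz example, feeding the widths 0, 2, 4 and then 0 for end of file yields 0, +1, +1 and -2, i.e. two outdents after the call to baz.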

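Based on Chris's code, I came up with a solution that works in all the cases I've come across so far. I've created a template project on GitHub for parsing indentation-based text using flex (and bison): https://github.com/lucasb-eyer/flex-bison-indentation. It is a fully working (CMake-based) project that also tracks the line position and column range of the current token.

Just in case the link breaks for whatever reason, here is the meat of the lexer: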

%{
#include <stack>

int g_current_line_indent = 0;
std::stack<size_t> g_indent_levels;
int g_is_fake_outdent_symbol = 0;

static const unsigned int TAB_WIDTH = 2;

#define YY_USER_INIT { \
    g_indent_levels.push(0); \
    BEGIN(initial); \
}
#include "parser.hh"

%}

%x initial
%x indent
%s normal

%%
    int indent_caller = normal;

 /* Everything runs in the <normal> mode and enters the <indent> mode
    when a newline symbol is encountered.
    There is no newline symbol before the first line, so we need to go
    into the <indent> mode by hand there.
 */
<initial>.  { set_yycolumn(yycolumn-1); indent_caller = normal; yyless(0); BEGIN(indent); }
<initial>\n { indent_caller = normal; yyless(0); BEGIN(indent); }    

<indent>" "     { g_current_line_indent++; }
<indent>\t      { g_current_line_indent = (g_current_line_indent + TAB_WIDTH) & ~(TAB_WIDTH-1); }
<indent>\n      { g_current_line_indent = 0; /* ignoring blank line */ }
<indent><<EOF>> {
                    // When encountering the end of file, we want to emit an
                    // outdent for all indents currently left.
                    if(g_indent_levels.top() != 0) {
                        g_indent_levels.pop();

                        // See the same code below (<indent>.) for a rationale.
                        if(g_current_line_indent != g_indent_levels.top()) {
                            unput('\n');
                            for(size_t i = 0 ; i < g_indent_levels.top() ; ++i) {
                                unput(' ');
                            }
                        } else {
                            BEGIN(indent_caller);
                        }

                        return TOK_OUTDENT;
                    } else {
                        yyterminate();
                    }
                }

<indent>.       {
                    if(!g_is_fake_outdent_symbol) {
                        unput(*yytext);
                    }
                    g_is_fake_outdent_symbol = 0;
                    // -2: -1 for putting it back and -1 for ending at the last space.
                    set_yycolumn(yycolumn-1);

                    // Indentation level has increased. It can only ever
                    // increase by one level at a time. Remember how many
                    // spaces this level has and emit an indentation token.
                    if(g_current_line_indent > g_indent_levels.top()) {
                        g_indent_levels.push(g_current_line_indent);
                        BEGIN(indent_caller);
                        return TOK_INDENT;
                    } else if(g_current_line_indent < g_indent_levels.top()) {
                        // Outdenting is the most difficult, as we might need to
                        // outdent multiple times at once, but flex doesn't allow
                        // emitting multiple tokens at once! So we fake this by
                        // 'unput'ting fake lines which will give us the next
                        // outdent.
                        g_indent_levels.pop();

                        if(g_current_line_indent != g_indent_levels.top()) {
                            // Unput the rest of the current line, including the newline.
                            // We want to keep it untouched.
                            for(size_t i = 0 ; i < g_current_line_indent ; ++i) {
                                unput(' ');
                            }
                            unput('\n');
                            // Now, insert a fake character indented just so
                            // that we get a correct outdent the next time.
                            unput('.');
                            // Though we need to remember that it's a fake one
                            // so we can ignore the symbol.
                            g_is_fake_outdent_symbol = 1;
                            for(size_t i = 0 ; i < g_indent_levels.top() ; ++i) {
                                unput(' ');
                            }
                            unput('\n');
                        } else {
                            BEGIN(indent_caller);
                        }

                        return TOK_OUTDENT;
                    } else {
                        // No change in indentation, not much to do here...
                        BEGIN(indent_caller);
                    }
                }

<normal>\n    { g_current_line_indent = 0; indent_caller = YY_START; BEGIN(indent); }
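An alternative to pushing fake lines back with `unput` is to keep a counter of tokens that are still owed and drain it at the start of each call to the token function. The plain-C sketch below illustrates only that pattern and is not what the project above does. All names (`toy_lexer`, `toy_lexer_next`, the `T_*` tokens) are hypothetical, and the "input" is simply an array of per-line indentation widths standing in for a real scanner.

```c
#include <stddef.h>

enum token { T_EOF, T_LINE, T_INDENT, T_OUTDENT };

#define MAX_DEPTH 64  /* arbitrary nesting limit for this sketch */

struct toy_lexer {
    const int *line_widths;  /* indentation width of each input line */
    size_t nlines;
    size_t pos;              /* index of the next line to read */
    int levels[MAX_DEPTH];   /* stack of open indentation widths */
    size_t depth;
    int pending_outdents;    /* T_OUTDENT tokens still owed to the caller */
    int pending_line;        /* 1 if a T_LINE is owed after them */
};

void toy_lexer_init(struct toy_lexer *lx, const int *widths, size_t n)
{
    lx->line_widths = widths;
    lx->nlines = n;
    lx->pos = 0;
    lx->levels[0] = 0;
    lx->depth = 1;
    lx->pending_outdents = 0;
    lx->pending_line = 0;
}

/* Returns one token per call. Widths that match no open level are not
 * diagnosed here, and lines nested deeper than MAX_DEPTH are treated as
 * if they stayed at the current level. */
enum token toy_lexer_next(struct toy_lexer *lx)
{
    int width;

    /* Drain owed tokens first, one per call. */
    if (lx->pending_outdents > 0) {
        lx->pending_outdents--;
        return T_OUTDENT;
    }
    if (lx->pending_line) {
        lx->pending_line = 0;
        return T_LINE;
    }
    if (lx->pos == lx->nlines) {
        /* End of input: close every level that is still open. */
        if (lx->depth > 1) {
            lx->pending_outdents = (int)(lx->depth - 1) - 1;
            lx->depth = 1;
            return T_OUTDENT;
        }
        return T_EOF;
    }
    width = lx->line_widths[lx->pos++];
    lx->pending_line = 1;
    if (width > lx->levels[lx->depth - 1] && lx->depth < MAX_DEPTH) {
        lx->levels[lx->depth++] = width;
        return T_INDENT;
    }
    while (lx->depth > 1 && width < lx->levels[lx->depth - 1]) {
        lx->depth--;
        lx->pending_outdents++;
    }
    return toy_lexer_next(lx);  /* hands out the first owed token */
}
```

With flex and bison, the same idea would typically sit in a small wrapper that the parser calls instead of calling yylex directly. That wiring depends on the project and is not shown here.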

score 0 · Accepted Answer

You need a rule that looks something like this (assuming you use tabs for your indents):

\t    { return TABDENT; }

Frankly, I've always found braces (or begin/end) easier to write and to read, both as a human and as a lexer/parser writer.
