parsing - Haskell パーサーから AST データ型への代入

Question

私は質問への回答を得ようと数日間インターウェブを検索してきましたが、ついに敗北を認めました.
私は文法を与えられました：

Dig ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Int ::= Dig | Dig Int Var ::= a | b | ... z | A | B | C | ... | Z Expr ::= Int | - Expr | + Expr Expr | * Expr Expr | Var | let Var = Expr in Expr

そして、演算子* + -が通常の意味を持つこの文法を使用して式を解析、評価、出力するように言われまし
た具体的なタスクは関数を書くことですparse :: String -> AST

これは文字列を入力として受け取り、入力が正しい形式である場合に抽象構文ツリーを返します (これは正しいと想定できます)。

適切なデータ型が必要であり、そのデータ型は他のクラスから派生する必要があるかもしれないと言われました。

出力例に従う
data AST = Leaf Int | Sum AST AST | Min AST | ...

tokens::String -> [String]
さらに、入力文字列をトークンのリストに分割する関数を作成することを検討する必要があります。
解析は
ast::[String] -> (AST,[String])
、入力がトークンのリストであり、AST を出力する場所で実行する必要があります。部分式を解析するには、単純に ast を使用する必要があります。再帰的に機能します。

また、結果を出力する printExpr メソッドを作成して、式を評価する関数またはまたは関数も
printE: AST -> String
printE(parse "* 5 5")生成する必要があります。"5*5""(5*5)"

evali :: AST -> Int

どこから始めればよいか、正しい方向に向けてもらいたいだけです。私は一般的に Haskell と FP についてほとんど知識がなく、このタスクを解決しようとして、Java から文字列処理関数を作成しました。
ですから、正しい方向への小さな指針と、AST が「どのように」見えるべきかについての説明かもしれませ
ん。前もって感謝します！
編集

私は不明確だったかもしれません: 入力文字列を読み取ってトークン化してから AST を作成するまで、どうすればよいのか疑問に思っています。

score 26 · Accepted Answer

Parsing tokens into an Abstract Syntax Tree

OK, let's take your grammar

Dig ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Int ::= Dig | Dig Int
Var ::= a | b | ... z | A | B | C | ... | Z
Expr ::= Int | - Expr | + Expr Expr | * Expr Expr | Var | let Var = Expr in Expr

This is a nice easy grammar, because you can tell from the first token what sort of epression it will be. (If there was something more complicated, like + coming between numbers, or - being used for subtraction as well as negation, you'd need the list-of-successes trick, explained in Functional Parsers.)

Let's have some sample raw input:

rawinput = "- 6 + 45 let x = - 5 in * x x"

Which I understand from the grammar represents "(- 6 (+ 45 (let x=-5 in (* x x))))", and I'll assume you tokenised it as

tokenised_input' = ["-","6","+","4","5","let","x","=","-","5","in","*","x","x"]

which fits the grammar, but you might well have got

tokenised_input = ["-","6","+","45","let","x","=","-","5","in","*","x","x"]

which fits your sample AST better. I think it's good practice to name your AST after bits of your grammar, so I'm going to go ahead and replace

data AST = Leaf Int | Sum AST AST | Min AST | ...

with

data Expr = E_Int Int | E_Neg Expr | E_Sum Expr Expr | E_Prod Expr Expr | E_Var Char 
                      | E_Let {letvar::Char,letequal:: Expr,letin::Expr}
 deriving Show

I've named the bits of an E_Let to make it clearer what they represent.

Writing a parsing function

You could use isDigit by adding import Data.Char (isDigit) to help out:

expr :: [String] -> (Expr,[String])
expr [] = error "unexpected end of input"
expr (s:ss) | all isDigit s = (E_Int (read s),ss)
             | s == "-" = let (e,ss') = expr ss in (E_Neg e,ss') 
             | s == "+" = (E_Sum e e',ss'') where
                          (e,ss') = expr ss
                          (e',ss'') = expr ss'
            -- more cases

Yikes! Too many let clauses obscuring the meaning, and we'll be writing the same code for E_Prod and very much worse for E_Let. Let's get this sorted out!

The standard way of dealing with this is to write some combinators; instead of tiresomely threading the input [String]s through our definition, define ways to mess with the output of parsers (map) and combine multiple parsers into one (lift).

Clean up the code 1: map

First we should define pmap, our own equivalent of the map function so we can do pmap E_Neg (expr1 ss) instead of let (e,ss') = expr1 ss in (E_Neg e,ss')

pmap :: (a -> b) -> ([String] -> (a,[String])) -> ([String] -> (b,[String]))

nonono, I can't even read that! We need a type synonym:

type Parser a = [String] -> (a,[String])

pmap :: (a -> b) -> Parser a -> Parser b
pmap f p = \ss -> let (a,ss') = p ss 
                  in (f a,ss')

But really this would be better if I did

data Parser a = Par [String] -> (a,[String])

so I could do

instance Functor Parser where
  fmap f (Par p) = Par (pmap f p)

I'll leave that for you to figure out if you fancy.

Clean up the code 2: combining two parsers

We also need to deal with the situation when we have two parsers to run, and we want to combine their results using a function. This is called lifting the function to parsers.

liftP2 :: (a -> b -> c) -> Parser a -> Parser b -> Parser c
liftP2 f p1 p2 = \ss0 -> let
              (a,ss1) = p1 ss0
              (b,ss2) = p2 ss1
              in (f a b,ss2)

or maybe even three parsers:

liftP3 :: (a -> b -> c -> d) -> Parser a -> Parser b -> Parser c -> Parser d

I'll let you think how to do that. In the let statement you'll need liftP5 to parse the sections of a let statement, lifting a function that ignores the "=" and "in". You could make

equals_ :: Parser ()
equals_ [] = error "equals_: expected = but got end of input"
equals_ ("=":ss) = ((),ss)
equals_ (s:ss) = error $ "equals_: expected = but got "++s

and a couple more to help out with this.

Actually, pmap could also be called liftP1, but map is the traditional name for that sort of thing.

Rewritten with the nice combinators

Now we're ready to clean up expr:

expr :: [String] -> (Expr,[String])
expr [] = error "unexpected end of input"
expr (s:ss) | all isDigit s = (E_Int (read s),ss)
            | s == "-" = pmap   E_Neg expr ss
            | s == "+" = liftP2 E_Sum expr expr ss
            -- more cases

That'd all work fine. Really, it's OK. But liftP5 is going to be a bit long, and feels messy.

Taking the cleanup further - the ultra-nice Applicative way

Applicative Functors is the way to go. Remember I suggested refactoring as

data Parser a = Par [String] -> (a,[String])

so you could make it an instance of Functor? Perhaps you don't want to, because all you've gained is a new name fmap for the perfectly working pmap and you have to deal with all those Par constructors cluttering up your code. Perhaps this will make you reconsider, though; we can import Control.Applicative, then using the data declaration, we can define <*>, which sort-of means then and use <$> instead of pmap, with *> meaning <*>-but-forget-the-result-of-the-left-hand-side so you would write

expr (s:ss) | s == "let" = E_Let <$> var *> equals_ <*> expr <*> in_ *> expr

Which looks a lot like your grammar definition, so it's easy to write code that works first time. This is how I like to write Parsers. In fact, it's how I like to write an awful lot of things. You'd only have to define fmap, <*> and pure, all simple ones, and no long repetative liftP3, liftP4 etc.

Read up about Applicative Functors. They're great.

Hints for making Parser applicative: pure doesn't change the list. <*> is like liftP2, but the function doesn't come from outside, it comes as the output from p1.

score 4 · Accepted Answer

Haskell 自体から始めるには、Learn You a Haskell for Great Good! をお勧めします。、非常によく書かれた面白いガイドです。Real World Haskellは、もう 1 つの推奨される出発点です。

編集: 解析のより基本的な紹介はFunctional Parsersです。PHilip Wadler による成功のリストで失敗を置き換える方法が欲しかった。残念ながらオンラインでは販売されていないようです。

Haskell で構文解析を始めるには、まずHaskell でのモナド構文解析を読んでから、おそらくこの wikibook の例を読んでから、 parsec ガイドも読むべきだと思います。

あなたの文法

Dig ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Int ::= Dig | Dig Int
Var ::= a | b | ... z | A | B | C | ... | Z
Expr ::= Int | - Expr | + Expr Expr | * Expr Expr | Var | let Var = Expr in Expr

いくつかの抽象データ型を提案します:

data Dig = Dig_0 | Dig_1 | Dig_2 | Dig_3 | Dig_4 | Dig_5 | Dig_6 | Dig_7 | Dig_8 | Dig_9
data Integ = I_Dig Dig | I_DigInt Dig Integ
data Var = Var_a | Var_b | ... Var_z | Var_A | Var_B | Var_C | ... | Var_Z
data Expr = Expr_I Integ 
          | Expr_Neg Expr
          | Expr_Plus Expr Expr
          | Expr_Times Expr Expr Var
          | Expr_Var Var
          | Expr_let Var Expr Expr

これは本質的に再帰的に定義された構文ツリーであり、別の構文ツリーを作成する必要はありません。不格好で申し訳ありませんDig_がInteg_、大文字で始める必要があります。

Integ(個人的には、 s をInts にすぐに変換したいのでnewtype Integ = Integ Int、を実行し、おそらく実行したはずですnewtype Var = Var Charが、それはあなたには合わないかもしれません。)

基本的なもの - digand var、 and neg_、plus_、in_etc. を作成したら、Applicative インターフェイスを使用してそれらを構築します。たとえば、パーサーexprforExprは次のようになります。

expr =        Expr_I <$> integ 
          <|> Expr_Neg <$> neg_ *> expr
          <|> Expr_Plus <$> plus_ *> expr <*> expr
          <|> Expr_Times <$> times_ *> expr <*> expr
          <|> Expr_Var <$> var
          <|> Expr_let <$> let_ *> var <*> equals_ *> expr <*> in_ *> expr

したがって、ほとんどの場合、Haskell コードはクリーンで、与えられた文法によく似ています。

score 1 · Accepted Answer

OK、あなたはたくさんのものを構築しようとしているようですが、それがどこに向かっているのか正確にはわかりません. AST適切な定義を取得してから、実装を試みるevaliことが良いスタートになることをお勧めします。

あなたがリストした文法は興味深いです...入力したいようです* 5 5が、出力5*5は奇妙な選択です。それは本当にバイナリではなく、単項のマイナスであるはずですか? 同様に、おそらくあなたはタイプするつもりだった* Expr Expr Varように見えます* Expr Expr | Var...

とにかく、あなたが言おうとしていることについていくつかの仮定を立てると、AST は次のようになります。

data AST = Leaf Int | Sum AST AST | Minus AST | Var String | Let String AST AST

では、やってみましょうprintE。AST を取り、文字列を返します。上記の定義により、AST は 5 つの可能なもののうちの 1 つでなければなりません。それぞれに何を印刷するかを決めるだけです。

 printE :: AST -> String
 printE (Leaf  x  ) = show x
 printE (Sum   x y) = printE x ++ " + " ++ printE y
 printE (Minus x  ) = "-" ++ printE x
 ...

showをにInt変換しStringます。++2 つの文字列を結合します。残りの機能はあなたにお任せします。(トリッキーなことは、部分式の順序を正しく表示するために括弧を出力したい場合です...あなたの文法は括弧について言及していないので、私はそうではないと思います。)

ではどうevaliですか？さて、これは同様の取引になるでしょう。AST がの場合、はであり、Leaf xそれを返すだけです。たとえば、がある場合、は整数ではなく、AST であるため、を使用して整数に変換する必要があります。関数は次のようになりますxIntMinus xxevali

evali :: AST -> Int
evali (Leaf  x  ) = x
evali (Sum   x y) = (evali x) + (evali y)
evali (Minus x  ) = 0 - (evali x)
...

それはこれまでのところうまくいきます。ちょっと待って！Letを使用して新しい変数を定義し、後でそれらを参照できるようになっているようですVar。その場合、それらの変数をどこかに保存する必要があります。そして、それは機能をより複雑にするでしょう。

私のお勧めはData.Map、変数名とそれに対応する値のリストを保存するために使用することです。型シグネチャに変数マップを追加する必要があります。次の方法で実行できます。

evali :: AST -> Int
evali ast = evaluate Data.Map.empty ast

evaluate :: Map String Int -> AST -> Int
evaluate m ast =
  case ast of
    ...same as before...
    Let var ast1 ast2 -> evaluate (Data.Map.insert var (evaluate m ast1)) ast2
    Var var           -> m ! var

したがって、空の変数マップを使用してevali呼び出すだけです。evaluateがを参照すると、変数がevaluateマップLetに追加されます。そして、Varが表示されると、マップで名前を検索します。

そもそも文字列をASTに解析することに関しては、それはまったく別の答えです...