parsing - Haskell Parsec: error messages are less helpful while using custom tokens

Question

I'm working on separating the lexing and parsing stages of a parser. After some tests, I noticed that the error messages are much less helpful when I use tokens other than Parsec's Char tokens.

Here are some examples of Parsec's error messages while using Char tokens:

ghci> P.parseTest (string "asdf" >> spaces >> string "ok") "asdf  wrong"
parse error at (line 1, column 7):
unexpected "w"
expecting space or "ok"


ghci> P.parseTest (choice [string "ok", string "nop"]) "wrong"
parse error at (line 1, column 1):
unexpected "w"
expecting "ok" or "nop"

So the string parser reports which string was expected when it finds an unexpected one, and the choice parser lists the alternatives.

But when I use the same combinators with my own tokens, I get this:

ghci> Parser.parseTest ((tok $ Ide "asdf") >> (tok $ Ide "ok")) "asdf  "
parse error at "test" (line 1, column 1):
unexpected end of input

In this case, it doesn't print what was expected.

ghci> Parser.parseTest (choice [tok $ Ide "ok", tok $ Ide "nop"]) "asdf  "
parse error at (line 1, column 1):
unexpected (Ide "asdf","test" (line 1, column 1))

And when I use choice, it doesn't print the alternatives.

I expected this behavior to come from the combinator functions rather than from the tokens, but it seems I'm wrong. How can I fix this?

Here's the full lexer + parser code:

Lexer:

module Lexer
    ( Token(..)
    , TokenPos(..)
    , tokenize
    ) where

import Text.ParserCombinators.Parsec hiding (token, tokens)
import Control.Applicative ((<*), (*>), (<$>), (<*>))

data Token = Ide String
           | Number String
           | Bool String
           | LBrack
           | RBrack
           | LBrace
           | RBrace
           | Keyword String
    deriving (Show, Eq)

type TokenPos = (Token, SourcePos)

ide :: Parser TokenPos
ide = do
    pos <- getPosition
    fc  <- oneOf firstChar
    r   <- optionMaybe (many $ oneOf rest)
    spaces
    return $ flip (,) pos $ case r of
                 Nothing -> Ide [fc]
                 Just s  -> Ide $ [fc] ++ s
  where firstChar = ['A'..'Z'] ++ ['a'..'z'] ++ "_"
        rest      = firstChar ++ ['0'..'9']

parsePos p = (,) <$> p <*> getPosition

lbrack = parsePos $ char '[' >> return LBrack
rbrack = parsePos $ char ']' >> return RBrack
lbrace = parsePos $ char '{' >> return LBrace
rbrace = parsePos $ char '}' >> return RBrace


token = choice
    [ ide
    , lbrack
    , rbrack
    , lbrace
    , rbrace
    ]

tokens = spaces *> many (token <* spaces)

tokenize :: SourceName -> String -> Either ParseError [TokenPos]
tokenize = runParser tokens ()

Parser:

module Parser where

import Text.Parsec as P
import Control.Monad.Identity
import Lexer

parseTest  :: Show a => Parsec [TokenPos] () a -> String -> IO ()
parseTest p s =
    case tokenize "test" s of
        Left e -> putStrLn $ show e
        Right ts' -> P.parseTest p ts'

tok :: Token -> ParsecT [TokenPos] () Identity Token
tok t = token show snd test
  where test (t', _) = case t == t' of
                           False -> Nothing
                           True  -> Just t

Solution:

Okay, after fp4me's answer and a more careful reading of Parsec's Char source, I ended up with this:

{-# LANGUAGE FlexibleContexts #-}
module Parser where

import Text.Parsec as P
import Control.Monad.Identity
import Lexer

parseTest  :: Show a => Parsec [TokenPos] () a -> String -> IO ()
parseTest p s =
    case tokenize "test" s of
        Left e    -> putStrLn $ show e
        Right ts' -> P.parseTest p ts'


type Parser a = Parsec [TokenPos] () a

advance :: SourcePos -> t -> [TokenPos] -> SourcePos
advance _ _ ((_, pos) : _) = pos
advance pos _ [] = pos

satisfy :: (TokenPos -> Bool) -> Parser Token
satisfy f = tokenPrim show
                      advance
                      (\c -> if f c then Just (fst c) else Nothing)

tok :: Token -> ParsecT [TokenPos] () Identity Token
tok t = (Parser.satisfy $ (== t) . fst) <?> show t

Now I get the same kind of error messages as with Char tokens:

ghci> Parser.parseTest (choice [tok $ Ide "ok", tok $ Ide "nop"]) "asdf"
parse error at (line 1, column 1):
unexpected (Ide "asdf","test" (line 1, column 3))
expecting Ide "ok" or Ide "nop"
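For anyone who wants to try the labeled `tok` without the Lexer module, here is a minimal self-contained sketch of the same idea. The trimmed-down `Token` type and the hand-built `sample` token stream are assumptions made for illustration only; they stand in for the real lexer output.

```haskell
import Text.Parsec
import Text.Parsec.Pos (newPos)

-- Trimmed-down token type (illustrative; the real one has more constructors).
data Token = Ide String | LBrack | RBrack
    deriving (Show, Eq)

type TokenPos = (Token, SourcePos)

type Parser a = Parsec [TokenPos] () a

-- Next position: that of the following token, or unchanged at end of input.
advance :: SourcePos -> t -> [TokenPos] -> SourcePos
advance _ _ ((_, pos) : _) = pos
advance pos _ []           = pos

-- Accept exactly the token t; the <?> label is what feeds the
-- "expecting ..." part of the error message.
tok :: Token -> Parser Token
tok t = tokenPrim show advance (\(t', _) -> if t == t' then Just t' else Nothing)
        <?> show t

-- Hypothetical token stream, as the lexer might produce for "asdf".
sample :: [TokenPos]
sample = [(Ide "asdf", newPos "test" 1 1)]

main :: IO ()
main =
    case parse (choice [tok (Ide "ok"), tok (Ide "nop")]) "test" sample of
        Left err -> print err
        Right t  -> print t
```

Running `main` should report the unexpected `Ide "asdf"` token together with an `expecting Ide "ok" or Ide "nop"` line, since each alternative now carries its own label.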

score 5 · Accepted Answer

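A first step towards a solution is to define your own choice function in the parser, use a dedicated "unexpected" function to override the unexpected-token message, and finally use the <?> operator to override the expecting message: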

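-- Try each token in turn; on failure report the offending token and
-- label the whole alternative with the list of expected tokens.
mychoice :: [Token] -> Parsec [TokenPos] () Token
mychoice []     = mzero
mychoice [x]    = (tok x <|> myUnexpected) <?> show x
mychoice (x:xs) = ((tok x <|> mychoice xs) <|> myUnexpected) <?> show (x:xs)

-- Fail with an "unexpected" message naming the next token (or "eof").
myUnexpected :: Parsec [TokenPos] () a
myUnexpected = do
    input <- getInput
    unexpected (first input)
  where
    first []      = "eof"
    first (x : _) = show (fst x)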

Then call your parser like this:

ghci> Parser.parseTest (mychoice [Ide "ok", Ide "nop"]) "asdf  "
parse error at (line 1, column 1):
unexpected Ide "asdf"
expecting [Ide "ok",Ide "nop"]
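The role of <?> can also be seen in isolation with Parsec's ordinary Char parsers. The sketch below is only an illustration (the names `bare` and `labeled` are made up here): a parser built from `satisfy` alone carries no label, while the same parser wrapped in <?> does.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- No label: on failure Parsec can only say what it did not expect.
bare :: Parser Char
bare = satisfy (== 'x')

-- Same parser with a label attached via <?>.
labeled :: Parser Char
labeled = satisfy (== 'x') <?> "the letter x"

main :: IO ()
main = do
    parseTest bare "y"     -- error has an "unexpected" line only
    parseTest labeled "y"  -- error also has an "expecting the letter x" line
```

This is, as far as I can tell, why parsers built directly on `token` or `tokenPrim` stay silent about what they expected until a label is attached.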
