haskell - Conditional look-ahead in attoparsec

Question

Suppose there is a data structure representing text with comments inside it.

data TWC
  = T Text TWC -- text
  | C Text TWC -- comment
  | E -- end
  deriving Show

so that a string like

"Text, {-comment-}, and something else"

can be encoded as

T "Text, " (C "comment" (T ", and something else" E))
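To make the encoding concrete, here is a minimal sketch of the inverse direction: turning a TWC value back into the string it encodes. The function name render is hypothetical (it is not part of the question), and the sketch assumes the text package and the OverloadedStrings extension.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

data TWC
  = T Text TWC -- text
  | C Text TWC -- comment
  | E          -- end
  deriving Show

-- Hypothetical helper: rebuild the source string from its TWC encoding.
-- Text chunks are emitted verbatim; comment chunks are wrapped in {- -}.
render :: TWC -> Text
render (T t rest) = t <> render rest
render (C c rest) = "{-" <> c <> "-}" <> render rest
render E          = ""
```

Under these assumptions, render (T "Text, " (C "comment" (T ", and something else" E))) yields the original string "Text, {-comment-}, and something else".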

Parsers for comment chunks and for E are pretty simple:

twcP :: Parser TWC
twcP = eP <|> cP <|> tP

cP :: Parser TWC
cP = do
  _ <- string "{-"
  c <- manyTill anyChar (string "-}")
  rest <- cP <|> tP <|> eP
  return (C (pack c) rest)

eP :: Parser TWC
eP = do
  endOfInput
  return E

However, implementing the parser for text chunks in the same straightforward way

tP :: Parser TWC
tP = do
  t <- many1 anyChar
  rest <- cP <|> eP
  return (T (pack t) rest)

makes it consume the comment section as text, because many1 anyChar is greedy:

> parseOnly twcP "text{-comment-}"
Right (T "text{-comment-}" E)
it ∷ Either String TWC

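So the question is: how do I express the logic of parsing until either the end of input or the start of a comment section? In other words, how do I implement a conditional look-ahead parser?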

score 5 · Accepted Answer

You're right: the problematic code is the first line of tP, which greedily parses text without stopping at comments:

tP = do
  t <- many1 anyChar

Before addressing that, I first want to refactor your code a little: introduce a couple of helpers, switch to applicative style, and isolate the problematic code in a text helper:

-- Like manyTill, but pack the result to Text.
textTill :: Alternative f => f Char -> f b -> f Text
textTill p end = pack <$> manyTill p end

-- Parse one comment string
comment :: Parser Text
comment = string "{-" *> textTill anyChar (string "-}")

-- Parse one non-comment text string (problematic implementation)
text :: Parser Text
text = pack <$> many1 anyChar

-- TWC parsers:

twcP :: Parser TWC
twcP = eP <|> cP <|> tP

cP :: Parser TWC
cP = C <$> comment <*> twcP

eP :: Parser TWC
eP = E <$ endOfInput

tP :: Parser TWC
tP = T <$> text <*> twcP

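To implement the look-ahead we can use the lookAhead combinator, which applies a parser without consuming any input. This lets us make text parse until it reaches either a comment (without consuming it) or endOfInput: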

-- Parse one non-comment text string (working implementation)
text :: Parser Text
text = textTill anyChar (void (lookAhead comment) <|> endOfInput)
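As a small illustration of what "without consuming any input" means, the following sketch peeks at a comment opener and then reads everything that remains. The name peekThenTake is hypothetical, and the sketch assumes Data.Attoparsec.Text with OverloadedStrings.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text (Parser, lookAhead, parseOnly, string, takeText)
import Data.Text (Text)

-- Hypothetical demo parser: look ahead for "{-", then take the rest of the
-- input. Because lookAhead does not consume anything, the "{-" that was
-- peeked is still present in the remaining input.
peekThenTake :: Parser (Text, Text)
peekThenTake = do
  peeked <- lookAhead (string "{-")
  rest   <- takeText
  return (peeked, rest)
```

With these definitions, parseOnly peekThenTake "{-x-}" gives Right ("{-","{-x-}"): the opener appears both in the peeked result and in the rest of the input.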

With the working implementation of text, twcP behaves as expected:

ghci> parseOnly twcP "text{-comment-} post"
Right (T "text" (C "comment" (T " post" E)))
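For reference, the pieces of this answer can be collected into one loadable file. This is only a sketch under some assumptions the post does not state: the parsers come from Data.Attoparsec.Text, OverloadedStrings is enabled so that string accepts literals, TWC additionally derives Eq to make results comparable, and parseTWC is a hypothetical convenience wrapper.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative (Alternative, (<|>))
import Control.Monad (void)
import Data.Attoparsec.Text
  (Parser, anyChar, endOfInput, lookAhead, manyTill, parseOnly, string)
import Data.Text (Text, pack)

data TWC
  = T Text TWC -- text
  | C Text TWC -- comment
  | E          -- end
  deriving (Show, Eq) -- Eq is an addition, used only for comparing results

-- Like manyTill, but pack the result to Text.
textTill :: Alternative f => f Char -> f b -> f Text
textTill p end = pack <$> manyTill p end

-- Parse one comment string.
comment :: Parser Text
comment = string "{-" *> textTill anyChar (string "-}")

-- Parse one non-comment text string: stop in front of a complete comment
-- (checked with lookAhead, so the comment is not consumed) or at end of input.
text :: Parser Text
text = textTill anyChar (void (lookAhead comment) <|> endOfInput)

twcP :: Parser TWC
twcP = eP <|> cP <|> tP

cP :: Parser TWC
cP = C <$> comment <*> twcP

eP :: Parser TWC
eP = E <$ endOfInput

tP :: Parser TWC
tP = T <$> text <*> twcP

-- Hypothetical wrapper: run the top-level parser on a complete input.
parseTWC :: Text -> Either String TWC
parseTWC = parseOnly twcP
```

One consequence of looking ahead for a whole comment rather than just the "{-" opener: an opener that is never closed does not stop text, so parseTWC "a{-b" gives Right (T "a{-b" E), treating the unterminated comment as plain text.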
