parsing - Using sepBy string in Attoparsec

Question

I'm trying to split a string on any of ",", ", and", or "and", and then return whatever was in between. Here is an example of what I have so far:

import Data.Attoparsec.Text

sepTestParser = nameSep ((takeWhile1 $ inClass "-'a-zA-Z") <* space)
nameSep p = p `sepBy` (string " and " <|> string ", and" <|> ", ")

main = do
  print $ parseOnly sepTestParser "This test and that test, this test particularly."

I would like the output to be ["This test", "that test", "this test particularly."]. I have a vague sense that what I'm doing is wrong, but I can't quite work out why.

score 4 · Accepted Answer

Note: This answer is written in literate Haskell. Save it as Example.lhs and load it in GHCi or similar.

The problem is that sepBy is implemented as follows:

sepBy p s = liftA2 (:) p ((s *> sepBy1 p s) <|> pure []) <|> pure []

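This means that the second parser s is only called after the first parser p has succeeded, so s never gets a chance to stop p early. It also means that if you added whitespace to the character class, you would end up with: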

["This test and that test","this test particularly"]

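since "and" can now be parsed by p. This isn't easy to fix: as soon as you hit a space, you would need to look ahead and check whether, after any number of spaces, an "and" follows, and if so, stop parsing. Only then would a parser written with sepBy work.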
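To see both behaviours concretely, here is a small standalone sketch (a separate file, not part of the literate answer). It assumes the attoparsec and text packages are available; the names sep, oneWord, greedy and input are made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ((<|>))
import Data.Attoparsec.Text
import Data.Text (Text)

-- Separator from the question.
sep :: Parser Text
sep = string " and " <|> string ", and" <|> string ", "

-- Item parser from the question: one word followed by one whitespace character.
oneWord :: Parser Text
oneWord = takeWhile1 (inClass "-'a-zA-Z") <* space

-- Variant with a space added to the character class.
greedy :: Parser Text
greedy = takeWhile1 (inClass "-'a-zA-Z ")

input :: Text
input = "This test and that test, this test particularly."

main :: IO ()
main = do
  -- oneWord eats "This ", then no separator matches at "test and ...".
  print $ parseOnly (oneWord `sepBy` sep) input
  -- prints: Right ["This"]

  -- greedy swallows " and " itself, so only ", " ever acts as a separator.
  print $ parseOnly (greedy `sepBy` sep) input
  -- prints: Right ["This test and that test","this test particularly"]
```

In both cases the item parser decides where it stops before the separator is ever consulted, which is why neither version yields the desired split.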

So let's write a parser that takes words instead (the rest of this answer is literate Haskell):

> {-# LANGUAGE OverloadedStrings #-}
> import Control.Applicative
> import Data.Attoparsec.Text
> import qualified Data.Text as T
> import Control.Monad (mzero)

> word = takeWhile1 . inClass $ "-'a-zA-Z"
> 
> wordsP = fmap (T.intercalate " ") $ k `sepBy` many space
>   where k = do
>           a <- word
>           if (a == "and") then mzero
>                           else return a

wordsP now consumes multiple words until it hits either something that is not a word or a word equal to "and". The returned mzero signals a parse failure, at which point another parser can take over.
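As a quick check of where wordsP stops, the following standalone sketch repeats the two definitions and pairs the result with the unconsumed input. The helper wordsAndRest is hypothetical and exists only for this demonstration; takeText is attoparsec's "consume everything that is left" parser.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (many)
import Control.Monad (mzero)
import Data.Attoparsec.Text
import Data.Text (Text)
import qualified Data.Text as T

-- Same definitions as in the answer, repeated so this file stands alone.
word :: Parser Text
word = takeWhile1 (inClass "-'a-zA-Z")

wordsP :: Parser Text
wordsP = T.intercalate " " <$> (k `sepBy` many space)
  where
    k = do
      a <- word
      -- Reject the word "and" so that the separator parser can claim it.
      if a == "and" then mzero else return a

-- Hypothetical helper: run wordsP, then capture whatever input is left.
wordsAndRest :: Parser (Text, Text)
wordsAndRest = (,) <$> wordsP <*> takeText

main :: IO ()
main = print $ parseOnly wordsAndRest "This test and that test"
-- prints: Right ("This test"," and that test")
```

Because attoparsec backtracks on failure, the space in front of "and" is handed back as well, so the remaining input starts with " and".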

> andP = many space *> "and" *> many1 space *> pure()
> 
> limiter = choice [
>     "," *> andP,
>     "," *> many1 space *> pure (),
>     andP
>   ]

limiter is mostly the same as the parser you already wrote. It corresponds roughly to the regex /,\s*and\s+|,\s+|\s*and\s+/.
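To get a feel for what limiter accepts, here is a standalone sketch that tries it on a few inputs. The helper afterLimiter is hypothetical: it returns the input remaining after the separator, or Nothing if limiter does not match at the start.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (many)
import Data.Attoparsec.Text
import Data.Text (Text)

-- Same definitions as in the answer, repeated so this file stands alone.
andP :: Parser ()
andP = many space *> "and" *> many1 space *> pure ()

limiter :: Parser ()
limiter = choice
  [ "," *> andP
  , "," *> many1 space *> pure ()
  , andP
  ]

-- Hypothetical helper: Just the rest of the input if limiter matches a prefix.
afterLimiter :: Text -> Maybe Text
afterLimiter = either (const Nothing) Just . parseOnly (limiter *> takeText)

main :: IO ()
main = mapM_ (print . afterLimiter)
  [ ", and x"   -- Just "x"
  , ",and x"    -- Just "x"  (andP allows zero spaces before "and")
  , ", x"       -- Just "x"
  , " and x"    -- Just "x"
  , "android"   -- Nothing   ("and" must be followed by whitespace)
  , ",x"        -- Nothing   (a bare comma needs whitespace after it)
  ]
```

The "android" case shows why andP insists on many1 space after "and": without it, words that merely start with "and" would be split.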

Now we can actually use sepBy, since the first parser no longer overlaps with the second:

> test = "This test and that test, this test particular, and even that test"
>
> main = print $ parseOnly (wordsP `sepBy` limiter) test

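The result is Right ["This test","that test","this test particular","even that test"], just as we wanted. Note that this particular parser does not preserve whitespace: any run of spaces between words is collapsed to a single space.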

So whenever you build a parser with sepBy, make sure the two parsers do not overlap.
