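I am trying to parse RFC5322 email addresses. My parser works in the sense that one of its results is the correct one. But how do I select the "correct" result?

Given the string Foo Bar <foo@bar.com>, the parser should produce the value Address (Just "Foo Bar") "foo@bar.com"

Alternatively, given the string foo@bar.com, the parser should produce the value Address Nothing "foo@bar.com"

When both are possible, the value that includes the name is preferred.

My parser looks like this: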

import           Control.Applicative
import           Data.Char
import qualified Data.Text                     as T
import           Text.ParserCombinators.ReadP

onlyEmail :: ReadP Address
onlyEmail = do
  skipSpaces
  email <- many1 $ satisfy isAscii
  skipSpaces
  return $ Address Nothing (T.pack email)

withName :: ReadP Address
withName = do
  skipSpaces
  name <- many1 (satisfy isAscii)
  skipSpaces
  email <- between (char '<') (char '>') (many1 $ satisfy isAscii)
  skipSpaces
  return $ Address (Just $ T.pack name) (T.pack email)

rfc5322 :: ReadP Address
rfc5322 = withName <|> onlyEmail

Running the parser with readP_to_S rfc5322 "Foo Bar <foo@bar.com>" produces the following results:

[ (Address {addressName = Nothing, addressEmail = "F"},"oo Bar <foo@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Fo"},"o Bar <foo@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo"},"Bar <foo@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo "},"Bar <foo@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo B"},"ar <foo@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Ba"},"r <foo@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar"},"<foo@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar "},"<foo@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <"},"foo@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <f"},"oo@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <fo"},"o@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo"},"@bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo@"},"bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo@b"},"ar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo@ba"},"r.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo@bar"},".com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo@bar."},"com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo@bar.c"},"om>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo@bar.co"},"m>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo@bar.com"},">")
, (Address {addressName = Just "Foo Bar", addressEmail = "foo@bar.com"},"")
, (Address {addressName = Just "Foo Bar ", addressEmail = "foo@bar.com"},"")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo@bar.com>"},"")
]

In this case, the result I actually want is the third from last in the list. How do I express that preference?

1 Answer

You shouldn't express a preference at all. Your problem is that your sub-parsers accept a larger set of strings than they actually need to.
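To see the over-acceptance concretely, here is a minimal sketch comparing many1 (satisfy isAscii), as used in the question, with the greedy munch1. The names allSplits and longestOnly are made up for this illustration.

```haskell
import Data.Char (isAscii)
import Text.ParserCombinators.ReadP

-- many1 is nondeterministic: it yields one result for every point at
-- which it could stop, so each prefix of the input is a valid parse.
allSplits :: [(String, String)]
allSplits = readP_to_S (many1 (satisfy isAscii)) "ab"
-- [("a","b"),("ab","")]

-- munch1 is greedy: it consumes as many matching characters as it can
-- and yields only that single result.
longestOnly :: [(String, String)]
longestOnly = readP_to_S (munch1 isAscii) "ab"
-- [("ab","")]
```

Because isAscii also matches spaces, '<' and '>', every prefix of the input looks like an email address to onlyEmail, which is where the long result list in the question comes from.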

For example, my solution:

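import           Control.Bool                  -- not part of base; supplies <&&> and <||>
import           Control.Applicative
import           Data.Char
import qualified Data.Text                     as T
import           Data.Text (Text)
import           Text.ParserCombinators.ReadP

-- Assumed definition: the Address type is never shown in the post.
data Address = Address (Maybe Text) Text deriving Show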

email :: ReadP Text
email = do
    l <- part
    a <- char '@'
    d <- part
    return . T.pack $ l ++ a:d
  where
    part = munch1 (isAscii <&&> (/='@') <&&> (/='<') <&&> (/='>'))

name :: ReadP Text
name = T.pack <$> chainr1 part sep
  where
    part = munch1 (isAlpha <||> isDigit <||> (=='\''))
    sep  = (\xs ys -> xs ++ ' ':ys) <$ munch1 (==' ')

onlyEmail :: ReadP Address
onlyEmail = Address Nothing <$> email

withName :: ReadP Address
withName = do
    n <- name
    skipSpaces
    e <- between (char '<') (char '>') email
    return $ Address (Just n) e

address :: ReadP Address
address = skipSpaces *> (withName <|> onlyEmail)

main = print $ readP_to_S address "Foo Bar <foo@bar.com>"

It prints:

[(Address (Just "Foo Bar") "foo@bar.com","")]
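If you would rather not depend on Control.Bool, the same idea can be sketched with base and text only. This is an illustration under assumptions, not the code above verbatim: the Address definition is guessed from the printed output, parseAddress is a hypothetical helper, and the trailing eof is an addition so that only parses consuming the whole input are returned.

```haskell
import           Data.Char                    (isAlpha, isAscii, isDigit)
import           Data.Text                    (Text)
import qualified Data.Text                    as T
import           Text.ParserCombinators.ReadP

-- Assumed definition; the post does not show the Address type.
data Address = Address (Maybe Text) Text
  deriving (Eq, Show)

-- local@domain, where neither part may contain '@', '<' or '>'.
email :: ReadP Text
email = do
    l <- part
    a <- char '@'
    d <- part
    return . T.pack $ l ++ a : d
  where
    part = munch1 (\c -> isAscii c && c `notElem` "@<>")

-- One or more words separated by runs of spaces; each run is
-- normalised to a single space in the result.
name :: ReadP Text
name = T.pack <$> chainr1 part sep
  where
    part = munch1 (\c -> isAlpha c || isDigit c || c == '\'')
    sep  = (\xs ys -> xs ++ ' ' : ys) <$ munch1 (== ' ')

onlyEmail :: ReadP Address
onlyEmail = Address Nothing <$> email

withName :: ReadP Address
withName = do
    n <- name
    skipSpaces
    e <- between (char '<') (char '>') email
    return $ Address (Just n) e

-- (+++) is ReadP's own symmetric choice; eof rejects leftover input.
address :: ReadP Address
address = skipSpaces *> (withName +++ onlyEmail) <* eof

-- Hypothetical helper: succeed only on exactly one complete parse.
parseAddress :: String -> Maybe Address
parseAddress input =
  case readP_to_S address input of
    [(addr, "")] -> Just addr
    _            -> Nothing
```

With this, parseAddress "Foo Bar <foo@bar.com>" and parseAddress "foo@bar.com" each give a single Just value, while a string with no '@' gives Nothing. No preference between withName and onlyEmail is needed, because the two parsers no longer accept the same inputs.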
answered 2016-12-28T12:19:29.887