parsing - Haskell: Traversing a string/text file

Question

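I am trying to read a script file, process it, and write the result to an html file. In my script file, whenever there is an @title(this is a title), I want to add the tag [header] this is a title [/header] to the html output. So my approach is to first read the script file, put its contents into a string, process the string, and then write the string to the html file.

To recognize @title, I need to read the string character by character. When I read '@', I need to look at the next characters and check whether they are t i t l e.

Question: How do I traverse a string (which is a list of chars) in Haskell?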

score 4 · Accepted Answer

You could use a simple recursion trick, for example:

findTag [] = undefined -- placeholder: your end-of-list code goes here
findTag ('@':xs)
  | take 5 xs == "title" = undefined -- placeholder: your code for @title
  | otherwise            = findTag xs
findTag (_:xs) = findTag xs

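So basically you pattern match on whether the next character (the head of the list) is '@', and then check whether the following 5 characters form "title". If they do, you can go on with your parsing code. If the next character isn't '@', you just keep recursing. Once the list is empty, you hit the first pattern match.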
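To make that concrete, here is a minimal runnable sketch of the same recursion with the placeholders filled in. The name hasTitleTag and the choice to return a Bool are my own assumptions for illustration, not something from the question; it only reports whether "@title" occurs anywhere in the input.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical helper: True if "@title" occurs anywhere in the input.
hasTitleTag :: String -> Bool
hasTitleTag [] = False                  -- end of list: nothing found
hasTitleTag ('@':xs)
  | "title" `isPrefixOf` xs = True      -- found '@' followed by "title"
  | otherwise               = hasTitleTag xs
hasTitleTag (_:xs) = hasTitleTag xs     -- any other character: keep going

main :: IO ()
main = do
  print (hasTitleTag "intro @title(this is a title) outro")
  print (hasTitleTag "no tags here, just an @ sign")
```

isPrefixOf does the same job as comparing against take 5 xs, without having to hard-code the length.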

Someone else may have a better solution.

I hope this answers your question.

Edit:

For a bit more flexibility, if you want to find a specific tag, you could do this:

findTag [] _ = undefined -- placeholder: your end-of-list code goes here
findTag ('@':xs) tagName
  | take (length tagName) xs == tagName = undefined -- placeholder: your code for @tagName
  | otherwise = findTag xs tagName
findTag (_:xs) tagName = findTag xs tagName

That way, calling

findTag text "title"

looks specifically for the title tag, and you can always change tagName to whatever tag you like.

Another edit:

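findTag :: String -> String -> String
findTag [] _ = "" -- end of list: no tag found (placeholder result, adjust as needed)
findTag ('@':xs) tagName
  | take tLength xs == tagName = getTagContents tLength xs
  | otherwise = findTag xs tagName
  where tLength = length tagName
findTag (_:xs) tagName = findTag xs tagName

-- Skip the tag name and the opening bracket, then take everything up to the closing bracket.
getTagContents :: Int -> String -> String
getTagContents len = takeWhile (/=')') . drop (len + 1)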

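To be honest it's getting a bit messy, but here is what's happening:

You first drop the length of tagName, then one more character for the opening bracket, and then you finish off by using takeWhile to take the characters up to the closing bracket.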
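If the goal is the full conversion from the question rather than just extracting one tag, the same recursion can rebuild the output as it goes. This is only a sketch under a few assumptions of mine: the name renderTitles is made up, the header text is emitted without surrounding spaces, and an @title( with no closing bracket is copied through unchanged.

```haskell
import Data.List (stripPrefix)

-- Hypothetical helper: replaces every "@title(...)" with "[header]...[/header]".
renderTitles :: String -> String
renderTitles [] = []
renderTitles s@(c:cs) =
  case stripPrefix "@title(" s of
    -- Tag found and a closing bracket exists: emit the header, continue after ')'.
    Just rest
      | (body, ')':after) <- break (== ')') rest
      -> "[header]" ++ body ++ "[/header]" ++ renderTitles after
    -- No tag here (or it is unterminated): copy one character and move on.
    _ -> c : renderTitles cs

main :: IO ()
main = putStrLn (renderTitles "aldsfj@title(this is a title)sdlfkj@title(this is a title2)")
```

Reading the script with readFile and writing the result with writeFile around renderTitles should then cover the file handling part.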

score 3 · Accepted Answer

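Obviously your problem falls into the category of parsing. As Daniel Wagner wisely noted, for maintainability reasons you are generally much better off approaching it with a parser.

Another thing: if you want to work with textual data efficiently, you are better off using Text instead of String.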
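As a rough illustration of the Text point, here is a sketch that does the replacement with Data.Text functions only, assuming the text package is available. The name renderTitlesText and the decision to leave an unterminated tag untouched are my own assumptions.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Hypothetical helper: replaces every "@title(...)" with "[header]...[/header]".
renderTitlesText :: Text -> Text
renderTitlesText input
  | T.null rest      = before   -- no more tags in the input
  | T.null afterBody = input    -- tag is never closed: leave the input as it is
  | otherwise        =
      T.concat [ before, "[header]", body, "[/header]"
               , renderTitlesText (T.drop 1 afterBody) ]  -- drop the ')'
  where
    opener = "@title(" :: Text
    -- Split at the first opener, then split the remainder at the first ')'.
    (before, rest)    = T.breakOn opener input
    (body, afterBody) = T.breakOn ")" (T.drop (T.length opener) rest)

main :: IO ()
main = TIO.putStrLn (renderTitlesText "aldsfj@title(this is a title)sdlfkj")
```

T.breakOn works on whole chunks of the input, so there is no character-by-character list traversal here.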

Here is how you could solve your problem using the Attoparsec parser library:

-- For autocasting of hardcoded strings to `Text` type
{-# LANGUAGE OverloadedStrings #-}

-- Import a way more convenient prelude, excluding symbols conflicting 
-- with the parser library. See
-- http://hackage.haskell.org/package/classy-prelude
import ClassyPrelude hiding (takeWhile, try)
-- Exclude the standard Prelude
import Prelude ()
import Data.Attoparsec.Text

-- A parser and an inplace converter for title
title = do
  string "@title("
  r <- takeWhile $ notInClass ")"
  string ")"
  return $ "[header]" ++ r ++ "[/header]"

-- A parser which parses the whole document to parts which are either
-- single-character `Text`s or modified titles
parts = 
  (try endOfInput >> return []) ++
    ((:) <$> (try title ++ (singleton <$> anyChar)) <*> parts)

-- The topmost parser which concats all parts into a single text
top = concat <$> parts

-- A sample input
input = "aldsfj@title(this is a title)sdlfkj@title(this is a title2)"

-- Run the parser and output result
main = print $ parseOnly top input

This outputs:

Right "aldsfj[header]this is a title[/header]sdlfkj[header]this is a title2[/header]"

P.S. ClassyPrelude reimplements ++ as an alias for Monoid's mappend, so if you prefer you can replace it with mappend, <>, or Alternative's <|>.
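For reference, here is roughly what the same parser could look like with the standard Prelude and Alternative's <|> in place of ++. Treat it as a sketch: it assumes the attoparsec and text packages are installed, and the split into part and top plus the use of many are my own restructuring rather than a line-by-line port.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative (many, (<|>))
import Data.Attoparsec.Text (Parser, anyChar, char, parseOnly, string, takeWhile)
import Data.Text (Text)
import qualified Data.Text as T
import Prelude hiding (takeWhile)

-- Parses "@title(...)" and converts it to "[header]...[/header]".
title :: Parser Text
title = do
  _ <- string "@title("
  r <- takeWhile (/= ')')
  _ <- char ')'
  return (T.concat ["[header]", r, "[/header]"])

-- Either a converted title or a single character passed through unchanged.
-- Attoparsec backtracks on failure, so no explicit try is needed.
part :: Parser Text
part = title <|> (T.singleton <$> anyChar)

-- Parses the whole document and joins the parts into a single Text.
top :: Parser Text
top = T.concat <$> many part

input :: Text
input = "aldsfj@title(this is a title)sdlfkj@title(this is a title2)"

main :: IO ()
main = print (parseOnly top input)
```

Here many stops once anyChar fails at the end of the input, which takes over the role of the explicit endOfInput branch.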
