haskell - Selective text obfuscation in Haskell

Question

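I want to obfuscate text-file reports without obscuring certain keywords, such as report titles and column headers. I have already written such a program in newLisp, and I am now trying to implement the same functionality in Haskell from scratch. The code I have so far compiles and runs correctly for the simple obfuscation case.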

module Main where

import Data.Char (isAlpha, isNumber, isUpper, toUpper)
import System.Environment (getArgs)
import System.Random (getStdGen, randomR, StdGen)

helpMessage = [ "Usage: cat filename(s) | obfuscate [-x filename] > filename",
  "",
  "Obfuscates text files. This obliterates the text--there is no recovery. This",
  "is not encryption. It's simple, if slow, obfuscation.",
  "",
  "To include a list of words not to obfuscate, use the -x option. List one word",
  "per line in the file.",
  "" ]

data CLOpts = CLOpts { help           :: Bool
                     , exceptionFileP :: Bool
                     , exceptionFile  :: String }

main = do
  args <- getArgs
  if length args > 0
  then do let opts = parseCL args CLOpts { help=False, exceptionFileP=False, exceptionFile="" }
          if help opts
          then do putStrLn $ unlines helpMessage
          else do if exceptionFileP opts
                  then do exceptions <- readFile $ exceptionFile opts
                          obf complexObfuscation $ lines exceptions
                  else do obf simpleObfuscation []
  else do obf simpleObfuscation []
  where obf f xs = do
          g <- getStdGen
          c <- getContents
          putStrLn $ f xs g c

parseCL :: [String] -> CLOpts -> CLOpts
parseCL []          opts = opts
parseCL ("-x":f:xs) opts = parseCL xs opts { exceptionFileP=True, exceptionFile=f }
parseCL      (_:xs) opts = parseCL xs opts { help=True }

simpleObfuscation xs = obfuscate

complexObfuscation exceptions g c = undefined

obfuscate :: StdGen -> String -> String
obfuscate g = obfuscate' g []
  where
    obfuscate' _ a [] = reverse a
    obfuscate' g a text@(c:cs)
      | isAlpha  c = obf obfuscateAlpha g a text
      | isNumber c = obf obfuscateDigit g a text
      | otherwise  = obf id             g a text
    obf f g a (c:cs) = let (x,g') = f (c,g) in obfuscate' g' (x:a) cs

obfuscateAlpha, obfuscateDigit :: (Char, StdGen) -> (Char, StdGen)
obfuscateAlpha (c,g) = obfuscateChar g range
  where range
          | isUpper c = ('A','Z')
          | otherwise = ('a','z')

obfuscateDigit (c,g) = obfuscateChar g ('0','9')

obfuscateChar :: StdGen -> (Char, Char) -> (Char, StdGen)
obfuscateChar = flip randomR

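I can't figure out how to obfuscate all of the text except the words passed in as exceptions. My newLisp implementation relied on its built-in regular expression handling. I didn't have much luck using regular expressions in Haskell, perhaps because of outdated libraries or something similar.

I tried splitting the text into lines and words, creating what J calls frets. That approach quickly became unwieldy. I also tried using a parser, but I think that will get pretty hairy too.

Does anyone have a suggestion for a simple, straightforward way to identify the exception words in the text and keep them from being sent to the obfuscating function? Haskell is a brilliant language, so surely I'm missing something right under my nose.

I tried Google, but my wish to supply an exception list of words not to obfuscate appears to be novel. Apart from that, the obfuscation is quite simple.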

Update

Following the idea in the answer I marked as accepted, I wrote my own words function.

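-- Requires isAlphaNum from Data.Char.
words' :: String -> [String]
words' text = f text [] []
  where -- wa: current word, reversed; ta: finished tokens, newest first
        f [] wa ta = reverse $ flush wa ta
        f (c:cs) wa ta =
          if isAlphaNum c
          then f cs (c:wa) ta
          else f cs [] $ [c] : flush wa ta
        -- Push the pending word, if any, restoring its character order.
        flush wa ta = if null wa then ta else reverse wa : ta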

I had no luck using break. I think mutual recursion using break and span would work, but I went with the code above before trying that.
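For comparison, here is a minimal sketch of how a span-based tokenizer might look. The name tokens is hypothetical and not part of the original post; it assumes the same splitting rule as words' (runs of alphanumeric characters form words, and every other character becomes its own one-character token). Under that assumption a single span per word seems to be enough, without mutual recursion.

```haskell
import Data.Char (isAlphaNum)

-- Hypothetical alternative to words': split the text into maximal runs
-- of alphanumeric characters, and emit every other character as its own
-- one-character token. Concatenating the tokens gives back the input.
tokens :: String -> [String]
tokens [] = []
tokens s@(c:cs)
  | isAlphaNum c = let (word, rest) = span isAlphaNum s
                   in word : tokens rest
  | otherwise    = [c] : tokens cs
```

Because no character is dropped, this could be used in place of words' with concatMap, as in the complexObfuscation shown next.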

I then implemented complexObfuscation as follows.

complexObfuscation exceptions g = unlines . map obfuscateLine . lines
  where obfuscateLine = concatMap obfuscateWord . words'
        obfuscateWord word =
          if word `elem` exceptions
          then word
          else obfuscate g word

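This accomplished what I was after. Unfortunately, I did not anticipate that the same generator would produce the same characters every time obfuscate is called, so every word starts with the same character. LOL. A problem for another day.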
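One possible way around the repeated-generator problem is to thread the generator through the tokens, so that each call starts from the generator the previous call finished with. The sketch below is an assumption, not the asker's code: obfuscateTokens and its step parameter are hypothetical names, and step stands in for a variant of obfuscate that returns its final generator along with the obfuscated text (the obfuscate in the question discards it). The generator type is left polymorphic, so a StdGen could be plugged in.

```haskell
import Data.List (mapAccumL)

-- Hypothetical helper: obfuscate every token that is not an exception,
-- passing the generator returned by one step on to the next step.
-- Exception tokens are kept as they are and leave the generator untouched.
obfuscateTokens
  :: (g -> String -> (g, String))  -- step: obfuscate one token, return new generator
  -> [String]                      -- exception words
  -> g                             -- initial generator
  -> [String]                      -- tokens to process
  -> (g, [String])                 -- final generator and processed tokens
obfuscateTokens step exceptions = mapAccumL go
  where
    go g tok
      | tok `elem` exceptions = (g, tok)
      | otherwise             = step g tok
```

mapAccumL does the bookkeeping here: it behaves like map but also carries a piece of state from element to element, which matches the way randomR hands back an updated generator.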

score 1 · Accepted Answer

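Read the exception file and put its words into a Data.Set.Set.

After splitting the input file into lines, split each line further into words.

Then obfuscate each word individually. If the word is an element of the Set you created earlier, leave it as it is. Otherwise, apply the obfuscate function to each of its characters.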
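The steps above can be sketched roughly as follows. The names exceptionSet and obfuscateExcept are hypothetical, and the obf parameter stands in for whatever obfuscation function is used; the sketch only assumes that the exception file lists one word per line, as the question's help text says.

```haskell
import qualified Data.Set as Set

-- Build the exception set from the contents of the exception file
-- (assumed to contain one word per line).
exceptionSet :: String -> Set.Set String
exceptionSet = Set.fromList . lines

-- Hypothetical helper: apply 'obf' to every word that is not in the set,
-- and leave words that are in the set unchanged.
obfuscateExcept :: Set.Set String -> (String -> String) -> [String] -> [String]
obfuscateExcept keep obf = map go
  where
    go w
      | w `Set.member` keep = w
      | otherwise           = obf w
```

Set.member looks a word up in logarithmic time, which may matter if the exception list is long; for a handful of words a plain list with elem would likely do as well.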
