functional-programming - 関数型プログラミングを使用した文字のシーケンスであるすべてのサブストリング

Question

文字列内の同じ文字の実行を見つけることに関する以前の質問のフォローアップとして、文字または数字の昇順または降順のシーケンスである2より大きい長さのすべてのサブ文字列を見つける機能的なアルゴリズムも見つけたいと思います（例：、： "defgh"、 "34567"、 "XYZ"、 "fedcba"、 "NMLK"、9876 "など）文字列（[Char]）。私が検討しているシーケンスは、、、、およびそれらのサブストリングA..Zのみですa..z。0..9戻り値は（ゼロベースのオフセット、長さ）ペアのリストである必要があります。「zxcvbn」パスワード強度アルゴリズムをJavaScript（必須コードを含む）からScalaに変換しています。コードを純粋に保持したいと思います。可能な限り機能的、関数型プログラミングスタイルで書くために与えられたすべての通常の理由のため。

私のコードはScalaで書かれていますが、Clojure、F＃、Haskell、または疑似コードのいずれかでアルゴリズムを変換できる可能性があります。

例：文字列の場合は。qweABCD13987を返し[(3,4),(9,3)]ます。

仕事用のコンピューターに再びアクセスできるようになったときに投稿する、かなり怪しげな関数を作成しましたが、より洗練されたソリューションが存在すると確信しています。

もう一度、ありがとう。

score 0 · Accepted Answer

この問題の良い解決策は、最初に思ったよりも本当に複雑だと思います。私はScalaProではないので、私のソリューションは確かに最適で素晴らしいものではありませんが、おそらくそれはあなたにいくつかのアイデアを与えるでしょう。

基本的な考え方は、2つの連続する文字の差を計算することですが、その後、残念ながら少し面倒になります。コードの一部が不明確かどうか私に尋ねてください！

object Sequences {

  val s = "qweABCD13987"                         

  val pairs = (s zip s.tail) toList   // if s might be empty, add a check here            
  // = List((q,w), (w,e), (e,A), (A,B), (B,C), (C,D), (D,1), (1,3), (3,9), (9,8), (8,7))

  // assuming all characters are either letters or digits
  val diff = pairs map {case (t1, t2) =>
    if (t1.isLetter ^ t2.isLetter) 0 else t1 - t2}   // xor could also be replaced by !=
  // = List(-6, 18, 36, -1, -1, -1, 19, -2, -6, 1, 1)

  /**
   *
   * @param xs A list indicating the differences between consecutive characters
   * @param current triple: (start index of the current sequence;
   *                         number of current elements in the sequence;
   *                         number indicating the direction i.e. -1 = downwards, 1 = upwards, 0 = doesn't matter)
   * @return A list of triples similar to the argument
   */
  def sequences(xs: Seq[Int], current: (Int, Int, Int) = (0, 1, 0)): List[(Int, Int, Int)] = xs match {
    case Nil => current :: Nil
    case (1 :: ys) =>
      if (current._3 != -1)
        sequences(ys, (current._1, current._2 + 1, 1))
      else
        current :: sequences(ys, (current._1 + current._2 - 1, 2, 1))  // "recompute" the current index
    case (-1 :: ys) =>
      if (current._3 != 1)
        sequences(ys, (current._1, current._2 + 1, -1))
      else
        current :: sequences(ys, (current._1 + current._2 - 1, 2, -1))
    case (_ :: ys) =>
      current :: sequences(ys, (current._1 + current._2, 1, 0))
  }                                               

  sequences(diff) filter (_._2 > 1) map (t => (t._1, t._2))
}

score 0 · Accepted Answer

問題をいくつかの小さなサブ問題に分割するのが常に最善です。私はHaskellで解決策を書きましたが、それは私にとって簡単です。レイジーリストを使用しますが、ストリームを使用するか、メイン関数の末尾を再帰的にして中間結果を引数として渡すことで、Scalaに変換できると思います。

-- Mark all subsequences whose adjacent elements satisfy
-- the given predicate. Includes subsequences of length 1.
sequences :: (Eq a) => (a -> a -> Bool) -> [a] -> [(Int,Int)]
sequences p [] = []
sequences p (x:xs) = seq x xs 0 0
  where
    -- arguments: previous char, current tail sequence, 
    -- last asc. start offset of a valid subsequence, current offset
    seq _ [] lastOffs curOffs = [(lastOffs, curOffs - lastOffs)]
    seq x (x':xs) lastOffs curOffs
        | p x x'    -- predicate matches - we're extending current subsequence
            = seq x' xs lastOffs curOffs'
        | otherwise -- output the currently marked subsequence and start a new one
            = (lastOffs, curOffs - lastOffs) : seq x' xs curOffs curOffs'
      where
        curOffs' = curOffs + 1

-- Marks ascending subsequences.
asc :: (Enum a, Eq a) => [a] -> [(Int,Int)]
asc = sequences (\x y -> succ x == y)

-- Marks descending subsequences.
desc :: (Enum a, Eq a) => [a] -> [(Int,Int)]
desc = sequences (\x y -> pred x == y)

-- Returns True for subsequences of length at least 2.
validRange :: (Int, Int) -> Bool
validRange (offs, len) = len >= 2

-- Find all both ascending and descending subsequences of the
-- proper length.
combined :: (Enum a, Eq a) => [a] -> [(Int,Int)]
combined xs = filter validRange (asc xs) ++ filter validRange (desc xs)

-- test:
main = print $ combined "qweABCD13987"

score 0 · Accepted Answer

これがClojureでの私の概算です：

入力文字列を変換して、以前のアルゴリズムを適用して解決策を見つけることができます。alorithmは最もパフォーマンスが高いとは言えませんが、より抽象化された読みやすいコードが得られると思います。

サンプルの文字列は、次の方法で変換できます。

user => (find-serials "qweABCD13987")
(0 1 2 # # # # 7 8 # # #)

前の関数「find-runs」を再利用する：

user => (find-runs (find-serials "qweABCD13987"))
([3 4] [9 3])

最終的なコードは次のようになります。

(defn find-runs [s]
  (let [ls (map count (partition-by identity s))]
    (filter #(>= (% 1) 3) 
            (map vector (reductions + 0 ls) ls))))

(def pad "#")

(defn inc-or-dec? [a b] 
  (= (Math/abs (- (int a) (int b))) 1 ))

(defn serial? [a b c] 
  (or (inc-or-dec? a b) (inc-or-dec? b c)))

(defn find-serials [s] 
  (map-indexed (fn [x [a b c]] (if (serial? a b c) pad x)) 
       (partition 3 1 (concat pad s pad))))

find-serials3セルのスライディングウィンドウを作成しserial?、シーケンスの開始/中間/終了であるセルの検出に適用します。文字列は便利なパッドが付いているので、ウィンドウは常に元の文字の中央に配置されます。

functional-programming - 関数型プログラミングを使用した文字のシーケンスであるすべてのサブストリング

3 に答える 3

Related

Reference