string - 文字列で使用されるアルファベットのリストを取得するための良いまたは効率的な方法は何ですか

Question

簡単に言えば、Common Lisp で文字列から繰り返されない文字のリストを取得する方法は?

例えば：

"common"
-> ("c" "o" "m" "n") or in characters, (#\c #\o #\m #\n)
I'd care less about the order and type, if it is in string or character.

"overflow" -> (o v e r f l w)
"tomtomtom" -> (t o m)
   etc...

私が考えていたのは、元の文字列の最初の文字を収集することです。次に、関数を使用します。

(remove letter string)

今、削除された文字列の最初の文字を収集し、以前から既に収集された文字に追加します。再帰のように聞こえますが、再帰的に呼び出すと、以前に収集された *letter*s リストが失われますよね? また、これに組み込み関数があるかどうかも疑問です。

さらに、これを完全に機能的なスタイルで行いたいので、 setやそれらのいずれも使用したくありません。

御時間ありがとうございます。

score 8 · Accepted Answer

CL-USER> (remove-duplicates (coerce "common" 'list))
(#\c #\m #\o #\n)

または、次のように単純に行うこともできます。

CL-USER> (remove-duplicates "common")
"comn"

score 0 · Accepted Answer

There may be certain better possibilities to do that, if you can make some assumptions about the text you are dealing with. For instance, if you are dealing with English text only, then you could implement a very simple hash function (basically, use a bit vector 128 elements long), so that you wouldn't need to even use a hash-table (which is a more complex structure). The code below illustrates the idea.

(defun string-alphabet (input)
  (loop with cache =
       (coerce (make-array 128
                           :element-type 'bit
                           :initial-element 0) 'bit-vector)
     with result = (list input)
     with head = result
     for char across input
     for code = (char-code char) do
       (when (= (aref cache code) 0)
         (setf (aref cache code) 1
               (cdr head) (list char)
               head (cdr head)))
     finally (return (cdr result))))

(string-alphabet "overflow")
;; (#\o #\v #\e #\r #\f #\l #\w)

Coercing to bit-vector isn't really important, but it is easier for debugging (the printed form is more compact) and some implementation may actually optimize it to contain only so many integers that the platform needs to represent so many bits, i.e. in the case of 128 bits length, on a 64 bit platform, it could be as short as 2 or 3 integers long.

Or, you could've also done it like this, using integers:

(defun string-alphabet (input)
  (loop with cache = (ash 1 128)
     with result = (list input)
     with head = result
     for char across input
     for code = (char-code char) do
       (unless (logbitp code cache)
         (setf cache (logior cache (ash 1 code))
               (cdr head) (list char)
               head (cdr head)))
     finally (return (cdr result))))

but in this case you would be, in your worst case, create 128 big integers, which is not so expensive after all, but the bit-vector might do better. However, this might give you a hint, for the situation, when you can assume that, for example, only letters of English alphabet are used (in which case it would be possible to use an integer shorter then machine memory word).

score -5 · Accepted Answer

私は Lisp にあまり詳しくないので、ここに Haskell のコードをいくつか示しますが、どちらも機能的であるため、翻訳する際に問題になるとは思いません。

doit :: String -> String

doit [] = []
doit (x:xs) = [x] ++ doit (filter (\y -> x /= y) xs)

それで、それは何をしますか？文字列を取得しました。空の文字列 (Haskell [] == "") の場合は、empty文字列を返します。それ以外の場合は、最初のものを取り、文字列elementの再帰に連結しますが、それらの要素は ==です。tailfilterfirst element

この関数filterは、特定ののシンタックスシュガーにすぎません。Lisp では、ここで再読できるようにmap-function呼び出されます: Lisp filter out results from list not matching predicateremove-if

string - 文字列で使用されるアルファベットのリストを取得するための良いまたは効率的な方法は何ですか

簡単に言えば、Common Lisp で文字列から繰り返されない文字のリストを取得する方法は?

3 に答える 3

Related

Reference