parsing - Parsing a tab-delimited string

Question

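I'm having trouble figuring out how to split a tab-delimited string into chunks of data. As an example, suppose the file I'm reading looks like this: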

a1     b1     c1     d1     e1
a2     b2     c2     d2     e2

and I read the first line of the file and get a string like this:

"a1     b1     c1     d1      e2"

I want to split this into five variables a, b, c, d, and e, or build a list (a b c d e). Any thoughts?

Thanks.

score 2 · Accepted Answer

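Try concatenating parentheses onto the front and back of the input string, then calling read-from-string on the result (I assume you are using Common Lisp, since you tagged the question clisp).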

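;; Define the variable before use; SETF on an undeclared variable is not portable.
(defparameter *str* "a1   b1      c1      d1      e2")
;; Wrapping the line in parentheses lets the reader parse it as one list.
;; With the default readtable this prints (A1 B1 C1 D1 E2), a list of symbols.
(print (read-from-string (concatenate 'string "(" *str* ")")))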
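
To get the five variables the question asks for, the resulting list can be destructured directly. This is only a minimal sketch, assuming every line has exactly five fields and that no field contains whitespace or characters the Lisp reader treats specially. `read-fields` is a hypothetical helper name, not something from the answer above, and `*read-eval*` is bound to NIL so that `#.` forms in the input are not evaluated.

```lisp
;; Hypothetical helper: reads the tokens of LINE with the Lisp reader.
(defun read-fields (line)
  "Return the whitespace-separated tokens of LINE as a list of Lisp objects."
  (let ((*read-eval* nil))              ; refuse #. read-time evaluation
    (read-from-string (concatenate 'string "(" line ")"))))

;; Bind one variable per field; signals an error if there are not exactly five.
(destructuring-bind (a b c d e)
    (read-fields "a1 b1 c1 d1 e2")
  (list a b c d e))
```

With the default readtable the values are symbols with upcased names, and a field such as 42 would be read as a number, so this probably suits only simple, trusted input.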

score 2 · Accepted Answer

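Here is yet another way to go about it (perhaps a bit more robust). You could also easily modify it so that you can `setf' a character in the string once the callback is called, but I didn't do that, because it didn't seem like you need that ability. In that case I would rather use a macro anyway.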

(defun mapc-words (function vector
                  &aux (whites '(#\Space #\Tab #\Newline #\Rubout)))
  "Iterates over string `vector' and calls the `function'
with the non-white characters collected so far.
The white characters are, by default: #\Space, #\Tab
#\Newline and #\Rubout.
`mapc-words' will short-circuit when `function' returns false."
  (do ((i 0 (1+ i))
       (start 0)
       (len 0))
      ((= i (1+ (length vector))))
    (if (or (= i (length vector)) (find (aref vector i) whites))
        (if (> len 0)
            (if (not (funcall function (subseq vector start i)))
                (return-from mapc-words)
                (setf len 0 start (1+ i)))
            (incf start))
        (incf len)))
  vector)

(mapc-words
 #'(lambda (word)
     (not
      (format t "word collected: ~s~&" word)))
 "a1     b1     c1     d1     e1
a2     b2     c2     d2     e2")

;; word collected: "a1"
;; word collected: "b1"
;; word collected: "c1"
;; word collected: "d1"
;; word collected: "e1"
;; word collected: "a2"
;; word collected: "b2"
;; word collected: "c2"
;; word collected: "d2"
;; word collected: "e2"

Here is an example of a macro you could use if you want to modify the string as you read it. I'm not entirely happy with it, so maybe someone will come up with a better variant.

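(defmacro with-words-in-string
    ((word start end
           &aux (whites '(#\Space #\Tab #\Newline #\Rubout))
                (str (gensym "STRING"))
                (len (gensym "LEN")))
     s
     &body body)
  ;; STR and LEN are gensyms: the form S is evaluated only once, and
  ;; BODY cannot capture the macro's internal length counter.
  `(do ((,str ,s)
        (,end 0 (1+ ,end))
        (,start 0)
        (,word)
        (,len 0))
       ((= ,end (1+ (length ,str))))
     (if (or (= ,end (length ,str)) (find (aref ,str ,end) ',whites))
         (if (> ,len 0)
             (progn
               (setf ,word (subseq ,str ,start ,end))
               ,@body
               (setf ,len 0 ,start (1+ ,end)))
             (incf ,start))
         (incf ,len))))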

(with-words-in-string (word start end)
    "a1     b1     c1     d1     e1
a2     b2     c2     d2     e2"
  (format t "word: ~s, start: ~s, end: ~s~&" word start end))
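
If what you need is a list rather than a callback per word, the same left-to-right scan can be sketched with `position-if` and `position-if-not`. This is a rough sketch and not the code above: `split-on-whitespace` and its `:whites` keyword are hypothetical names, and runs of whitespace are treated as a single delimiter, so empty fields are dropped.

```lisp
;; Hypothetical helper: collects maximal runs of non-whitespace characters.
(defun split-on-whitespace (string &key (whites '(#\Space #\Tab #\Newline)))
  "Return the words of STRING, split on any run of characters in WHITES."
  (flet ((whitep (c) (member c whites)))
    (loop for start = (position-if-not #'whitep string)
            then (position-if-not #'whitep string :start end)
          ;; END is the index of the delimiter after the word, or NIL at the end.
          for end = (and start (position-if #'whitep string :start start))
          while start
          collect (subseq string start end)
          while end)))

;; Usage: build a two-line, tab-separated input without literal tab characters.
(split-on-whitespace (format nil "a1~Cb1~Cc1~%a2~Cb2~Cc2" #\Tab #\Tab #\Tab #\Tab))
```

Because it takes the delimiter set as an argument, the same function should work for space-separated input as well.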

score 0 · Accepted Answer

Assuming the fields are separated by tabs (not spaces), this will build a list:

(defun tokenize-tabbed-line (line)
  (loop 
     for start = 0 then (+ space 1)
     for space = (position #\Tab line :start start)
     for token = (subseq line start space)
     collect token until (not space)))

This results in the following (the gaps inside the string literal are tab characters):

CL-USER> (tokenize-tabbed-line "a1  b1  c1  d1  e1")
("a1" "b1" "c1" "d1" "e1")
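
Since the question starts from a file, here is a hedged sketch of reading just the first line and splitting it at tabs. `split-by-tab` and `first-line-fields` are hypothetical names; the splitter follows the same idea as the function above and keeps empty fields, which matters when a column may be blank.

```lisp
;; Hypothetical helper: splits at every tab and keeps empty fields.
(defun split-by-tab (line)
  "Return the tab-separated fields of LINE as a list of strings."
  (loop for start = 0 then (1+ tab)
        for tab = (position #\Tab line :start start)
        collect (subseq line start tab)
        while tab))

;; Hypothetical helper: PATH is whatever file you are reading.
(defun first-line-fields (path)
  "Return the fields of the first line of the file at PATH, or NIL if it is empty."
  (with-open-file (in path)
    (let ((line (read-line in nil nil)))   ; NIL instead of an error at end of file
      (and line (split-by-tab line)))))

;; Usage, assuming a file with exactly five columns ("data.txt" is a made-up name):
;; (destructuring-bind (a b c d e) (first-line-fields "data.txt")
;;   (list a b c d e))
```

Unlike the `read-from-string` approach, the fields stay strings and keep their original case.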
