regex - Replace escaped Unicode in elisp

Question

When I call the Google Dictionary API from Emacs with http://www.google.com/dictionary/json?callback=cb&q=word&sl=en&tl=en&restrict=pr%%2Cde&client=te I get a response like this:

"entries": [{
    "type": "example",
    "terms": [{
        "type": "text",
        "text": "his grandfather\x27s \x3cem\x3ewords\x3c/em\x3e had been meant kindly",
        "language": "en"
    }]
}]

As you can see, the "text" field contains escaped Unicode. I would like a function that converts it, like the one below.

(defun unescape-string (string)
    "Return STRING with its escaped Unicode characters unescaped."
    ...
)
(unescape-string "his grandfather\x27s \x3cem\x3ewords\x3c/em\x3e")
=> "his grandfather's <em>words</em>"

(insert #x27)  ; inserts '
(insert #x3c)  ; inserts <
(insert #x3e)  ; inserts >

This is what I have tried:

Replacing a regexp in a string
Custom replacement, as in http://www.emacswiki.org/emacs/ElispCookbook#toc33

But I don't know how to replace an escape such as '\x123' with the corresponding Unicode character, either in a buffer or in a string.

Thanks in advance.

score 2 · Accepted Answer

This seems to be the simplest way to do it:

(read (princ "\"his grandfather\\x27s \\x3cem\\x3ewords\\x3c/em\\x3e had been meant kindly\""))
;; "his grandfather's ώm>words</em> had been meant kindly"

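It is also quite interesting that Emacs parses \x3ce rather than \x3c. I'm not sure whether this is a bug or intended behavior. I always thought it wasn't supposed to read more than two characters after x...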
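The greedy parsing described above can be reproduced in isolation. The sketch below relies on the rule documented in the Emacs Lisp manual: a \x escape accepts any number of hex digits, and a backslash followed by a space terminates it. Treat it as an illustration, not as part of the original answer.

```elisp
;; The text being read is the string literal "\x3cem".  The reader consumes
;; every hex digit it can, so \x3ce becomes a single character (U+03CE).
(read "\"\\x3cem\"")      ; => "ώm"

;; Backslash-space ends the hex escape and contributes nothing to the string.
(read "\"\\x3c\\ em\"")   ; => "<em"
```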

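If you still want to use the read + princ combination, you need to terminate the escape so that Emacs does not parse any further characters as hex digits, for example with a backslash followed by a space: \x3c\ em. Alternatively, here is something simple I could come up with: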

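(defun replace-c-escape-codes (input)
  "Return INPUT with each two-digit \\xHH escape replaced by its character."
  (replace-regexp-in-string
   "\\\\x[[:xdigit:]][[:xdigit:]]"
   (lambda (match)
     ;; Drop the leading "\x" and parse the remaining two digits as hex.
     (make-string 1 (string-to-number (substring match 2) 16)))
   input
   ;; FIXEDCASE and LITERAL: insert the character as-is, so a decoded
   ;; backslash (\x5c) is not treated as a replacement escape.
   t t))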

(replace-c-escape-codes "his grandfather\\x27s \\x3cem\\x3ewords\\x3c/em\\x3e")
"his grandfather's <em>words</em>"

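The question also asks about doing the replacement in a buffer. Here is a minimal sketch of that, assuming the same two-digit \xHH form as in the string version. The name unescape-hex-in-region is hypothetical and does not come from the original answer.

```elisp
(defun unescape-hex-in-region (start end)
  "Replace two-digit \\xHH escapes between START and END with their characters.
Hypothetical helper for illustration; assumes exactly two hex digits per escape."
  (interactive "r")
  (save-excursion
    (save-restriction
      ;; Narrow so the search cannot run past END as the text shrinks.
      (narrow-to-region start end)
      (goto-char (point-min))
      (while (re-search-forward "\\\\x\\([[:xdigit:]]\\{2\\}\\)" nil t)
        ;; FIXEDCASE and LITERAL are both t: insert the character unchanged.
        (replace-match
         (string (string-to-number (match-string 1) 16))
         t t)))))

;; Usage on a temporary buffer:
(with-temp-buffer
  (insert "his grandfather\\x27s \\x3cem\\x3ewords\\x3c/em\\x3e")
  (unescape-hex-in-region (point-min) (point-max))
  (buffer-string))
;; => "his grandfather's <em>words</em>"
```

Because the pattern matches exactly two hex digits, the "e" in "\x3cem" is left alone, unlike with the Lisp reader.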