forms - Is there a function to decode percent-encoded Unicode (UTF-8) strings, such as those submitted from a form?

Question

I want to save data from an HTML form using a Rebol CGI script. My form looks like this:

<form action="test.cgi" method="post" >

     Input:

     <input type="text" name="field"/>
     <input type="submit" value="Submit" />

</form>

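But for Unicode characters such as Chinese, the data arrives percent-encoded, for example %E4%BA%BA.

(This is for the Chinese character "人" ... its UTF-8 form as a Rebol binary literal is #{E4BABA})

Is there a function in the system, or an existing library, that can decode this directly? dehex does not currently seem to cover this case. At the moment I decode it manually by removing the percent signs and constructing the corresponding binary, like this: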

data: to-string read system/ports/input
print data

;-- this prints "field=%E4%BA%BA"

k-v: parse data "="
print k-v

;-- this prints ["field" "%E4%BA%BA"]

v: append insert replace/all k-v/2 "%" "" "#{" "}"
print v

;-- This prints "#{E4BABA}" ... a string!, not binary!
;-- LOAD will help construct the corresponding binary
;-- then TO-STRING will decode that binary from UTF-8 to character codepoints

write %test.txt to-string load v
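
A slightly shorter variant of the same manual approach, as a sketch only: it assumes Rebol 3 (where TO STRING! decodes a binary! as UTF-8) and assumes the value consists solely of percent escapes, with no plain characters or "+" mixed in. The names encoded, hex-digits and utf8-bytes are made up for illustration. It uses DEBASE/BASE to go from hex digits to a binary! directly, instead of building a "#{...}" string and calling LOAD on it.

```rebol
;-- hypothetical sample value, as it would arrive from the form
encoded: "%E4%BA%BA"

;-- strip the percent signs from a copy, leaving only the hex digits
hex-digits: replace/all copy encoded "%" ""

;-- DEBASE/BASE decodes base-16 text into a binary!
utf8-bytes: debase/base hex-digits 16

;-- in Rebol 3, TO STRING! decodes the binary! as UTF-8
print to string! utf8-bytes
```

Real form data usually mixes plain characters with escapes, so this is not a general decoder.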

score 3 · Accepted Answer

There is a library called AltWebForm that encodes and decodes percent-encoded web form data:

do http://reb4.me/r3/altwebform
load-webform "field=%E4%BA%BA"

The library is described in Rebol and Web Forms.

score 2 · Accepted Answer

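This looks related to ticket #1986, which discusses whether this is a "bug" or whether the Internet has drifted away from its own spec:

Have DEHEX convert UTF-8 sequences from browsers as Unicode.

If you have specific experience with what has become standard practice for Chinese and want to weigh in, that would be valuable.

As an aside, the specific case above can also be handled with PARSE, like this: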

key-value: {field=%E4%BA%BA}

utf8-bytes: copy #{}

either parse key-value [
    copy field-name to {=}
    skip
    some [
        and {%}
        copy enhexed-byte 3 skip (
            append utf8-bytes dehex enhexed-byte
        )
    ]
] [
    print [field-name {is} to string! utf8-bytes]
] [
    print {Malformed input.}
]

That will output:

field is 人

Here it is again with some comments included:

key-value: {field=%E4%BA%BA}

;-- Generate empty binary value by copying an empty binary literal     
utf8-bytes: copy #{}

either parse key-value [

    ;-- grab field-name as the chars right up to the equals sign
    copy field-name to {=}

    ;-- skip the equal sign as we went up to it, without moving "past" it
    skip

    ;-- apply the enclosed rule SOME (non-zero) number of times
    some [
        ;-- match a percent sign as the immediate next symbol, without
        ;-- advancing the parse position
        and {%}

        ;-- grab the next three chars, starting with %, into enhexed-byte
        copy enhexed-byte 3 skip (

            ;-- If we get to this point in the match rule, this parenthesized
            ;-- expression lets us evaluate non-dialected Rebol code to 
            ;-- append the dehexed byte to our utf8 binary
            append utf8-bytes dehex enhexed-byte
        )
    ]
] [
    print [field-name {is} to string! utf8-bytes]
] [
    print {Malformed input.}
]

(Also note that "simple parse" is getting the axe in favor of enhancements to SPLIT. So code like parse data "=" can instead be written as split data "=", or as one of the other cool variants if you check them out ... samples are in the ticket.)
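
For the question's key/value step, the SPLIT form might look like the sketch below. It assumes a Rebol 3 build whose SPLIT accepts a string delimiter. The name pair is made up for illustration.

```rebol
data: "field=%E4%BA%BA"

;-- SPLIT on the equals sign, replacing the "simple parse" form: parse data "="
pair: split data "="

;-- the key is the first element, the still-encoded value the second
print ["key:" first pair]
print ["value:" second pair]
```

Note that a value which itself contains "=" would be split into more than two pieces, so real form handling needs more care than this.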
