perl - How can I force ISO-8859-1 encoding of form data with LWP::UserAgent?

Question

LWP::UserAgent seems to always encode form data as UTF-8, even when I explicitly encode it as ISO-8859-1, like this:

use Encode;
use LWP::UserAgent;
use utf8;

my $ua = LWP::UserAgent->new;
$ua->post('http://localhost:8080/', {
    text => encode("iso-8859-1", 'è'),
});

The request content is text=%C3%A8. How can I get è encoded as %E8 instead?

score 2 · Accepted Answer

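Hehe. :-) This has to do with the growing Unicode support over the last dozen or so Perl releases and, more precisely, with the \C regex feature used in the modules URI and URI::Escape. For background, read this thread on perl-unicode from 2010 (Don't use the \C escape in regexes - Why not?).

Why the URI module? Because HTTP::Request::Common uses it to do the form and URL encoding.

Meanwhile, here is a script I wrote to remind myself how tricky this issue is, especially since the URI module is used so often: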

use 5.010;
use utf8;
# Perl and URI.pm might behave differently when you encode your script in
# Latin1 and drop the utf8 pragma.
use Encode;
use URI;
use Test::More;
use constant C3A8 => 'text=%C3%A8';
use constant   E8 => 'text=%E8';
diag "Perl $^V";
diag "URI.pm $URI::VERSION";
my $chars = 'è';
my $octets = encode 'iso-8859-1', $chars;
my $uri = URI->new('http:');

$uri->query_form( text => $chars );
is $uri->query, C3A8, C3A8;

my @exp;
given ( "$^V $URI::VERSION" ) {
        when ( 'v5.12.3 1.56' ) { @exp = (   E8, C3A8 ) }
        when ( 'v5.10.1 1.54' ) { @exp = ( C3A8, C3A8 ) }
        when ( 'v5.10.1 1.58' ) { @exp = ( C3A8, C3A8 ) }
        default                 { die 'not tested :-)' }
}

$uri->query_form( text => $octets );
is $uri->query, $exp[0], $exp[0];

utf8::upgrade $octets;
$uri->query_form( text => $octets );
is $uri->query, $exp[1], $exp[1];

done_testing;

So here is what I get (on Windows and on Cygwin):

C:\Windows\system32 :: perl \Opt\Cygwin\tmp\uri.pl
# Perl v5.12.3
# URI.pm 1.56
ok 1 - text=%C3%A8
ok 2 - text=%E8
ok 3 - text=%C3%A8
1..3

and:

MiLu@Dago: ~/comp > perl /tmp/uri.pl
# Perl v5.10.1
# URI.pm 1.54
ok 1 - text=%C3%A8
ok 2 - text=%C3%A8
ok 3 - text=%C3%A8
1..3

Update

You can hand-craft the request body:

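use utf8;
use Encode;
use HTTP::Request;
use LWP::UserAgent;
my $chars = 'ölè';
my $octets = encode( 'iso-8859-1', $chars );
# Percent-encode every octet >= 0x80. ASCII octets pass through unchanged,
# so reserved characters such as & or = are NOT escaped here.
my $body = 'text=' .
        join '',
        map { my $o = ord $_; $o < 128 ? $_ : sprintf '%%%X', $o }
        split //, $octets;
my $uri = 'http://localhost:8080/';
# Declare the body as form data so the server parses it as such.
my $req = HTTP::Request->new( POST => $uri,
        [ 'Content-Type' => 'application/x-www-form-urlencoded' ], $body );
print $req->as_string;
my $ua = LWP::UserAgent->new;
my $rsp = $ua->request( $req );
print $rsp->as_string;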
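
The hand-built body above leaves ASCII characters unescaped, so a value containing reserved characters such as & or = would corrupt the form data. A minimal sketch of a stricter encoder follows. The helper name latin1_form_escape is hypothetical and not part of LWP or URI, and escaping a space as %20 rather than + is an assumption about what the receiving server accepts.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical helper: turn a character string into percent-encoded
# ISO-8859-1 octets. Every octet outside the RFC 3986 unreserved set
# (letters, digits, "-", ".", "_", "~") is escaped.
sub latin1_form_escape {
    my ($chars) = @_;
    # FB_CROAK makes encode() die on characters Latin-1 cannot represent;
    # LEAVE_SRC keeps encode() from modifying $chars.
    my $octets = encode( 'iso-8859-1', $chars,
        Encode::FB_CROAK() | Encode::LEAVE_SRC() );
    $octets =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;
    return $octets;
}

# Build the body from key/value pairs, escaping both sides.
my @fields = ( [ text => "\x{F6}l\x{E8} & more" ] );   # "ölè & more"
my $body = join '&',
    map { latin1_form_escape( $_->[0] ) . '=' . latin1_form_escape( $_->[1] ) }
    @fields;

print "$body\n";   # text=%F6l%E8%20%26%20more
```

The resulting string can be passed as the content of an HTTP::Request in the same way as the $body variable in the answer's script.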

score 1 · Accepted Answer

Short answer to my own question: quote the key name (i.e. "text") instead of leaving it as a bareword.

$ua->post('http://localhost:8080/', {
    'text' => encode("iso-8859-1", 'è'),
});

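Rationale: this strange behavior is caused by a combination of the following factors:

Perl bug #68812, which set the internal UTF-8 flag on all barewords. This has been fixed in recent Perl versions (>= 5.12).
URI.pm concatenates the key and the value (i.e. "text=è") before converting the characters, so if the key has the internal flag set, the value is always upgraded to UTF-8, even if you passed it as octets.

I don't think the bug regarding URI.pm's use of \C that @Lumi pointed out affects this particular issue.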
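
The upgrade described in the second factor can be observed with core Perl alone. The following is a rough sketch, not URI.pm's actual code: the helper describe_pair is hypothetical, and utf8::upgrade stands in for the flag that the bareword bug set on the key.

```perl
use strict;
use warnings;
use bytes ();                 # loaded only for bytes::length, pragma stays off
use Encode qw( encode );

# Hypothetical helper: join key and value with "=" and report whether the
# result carries the UTF-8 flag and how many bytes it occupies internally.
sub describe_pair {
    my ( $key, $value ) = @_;
    my $pair = $key . '=' . $value;
    return ( utf8::is_utf8($pair) ? 1 : 0, bytes::length($pair) );
}

my $value = encode( 'iso-8859-1', "\x{E8}" );   # one octet, 0xE8

my $plain_key   = 'text';
my $flagged_key = 'text';
utf8::upgrade($flagged_key);                    # same characters, flag set

# With the plain key the pair stays a 6-byte octet string; with the flagged
# key the pair is upgraded and 0xE8 is stored internally as 0xC3 0xA8.
printf "plain key:   flagged=%d internal bytes=%d\n",
    describe_pair( $plain_key, $value );
printf "flagged key: flagged=%d internal bytes=%d\n",
    describe_pair( $flagged_key, $value );
```

Both pairs still compare equal as character strings. The difference only matters to code that looks at the internal byte representation.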

score 1 · Accepted Answer

use strict;
use warnings;
use utf8;  # Script is encoded using UTF-8.

use Encode                qw( encode );
use HTTP::Request::Common qw( POST );  # This is what ->post uses

my $req = POST('http://localhost:8080/', {
    text => encode("iso-8859-1", 'è'),
});

print($req->as_string());

gives

POST http://localhost:8080/
Content-Length: 8
Content-Type: application/x-www-form-urlencoded

text=%E8

Are you perhaps passing the UTF-8 encoding of «è» rather than «è» itself? If I use its UTF-8 encoding, I get the same result as you:

...
my $req = POST('http://localhost:8080/', {
    text => encode("iso-8859-1", encode("UTF-8", 'è')),
});
...

gives

POST http://localhost:8080/
Content-Length: 11
Content-Type: application/x-www-form-urlencoded

text=%C3%A8
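
One way to check which of the two cases applies is to dump the octets before they reach LWP. This is a small diagnostic sketch using only Encode and unpack. The helper name hex_octets is hypothetical, and the character is written as an escape so the result does not depend on how the script file itself is encoded.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical helper: show a byte string as lowercase hex digits.
sub hex_octets {
    my ($octets) = @_;
    return unpack 'H*', $octets;
}

my $chars = "\x{E8}";   # the character è

# Encoded once: a single Latin-1 octet.
my $latin1 = encode( 'iso-8859-1', $chars );

# Encoded as UTF-8 first: the two UTF-8 bytes are then treated as two
# Latin-1 characters and survive the second encode unchanged.
my $double = encode( 'iso-8859-1', encode( 'UTF-8', $chars ) );

printf "latin1: %s\n", hex_octets($latin1);   # e8
printf "double: %s\n", hex_octets($double);   # c3a8
```

If the value passed to POST dumps as c3a8, the string was already UTF-8 encoded before the ISO-8859-1 step.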
