
I get a string in a Perl method, and at that point I don't know whether it is in a specific encoding or not. I want to convert it to a specific encoding. How do I do that?
For example, something like the following (it could also be UTF-8 instead of ISO8859, for instance):

sub func {
  my ($arg) = @_;
  if ($arg not ISO8859) {   # pseudocode: this is the check I don't know how to write
     $arg = Encode::encode("iso-8859-1", $arg);
  }
  # use $arg
}

Update:
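Is the following correct? (The intention is that, regardless of what $arg was passed to the method, I make it UTF-8, then encode it to ISO-8859-1, and so get a single representation regardless of the input.)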

$arg = Encode::decode("utf8", $arg);
$arg = Encode::encode("iso-8859-1", $arg);

perldoc seems to say that what I need is covered.

1 answer

Is byte 0x80 a € (as in cp1252) or a Ђ (as in cp1251)? Is it even text?
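A small sketch of that ambiguity using Perl's core Encode module: the very same single byte 0x80 decodes to a different character depending on which encoding you name. The three encodings below are only examples chosen for illustration.

```perl
use strict;
use warnings;
use Encode ();

my $byte = "\x80";    # a single byte whose meaning we don't know

# Decode the very same byte under three different (example) encodings.
my %decoded;
$decoded{$_} = Encode::decode($_, $byte) for qw(cp1252 cp1251 iso-8859-1);

# cp1252 gives U+20AC (€), cp1251 gives U+0402 (Ђ), and iso-8859-1 gives
# U+0080, a C1 control character rather than anything printable.
printf "%-10s -> U+%04X\n", $_, ord $decoded{$_} for sort keys %decoded;
```

Nothing in the byte itself tells you which of these answers is the right one.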

You have to decode inputs in order to do anything with them, and you have to know an input's encoding to decode it.
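One way to apply this to the func from the question, sketched under an assumption: the caller knows the input's encoding and passes it in. The second parameter $input_encoding is hypothetical (it is not in the original code). The pattern is to decode once on the way in and encode once on the way out.

```perl
use strict;
use warnings;
use Encode ();

# Sketch: $input_encoding is a hypothetical extra parameter telling us
# what the incoming bytes are; the original func had no such argument.
sub func {
    my ($bytes, $input_encoding) = @_;

    # Decode on the way in: bytes -> Perl character string.
    my $text = Encode::decode($input_encoding, $bytes);

    # ... work with $text as characters here ...

    # Encode on the way out: characters -> ISO-8859-1 bytes. With the
    # default CHECK, characters that have no ISO-8859-1 equivalent are
    # replaced by a substitution character.
    return Encode::encode('iso-8859-1', $text);
}

my $from_utf8   = func("caf\xC3\xA9", 'UTF-8');        # é as UTF-8 (2 bytes)
my $from_latin1 = func("caf\xE9",     'iso-8859-1');   # é as ISO-8859-1 (1 byte)
print $from_utf8 eq $from_latin1 ? "same bytes\n" : "different\n";
```

Both calls produce the same ISO-8859-1 bytes, but only because each one was told the correct input encoding.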


I don't know at that point if it is a specific encoding or not and want to convert it to a specific encoding how do I do that?

Generally speaking, you can't. How do you expect to tell decode which encoding to use if you don't know what it is?

At best you can use heuristics. The more you know about the input, the better heuristics you can use.
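A rough sketch of one such heuristic, assuming you have a short, ordered list of plausible encodings: try each one in strict mode and keep the first that decodes the whole input without errors. first_clean_decode is a hypothetical helper name, not part of Encode.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Encode): try each candidate encoding in
# order and return the first one that decodes the whole input cleanly.
sub first_clean_decode {
    my ($bytes, @candidates) = @_;
    for my $enc (@candidates) {
        my $text = eval {
            # FB_CROAK: die on malformed input instead of substituting U+FFFD.
            # LEAVE_SRC: keep decode from overwriting $bytes, which it does
            # when a CHECK value is set without this flag.
            Encode::decode($enc, $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        };
        return ($enc, $text) if defined $text;
    }
    return;    # none of the candidates fit
}

# Order matters: iso-8859-1 accepts every byte sequence, so it must come last.
my ($enc, $text) = first_clean_decode("caf\xE9 noir", 'UTF-8', 'iso-8859-1');
```

This only narrows things down among the encodings you list; it cannot tell you that the input was something else entirely.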

For example, if you know a string is encoded with either UTF-8 or iso-8859-1, you can guess almost perfectly which one it is, because iso-8859-1 text containing non-ASCII characters very rarely happens to form valid UTF-8. In fact, you could even decode a file that's a mix of both!
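A sketch of the mixed case, assuming every byte is either part of a well-formed UTF-8 sequence or a standalone ISO-8859-1 character: well-formed UTF-8 multi-byte sequences (byte ranges from RFC 3629) are decoded as UTF-8, and any other byte is treated as ISO-8859-1. decode_mixed is a hypothetical name. Bytes that happen to form valid UTF-8 but were meant as ISO-8859-1 (for example \xC3\xA9 meant as "Ã©") will be misread, which is why this stays a heuristic.

```perl
use strict;
use warnings;
use Encode ();

# One well-formed UTF-8 multi-byte sequence (byte ranges per RFC 3629;
# rejects overlongs, surrogates and code points above U+10FFFF).
my $utf8_seq = qr/
      [\xC2-\xDF][\x80-\xBF]
    | \xE0[\xA0-\xBF][\x80-\xBF]
    | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}
    | \xED[\x80-\x9F][\x80-\xBF]
    | \xF0[\x90-\xBF][\x80-\xBF]{2}
    | [\xF1-\xF3][\x80-\xBF]{3}
    | \xF4[\x80-\x8F][\x80-\xBF]{2}
/x;

# Hypothetical helper: decode a byte string in which UTF-8 and
# ISO-8859-1 may be mixed, character by character.
sub decode_mixed {
    my ($bytes) = @_;
    my $text = '';
    while ($bytes =~ /\G(?:($utf8_seq)|(.))/gcs) {
        # A UTF-8 sequence is decoded as UTF-8. Any other single byte is
        # taken as ISO-8859-1, whose byte values equal their code points,
        # so the byte can be appended as-is.
        $text .= defined $1 ? Encode::decode('UTF-8', "$1") : $2;
    }
    return $text;
}

my $text = decode_mixed("caf\xC3\xA9 / caf\xE9");   # UTF-8 é, then Latin-1 é
```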

Is the following correct? (the intention is that regardless of what is the $arg that was passed in the method I make it utf8 and then I encode it to iso8859 and get a single representation regardless of input)

No. Those two lines only work if they are given text encoded using UTF-8. You can't decode something without knowing the encoding that was used to encode it.
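To see why, here are the two lines from the question wrapped in a hypothetical sub normalize, fed one input that really is UTF-8 and one that is ISO-8859-1. Only the first comes out as intended.

```perl
use strict;
use warnings;
use Encode ();

# The two lines from the question, wrapped in a hypothetical sub.
sub normalize {
    my ($arg) = @_;
    $arg = Encode::decode('utf8', $arg);         # assumes $arg holds UTF-8 bytes
    $arg = Encode::encode('iso-8859-1', $arg);   # characters -> ISO-8859-1 bytes
    return $arg;
}

my $from_utf8   = normalize("caf\xC3\xA9");   # input really is UTF-8
my $from_latin1 = normalize("caf\xE9");       # input is ISO-8859-1: a malformed byte as far as UTF-8 is concerned
```

With the default CHECK, decode does not complain about the malformed byte; it silently replaces it with U+FFFD, which encode then replaces with a substitution character, so $from_latin1 is not "caf\xE9".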

answered 2013-06-21T06:15:54.927