java - Decoding a Python-encoded UTF-8 string (\xc4\x91) in Java

Question

How can I get a proper Java string from the string 'Oslobo\xc4\x91enja' created in Python? How do I decode it? I have tried everything and looked everywhere, and I've been stuck on this problem for two days. Please help!

Below is the Python web service method. Its JSON output is parsed by a Java client that uses Google Gson.

# -*- coding: utf-8 -*-
import json
import urllib2

def list_of_suggestions(entry):
    """Returns list of suggestions from auto-complete search"""
    input = entry.encode('utf-8')
    json_result = { 'suggestions': [] }
    resp = urllib2.urlopen('https://maps.googleapis.com/maps/api/place/autocomplete/json?input=' + urllib2.quote(input) + '&location=45.268605,19.852924&radius=3000&components=country:rs&sensor=false&key=blahblahblahblah')
    # make json object from response
    json_resp = json.loads(resp.read())

    if json_resp['status'] == u'OK':
        for pred in json_resp['predictions']:
            if pred['description'].find('Novi Sad') != -1 or pred['description'].find(u'Нови Сад') != -1:
                obj = {}
                obj['name'] = pred['description'].encode('utf-8').encode('string-escape')
                obj['reference'] = pred['reference'].encode('utf-8').encode('string-escape')
                json_result['suggestions'].append(obj)

    return str(json_result)

Here is the Java client solution:

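import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Turns Python \xVV byte escapes (e.g. "Oslobo\xc4\x91enja") back into a Java String.
// Assumes all other characters are plain ASCII; other Python escapes are not handled.
private String python2JavaStr(String pythonStr) {
    ByteArrayOutputStream decodedBytes = new ByteArrayOutputStream();
    for (int i = 0; i < pythonStr.length(); i++) {
        char c = pythonStr.charAt(i);
        if (c == '\\' && i + 3 < pythonStr.length() && pythonStr.charAt(i + 1) == 'x') {
            // \xc4 => c4 => 196
            int byteValue = Integer.parseInt(pythonStr.substring(i + 2, i + 4), 16);
            decodedBytes.write(byteValue);
            i += 3; // skip the rest of the 4-character escape
        } else {
            decodedBytes.write(c); // ASCII character: exactly one byte
        }
    }
    // Decode only the bytes actually written (no zero padding), explicitly as UTF-8.
    return new String(decodedBytes.toByteArray(), StandardCharsets.UTF_8);
}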

score 2 · Accepted Answer

You are returning the string representation of a Python data structure (str(json_result)), not JSON.

Return an actual JSON response instead, and keep the values as Unicode:

if json_resp['status'] == u'OK':
    for pred in json_resp['predictions']:
        desc = pred['description'] 
        if u'Novi Sad' in desc or u'Нови Сад' in desc:
            obj = {
                'name': pred['description'],
                'reference': pred['reference']
            }
            json_result['suggestions'].append(obj)

return json.dumps(json_result)

Now Java no longer has to interpret Python escape codes; it can simply parse valid JSON instead.

score 1 · Accepted Answer

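Python (via .encode('utf-8').encode('string-escape') in the question) escapes non-ASCII characters by converting them to UTF-8 bytes and writing each byte as \xVV, where VV is the byte's hexadecimal value. This is very different from Java Unicode escapes, which use one \uVVVV per character (UTF-16 code unit), where VVVV is the hexadecimal UTF-16 value.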
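To make the contrast concrete, here is a small sketch that produces both escape styles for the decoded name. The class and method names are illustrative only, and pythonStyle is just a rough approximation of Python 2's string-escape (it escapes non-printable bytes but not backslashes or quotes).

```java
import java.nio.charset.StandardCharsets;

// EscapeContrast is an illustrative name, not from the question or answers.
class EscapeContrast {

    static boolean printableAscii(int v) {
        return v >= 0x20 && v <= 0x7e;
    }

    // Python-2-like: encode to UTF-8, then write every non-printable byte as \xVV.
    // (Rough approximation of string-escape: backslashes and quotes are not escaped here.)
    static String pythonStyle(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // Java bytes are signed; mask to 0..255
            out.append(printableAscii(v) ? String.valueOf((char) v) : String.format("\\x%02x", v));
        }
        return out.toString();
    }

    // Java-like: one escape per non-printable char, i.e. per UTF-16 code unit.
    static String javaStyle(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            out.append(printableAscii(c) ? String.valueOf(c) : String.format("\\u%04x", (int) c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String name = "Oslobo\u0111enja"; // the decoded name from the question
        System.out.println(pythonStyle(name)); // Oslobo\xc4\x91enja
        System.out.println(javaStyle(name));
    }
}
```

The Python form needs two escapes for the single character đ because UTF-8 spends two bytes on it, while the Java form needs one because U+0111 fits in a single UTF-16 code unit.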

Consider:

\xc4\x91

In decimal, these hex values are:

196 145
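Why these two bytes become a single character is ordinary UTF-8 decoding: a lead byte of the form 110xxxxx starts a two-byte sequence, the continuation byte has the form 10yyyyyy, and the code point is the 11 payload bits xxxxxyyyyyy. Worked through for this pair:

$$
\begin{aligned}
\texttt{0xC4} &= 110\,00100_2 = 196, \qquad \texttt{0x91} = 10\,010001_2 = 145 \\
\text{code point} &= (\texttt{0xC4} \mathbin{\&} \texttt{0x1F}) \cdot 2^{6} + (\texttt{0x91} \mathbin{\&} \texttt{0x3F}) \\
&= 4 \cdot 64 + 17 = 273 = \texttt{0x0111}
\end{aligned}
$$

U+0111 is LATIN SMALL LETTER D WITH STROKE (đ), which in Java is the single char \u0111.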

Then, in Java:

import java.nio.charset.StandardCharsets;

byte[] bytes = { (byte) 196, (byte) 145 }; // 0xc4, 0x91
System.out.println("result: " + new String(bytes, StandardCharsets.UTF_8));

Prints:

result: đ
