java - How can I get XSLT to return UTF-8 in Java?

Question

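I've been trying to get my XSL script to work with UTF-8 encoding. Characters like åäö and Greek letters come out as garbage. The only way I can get it to work is to write the result to a file. If I write it to an output stream, it returns only garbage (System.out works, but that might be because it is being redirected to a file).

The result needs to be returned from a servlet. Please note that this is not a servlet configuration issue: I can return a hard-coded string containing Greek characters from the servlet and it works fine, so the problem is in the transformation.

Here is my current (simplified) code.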

protected void doGet(final HttpServletRequest request, final HttpServletResponse response) throws ServletException,
IOException {
    try {
        response.setCharacterEncoding("UTF-8");
        response.setContentType("text/html; charset=UTF-8");

        final TransformerFactory factory = this.getFactory();

        final File inFile = new File("infile.xml");
        final File xslFile = new File("template.xsl");
        final File outFile = new File("outfile.html");

        final Templates templates = factory.newTemplates(new StreamSource(xslFile));
        final Transformer transformer = templates.newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        final InputStream in = new FileInputStream(inFile);
        final StreamSource source = new StreamSource(in);

        final StreamResult result1 = new StreamResult(outFile);
        final StreamResult result2 = new StreamResult(System.out);
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        final StreamResult result3 = new StreamResult(out);

        //transformer.transform(source, result1);
        //transformer.transform(source, result2);
        transformer.transform(source, result3);

        final Writer writer = response.getWriter();
        writer.write(new String(out.toByteArray()));
        writer.close();
        in.close();

    } catch (final TransformerConfigurationException e) {
        e.printStackTrace();
    } catch (final TransformerException e) {
        e.printStackTrace();
    }
}

My XSL script also contains the following:

<xsl:output method="html" omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />

What is the correct way to get this to work? I'm using Saxon for the transformation, in case that helps.

score 6 · Accepted Answer

This is almost certainly the problem:

writer.write(new String(out.toByteArray()));

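You've carefully encoded your text as UTF-8, and then you're converting it into a String using the platform default encoding. You should almost never use the String constructors and methods that rely on the platform default encoding. Even if you do want that encoding, specify it explicitly.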
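To illustrate the point about explicit encodings, here is a minimal sketch. The class name `ExplicitCharsetDemo` and the helper `decodeUtf8` are made up for this example and are not part of the question's code. It decodes UTF-8 bytes with an explicit charset, then decodes the same bytes with the wrong charset to show how the garbage arises.

```java
import java.nio.charset.StandardCharsets;

class ExplicitCharsetDemo {
    // Hypothetical helper: always decodes as UTF-8, whatever the platform default is.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The literal is written with Unicode escapes (a-ring, a-umlaut, o-umlaut,
        // then alpha, beta, gamma) so the source file encoding does not matter.
        String original = "\u00e5\u00e4\u00f6 \u03b1\u03b2\u03b3";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Explicit UTF-8 decoding round-trips the text.
        System.out.println(decodeUtf8(utf8).equals(original));

        // Decoding the same bytes with a different charset yields mojibake,
        // which is what new String(bytes) does on a non-UTF-8 platform.
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.equals(original));
    }
}
```

In the question's code, the equivalent one-line change would be to pass a charset to the `String` constructor, although the suggestions that follow avoid the intermediate byte array altogether.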

If you're going to write to a Writer anyway, why start by writing to a ByteArrayOutputStream? Why not go straight to the Writer?

It would be better, however, to write straight to the response's output stream (response.getOutputStream()) and to set the response's content type to indicate that it's UTF-8.
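A rough sketch of that approach follows. So that it runs without a servlet container, it makes two assumptions: the hypothetical helper `transformTo` takes any `OutputStream` (in the servlet you would pass `response.getOutputStream()`), and the JDK's identity transformer stands in for the `templates.newTransformer()` call from the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DirectStreamDemo {
    // Hypothetical helper: lets the transformer serialize UTF-8 bytes straight
    // into the given stream, with no intermediate String.
    static void transformTo(String xml, OutputStream out) {
        try {
            // Assumption: the identity transformer replaces the real stylesheet here.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // A ByteArrayOutputStream plays the role of the response stream.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        transformTo("<p>\u00e5\u00e4\u00f6 \u03b1\u03b2\u03b3</p>", sink);

        // The client would decode these bytes as UTF-8, per the Content-Type header.
        String decoded = new String(sink.toByteArray(), StandardCharsets.UTF_8);
        System.out.println(decoded.contains("\u03b1\u03b2\u03b3"));
    }
}
```

Because the bytes never pass through a `String`, the platform default encoding plays no part. The only encodings involved are the one the transformer writes and the one the response declares.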

Note that if you really want to get the result as a String beforehand, use a StringWriter. There's no point in writing to a ByteArrayOutputStream and then converting to a string.
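The StringWriter variant might look like the sketch below. It again uses the JDK's identity transformer in place of the real stylesheet, and the names `StringResultDemo` and `transformToString` are invented for illustration. Since the result goes to a character-based Writer, no byte-to-String decoding step is involved.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class StringResultDemo {
    // Hypothetical helper: returns the transformation result as a String.
    static String transformToString(String xml) {
        try {
            // Assumption: the identity transformer replaces the real stylesheet here.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            // The transformer writes characters, not bytes, into the StringWriter.
            StringWriter writer = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String result = transformToString("<p>\u00e5\u00e4\u00f6 \u03b1\u03b2\u03b3</p>");
        System.out.println(result.contains("\u03b1\u03b2\u03b3"));
    }
}
```

The returned String can then be passed to `response.getWriter()`, which encodes it using the character encoding set on the response.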
