php - pdfmark: some accented characters in generated PDF bookmark titles are not displayed correctly

Question

I am inserting bookmarks into an existing PDF, but I have a problem with the accented "c". Here is an example (the character set used in the example is UTF-8):

$name = "Ruční nářadí";

$name = chr(254).chr(255).iconv('UTF-8', 'UTF-16BE', str_replace(array('(',')','/'),array('\\(','\\)','\\/'),$name));

$fh = fopen('pdfmark.txt', 'w');
fputs($fh, "[/Title ({$name}) /Page 1 /OUT pdfmark\n");
fclose($fh);

$command = "gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -sOutputFile=out.pdf final.pdf pdfmark.txt; mv out.pdf final.pdf";
exec($command);

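The problem is that the accented č shows up in the final PDF bookmark as Ċ (a capital letter with a different accent). I have tried the other accented characters used in my language (Czech), and all of them are fine except this one.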

Thanks for any clue on how to solve this.

Edit (2013-02-01):

The Ghostscript version I am using is 9.06 (2012-08-08). I view the resulting PDF file with Adobe Reader 11.0.1.

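Still thinking about it... Does the PDF need some encoding specified in it? The source PDF is out of my control and I know nothing about it. If so, is there a way to do that with GS or pdfmark? I thought there would be no problem as long as the bookmark encoding was Unicode, but I may be wrong.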

Edit (2013-02-05):

It looks like there is a bug in either GS's pdfwrite device or Acrobat. See the GS bug tracker for details. Once it is resolved, I will post the solution here.

score 2 · Answer

First, simplify the string down to the single problematic character. Then inspect the string in pdfmark.txt and check that it is correctly encoded as UTF-16BE.
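A minimal sketch of that check, assuming PHP's iconv extension (already used in the question) and a hypothetical helper name utf16beDump. For č (U+010D) it shows the bytes 01 0D, and 0x0D is a carriage return. PostScript, like PDF, treats an unescaped end-of-line inside a literal (...) string as a newline byte (0x0A), which would plausibly turn U+010D into U+010A, i.e. Ċ, matching the symptom described in the question.

```php
<?php
// Hypothetical helper (not part of PHP or Ghostscript): list the UTF-16BE
// bytes of each character and flag bytes that need care inside a PostScript
// literal string ( ... ).
function utf16beDump(string $utf8): array
{
    // CR and LF are end-of-line markers, '(' and ')' are delimiters,
    // '\' is the escape character.
    $special = ["\x0D", "\x0A", '(', ')', '\\'];
    $rows = [];
    // Split the UTF-8 input into characters (PCRE's /u mode).
    foreach (preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $bytes = iconv('UTF-8', 'UTF-16BE', $char);
        $flag = '';
        foreach (str_split($bytes) as $b) {
            if (in_array($b, $special, true)) {
                $flag = ' <-- special byte';
            }
        }
        $hex = strtoupper(implode(' ', str_split(bin2hex($bytes), 2)));
        $rows[] = $char . ' => ' . $hex . $flag;
    }
    return $rows;
}

echo implode(PHP_EOL, utf16beDump('čř')), PHP_EOL;
```

For the input above this prints `č => 01 0D <-- special byte` and `ř => 01 59`, which would explain why č is the only Czech letter affected.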

Assuming that is correct, try running Ghostscript from the command line and see whether that works. If it does not, you can open a bug report at http://bugs.ghostscript.com. If you do, please supply the source file and the command line.

You don't say which version of Ghostscript you are using, or what you are using to view the resulting PDF file. Both would be useful...

score 0 · Answer

According to the post in the bug tracker, the string can be encoded in a different way (downloading the new 9.08 PRERELEASE may also help):

$name = "Ruční nářadí";

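// A hex string needs no escaping of ( ) or \, so the str_replace() step is dropped:
// escaping first would put literal backslashes into the bookmark title.
$name = 'FEFF'.strtoupper(bin2hex(iconv('UTF-8', 'UCS-2BE', $name)));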

$fh = fopen('pdfmark.txt', 'w');
fputs($fh, "[/Title <{$name}> /Page 1 /OUT pdfmark\n");
fclose($fh);

$command = "gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -sOutputFile=out.pdf final.pdf pdfmark.txt; mv out.pdf final.pdf";
exec($command);

Note the encoding into hexadecimal form, and the different brackets around the title: <...> (a PostScript hex string) instead of (...).
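The same approach wrapped in a small function, as a sketch: pdfmarkHexTitle is a hypothetical name, and the FE FF prefix is the byte order mark already used in the code above. Inside <...> every byte is written as two hex digits, so a 0x0D byte cannot be read as an end-of-line and ( ) \ need no escaping.

```php
<?php
// Hypothetical helper: build a pdfmark /Title value as a PostScript hex string.
function pdfmarkHexTitle(string $utf8): string
{
    // FE FF byte order mark followed by UTF-16BE marks the title as Unicode.
    $utf16 = "\xFE\xFF" . iconv('UTF-8', 'UTF-16BE', $utf8);
    return '<' . strtoupper(bin2hex($utf16)) . '>';
}

$title = pdfmarkHexTitle('Ruční nářadí');
echo "[/Title {$title} /Page 1 /OUT pdfmark\n";
```

UTF-16BE is used here instead of UCS-2BE; both give the same bytes for Czech letters, but UTF-16BE also covers characters outside the Basic Multilingual Plane.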

score 0 · Answer

The following code snippet shows what you need to do.

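In PostScript you can access special characters with the \000 notation, where 000 is the character position. The three-digit position is octal: \350 corresponds to decimal position 232 and hexadecimal position E8.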
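A small sketch of that notation, using a hypothetical helper psOctalEscape. It checks the arithmetic above and, as an illustration, applies the same escape to the bookmark bytes from the question: a \ddd escape yields exactly that byte, so \015 should reach Ghostscript as 0x0D rather than being treated as an end-of-line.

```php
<?php
// Hypothetical helper: write every byte of a string as a PostScript \ddd
// escape, always with three octal digits.
function psOctalEscape(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $out .= sprintf('\\%03o', ord($bytes[$i]));
    }
    return $out;
}

// The position from the text: octal 350 = decimal 232 = hex E8.
echo octdec('350'), ' = ', strtoupper(dechex(232)), ' = ', psOctalEscape("\xE8"), PHP_EOL;

// The same notation in a pdfmark literal string: the 0x0D byte of č is
// written as \015, so no raw carriage return ends up in pdfmark.txt.
$title = psOctalEscape("\xFE\xFF" . iconv('UTF-8', 'UTF-16BE', 'č'));
echo "[/Title ({$title}) /Page 1 /OUT pdfmark\n";
```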

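The characters you are looking for are Ccaron and ccaron. To make them accessible, they have to be defined in the font's encoding table. The CEEncoding table is Adobe's Central European character set. PostScript probably already has a CEEncoding defined somewhere, but this example defines its own; as the example shows, you can define any encoding you like. The PostScript Language Reference Manual is available on the web and lists the available characters in detail.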

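This example prints testing 1234 with the standard /Helvetica, then defines a new font /Helvetica-CE based on the standard /Helvetica but using the CEEncoding encoding. The Ruční line uses character \350, which CEEncoding defines as ccaron. For fun, I also redefined character \001 as Ccaron, \002 as the Euro sign and \003 as the trademark sign, to show that any character code can be mapped to any glyph, and printed them with (testing 4567\001\002\003) show. Not every font defines every glyph; a glyph missing from the font is replaced by a space.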

And that's all there is to it ;)

/Helvetica findfont 46 scalefont setfont
100 75 moveto
(testing 1234) show
/CEEncoding [
/.notdef /Ccaron /Euro /trademark /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /space /exclam /quotedbl
/numbersign /dollar /percent /ampersand /quoteright
/parenleft /parenright /asterisk /plus /comma
/minus /period /slash /zero /one
/two /three /four /five /six
/seven /eight /nine /colon /semicolon
/less /equal /greater /question /at
/A /B /C /D /E
/F /G /H /I /J
/K /L /M /N /O
/P /Q /R /S /T
/U /V /W /X /Y
/Z /bracketleft /backslash /bracketright /asciicircum
/underscore /quoteleft /a /b /c
/d /e /f /g /h
/i /j /k /l /m
/n /o /p /q /r
/s /t /u /v /w
/x /y /z /braceleft /bar
/braceright /tilde /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/Sacute /.notdef /.notdef /Zacute /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /sacute /.notdef /.notdef /zacute
/space /.notdef /breve /Lslash /currency
/Aogonek /.notdef /dieresis /.notdef /Scaron
/Scedilla /Tcaron /Zacute /hyphen /Zcaron
/Zdotaccent /degree /aogonek /ogonek /lslash
/acute /lcaron /.notdef /caron /cedilla
/aogonek /scedilla /tcaron /zacute /hungarumlaut
/zcaron /zdotaccent /Racute /Aacute /Acircumflex
/Abreve /Adieresis /Lacute /Cacute /Ccedilla
/Ccaron /Eacute /Eogonek /Edieresis /Ecaron
/Iacute /Icircumflex /Dcaron /Eth /Nacute
/Ncaron /Oacute /Ocircumflex /Ohungarumlaut /Odieresis
/multiply /Rcaron /Uring /Uacute /Uhungarumlaut
/Udieresis /Yacute /Tcedilla /germandbls /racute
/aacute /acircumflex /abreve /adieresis /lacute
/cacute /ccedilla /ccaron /eacute /eogonek
/edieresis /ecaron /iacute /icircumflex /dcaron
/eth /nacute /ncaron /oacute /ocircumflex
/ohungarumlaut /odieresis /divide /rcaron /uring
/uacute /uhungarumlaut /udieresis /yacute /tcedilla
/dotaccent
] def

/Helvetica findfont
dup length dict begin
{ 1 index /FID ne
{def}
{pop pop}
ifelse
} forall
/Encoding CEEncoding def
currentdict
end
/Helvetica-CE exch definefont pop
/Helvetica-CE findfont 36 scalefont setfont
100 100 moveto
(\310\350) show
100 150 moveto 
(Ru\350n\355) show  % Ruční: \350 = ccaron, \355 = iacute in CEEncoding
100 200 moveto
(testing 4567\001\002\003) show
 showpage
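To produce strings like these from UTF-8 text (the encoding the question's PHP uses), a sketch along the following lines could help. toCeString is a hypothetical name, and its lookup table is deliberately partial: it lists only Č, č and í, with codes read off the CEEncoding vector above (/Ccaron at 200, /ccaron at 232, /iacute at 237). Other characters would have to be added from that vector.

```php
<?php
// Hypothetical helper: convert UTF-8 text into a PostScript string for a font
// re-encoded with the CEEncoding vector above.
function toCeString(string $utf8): string
{
    // Partial table read off the CEEncoding vector (extend as needed).
    $codes = [
        'Č' => 0310, // /Ccaron, decimal 200
        'č' => 0350, // /ccaron, decimal 232
        'í' => 0355, // /iacute, decimal 237
    ];
    $out = '';
    foreach (preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        if (isset($codes[$char])) {
            $out .= sprintf('\\%03o', $codes[$char]);  // octal escape
        } elseif ($char === '(' || $char === ')' || $char === '\\') {
            $out .= '\\' . $char;                       // escape delimiters
        } elseif (preg_match('/^[\x20-\x7E]$/', $char)) {
            $out .= $char;                              // printable ASCII
        } else {
            throw new InvalidArgumentException("No CEEncoding code listed for {$char}");
        }
    }
    return '(' . $out . ')';
}

echo toCeString('Ruční'), " show\n";
```

Printable ASCII is passed through because the vector above keeps it at the usual positions (apart from ' and `, which it maps to the curly /quoteright and /quoteleft glyphs).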
