pdf - How to detect whether a font used in a PDF is bold, italic, or plain

Question

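I am extracting content from a PDF with the MuPDF library, but I only get the font name, not the font face (style).

Should I guess (for example, by looking for "bold" in the font name, which is not the right way), or is there another way to detect whether a particular font is bold, italic, or plain?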
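Rather than guessing from the name alone, a PDF usually carries explicit style hints in the font's FontDescriptor dictionary (ISO 32000-1): the Flags entry (bit 7 is Italic, bit 19 is ForceBold), the optional FontWeight entry (400 is normal, 700 is bold) and ItalicAngle (non-zero for slanted faces). The C# sketch below assumes you have already read these values with whatever library you use; the FontStyleHints and FontStyleDetector types are hypothetical, and reading the dictionary itself is not shown. Because these entries are optional or inconsistently set in real files, it falls back to the name heuristic only as a last resort.

```csharp
using System;

// Hypothetical container for values read from a PDF font's FontDescriptor and /BaseFont.
public sealed class FontStyleHints
{
    public int Flags;                 // FontDescriptor /Flags (0 if absent)
    public int? FontWeight;           // FontDescriptor /FontWeight (optional, PDF 1.5)
    public double ItalicAngle;        // FontDescriptor /ItalicAngle
    public string BaseFontName = "";  // /BaseFont, e.g. "ABCDEF+Arial-BoldItalicMT"
}

public static class FontStyleDetector
{
    // The PDF spec numbers flag bits from 1, so bit n has the value 1 << (n - 1).
    private const int ItalicFlag = 1 << 6;      // bit 7
    private const int ForceBoldFlag = 1 << 18;  // bit 19

    public static bool IsBold(FontStyleHints f)
    {
        // Explicit weight wins when present: 700 and above is bold.
        if (f.FontWeight.HasValue) return f.FontWeight.Value >= 700;
        if ((f.Flags & ForceBoldFlag) != 0) return true;
        // Last resort: name heuristic (a convention, not a guarantee).
        return NameContains(f.BaseFontName, "Bold", "Black", "Heavy");
    }

    public static bool IsItalic(FontStyleHints f)
    {
        if ((f.Flags & ItalicFlag) != 0 || f.ItalicAngle != 0) return true;
        return NameContains(f.BaseFontName, "Italic", "Oblique");
    }

    // Case-insensitive check for any of the given style words in the font name.
    private static bool NameContains(string name, params string[] words)
    {
        foreach (string w in words)
            if ((name ?? "").IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0) return true;
        return false;
    }
}
```

Checking the descriptor before the name is a design choice: FontWeight and the Flags bits are explicit data, while a name like "Arial-BoldMT" is only a naming convention. ForceBold is strictly a rendering hint for small sizes, so treat it as a signal rather than proof.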

score 1 · Accepted Answer

I used iTextSharp to extract the font family, font color, and so on:

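using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Member of your own class; path, text_input_File, input_pdf and clear() are defined there.
public void Extract_inputpdf() {

  text_input_File = string.Empty;

  StringBuilder sb_inputpdf = new StringBuilder();
  PdfReader reader_inputPdf = new PdfReader(path); // read PDF
  // iTextSharp page numbers start at 1, not 0
  for (int i = 1; i <= reader_inputPdf.NumberOfPages; i++) {

    TextWithFont_inputPdf inputpdf = new TextWithFont_inputPdf();
    text_input_File = PdfTextExtractor.GetTextFromPage(reader_inputPdf, i, inputpdf);

    sb_inputpdf.Append(text_input_File);
  }
  input_pdf = sb_inputpdf.ToString();
  reader_inputPdf.Close();
  clear();
}

// Custom strategy: iTextSharp calls RenderText for each chunk of text on the page.
public class TextWithFont_inputPdf : ITextExtractionStrategy {
  private readonly StringBuilder result = new StringBuilder();

  // Required by the interface; not needed here.
  public void BeginTextBlock() { }
  public void EndTextBlock() { }
  public void RenderImage(ImageRenderInfo renderInfo) { }

  public void RenderText(TextRenderInfo renderInfo) {
    // PostScript name of the current font, e.g. "Arial-BoldMT".
    // The style (Bold, Italic, ...) is usually part of this name;
    // split it if you want the family and style separately.
    string curFont = renderInfo.GetFont().PostscriptFontName;

    result.Append(renderInfo.GetText());
  }

  public string GetResultantText() {
    return result.ToString();
  }
}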
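The comment in RenderText suggests splitting the PostScript name. A minimal sketch of that step, assuming the common conventions: an optional subset tag of six uppercase letters followed by "+" (e.g. "ABCDEF+"), then the family, then a style suffix after "-" ("Arial-BoldMT") or "," (the TrueType form "Arial,Bold"). PostscriptNameParser.Split is a hypothetical helper, and the result is still a heuristic, as the question points out.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PostscriptNameParser
{
    // Subset fonts are prefixed with six uppercase letters and '+', e.g. "ABCDEF+".
    private static readonly Regex SubsetTag = new Regex("^[A-Z]{6}\\+");

    // Hypothetical helper: "ABCDEF+Arial-BoldItalicMT" -> ("Arial", "BoldItalicMT").
    public static (string Family, string Style) Split(string postscriptName)
    {
        // Drop the subset tag, if any.
        string name = SubsetTag.Replace(postscriptName ?? "", "");
        // The style follows the first '-' or ','; no separator means no style part.
        int sep = name.IndexOfAny(new[] { '-', ',' });
        return sep < 0
            ? (name, "")
            : (name.Substring(0, sep), name.Substring(sep + 1));
    }
}
```

For example, calling Split(renderInfo.GetFont().PostscriptFontName) inside RenderText gives the family and a raw style string such as "BoldItalicMT", which can then be checked for "Bold" or "Italic".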
