c# - Extracting text from an XPS document

Question

I need to extract the text of a specific page from an XPS document. The extracted text should end up in a string. I need this so that I can read the extracted text aloud using Microsoft's SpeechLib. Examples in C# only, please.

Thanks

score 10 · Accepted Answer

Add references to ReachFramework and WindowsBase, and add the following using statements:

using System.Text;
using System.Windows.Xps.Packaging;

Then use the following code:

// Open the XPS file read-only; "/path" is a placeholder for the file path.
XpsDocument _xpsDocument = new XpsDocument("/path", System.IO.FileAccess.Read);
IXpsFixedDocumentSequenceReader fixedDocSeqReader
    = _xpsDocument.FixedDocumentSequenceReader;
IXpsFixedDocumentReader _document = fixedDocSeqReader.FixedDocuments[0];
// documentViewerElement is assumed to be the viewer control showing this document.
IXpsFixedPageReader _page
    = _document.FixedPages[documentViewerElement.MasterPageNumber];
StringBuilder _currentText = new StringBuilder();
System.Xml.XmlReader _pageContentReader = _page.XmlReader;
if (_pageContentReader != null)
{
  while (_pageContentReader.Read())
  {
    // Text runs are stored in the UnicodeString attribute of Glyphs elements.
    if (_pageContentReader.Name == "Glyphs")
    {
      string _glyphText = _pageContentReader.GetAttribute("UnicodeString");
      if (_glyphText != null)
      {
        _currentText.Append(_glyphText);
      }
    }
  }
}
string _fullPageText = _currentText.ToString();
// Release the underlying package once the text has been read.
_xpsDocument.Close();

The text is stored in the UnicodeString attribute of the Glyphs elements. You need to read it through the fixed page's XmlReader.
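To see the Glyphs/UnicodeString idea in isolation, here is a minimal sketch that runs the same kind of XmlReader loop over a small, made-up FixedPage markup string instead of a real XPS file, so it needs no ReachFramework reference. The class name GlyphTextDemo, the helper ExtractText, and the sample markup are all hypothetical. Unlike the code above, this sketch inserts a space between runs, which is an assumption about how the text should be joined for speech output.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class GlyphTextDemo
{
    // Hypothetical helper: collects the UnicodeString value of every Glyphs element.
    public static string ExtractText(XmlReader reader)
    {
        var text = new StringBuilder();
        while (reader.Read())
        {
            // Only element start tags can carry the UnicodeString attribute.
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "Glyphs")
            {
                string run = reader.GetAttribute("UnicodeString");
                if (!string.IsNullOrEmpty(run))
                {
                    // Assumption: separate runs with a single space.
                    if (text.Length > 0)
                    {
                        text.Append(' ');
                    }
                    text.Append(run);
                }
            }
        }
        return text.ToString();
    }

    public static void Main()
    {
        // Made-up sample markup, trimmed to the attribute relevant here.
        const string samplePage =
            "<FixedPage xmlns=\"http://schemas.microsoft.com/xps/2005/06\">" +
            "<Glyphs UnicodeString=\"Hello\" />" +
            "<Path />" +
            "<Glyphs UnicodeString=\"world\" />" +
            "</FixedPage>";

        using (XmlReader reader = XmlReader.Create(new StringReader(samplePage)))
        {
            Console.WriteLine(ExtractText(reader)); // prints "Hello world"
        }
    }
}
```

With a real document, the reader passed to ExtractText would be the XmlReader of the fixed page from the answer's code, and the resulting string is what you would hand to the speech library.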
