pdf - how to extract RichMediaContent from PDF using iTextSharp

Question

I need to extract video file which is embedded in pdf file. i could find the video which is in annotation so that i can't save it separately. i need to save this file how do i achieve this?

Ex: iTextSharp - how to open/read/extract a file attachment?

he has extracted attachement like the way i need to extract the video.

here is my code:

 string FileName = AppDomain.CurrentDomain.BaseDirectory + "raven test.pdf";
    PdfReader pdfreader = new PdfReader(FileName);
    PdfDictionary PageDictionary = pdfreader.GetPageN(1);
    PdfArray Annots = PageDictionary.GetAsArray(PdfName.ANNOTS);       
    if ((Annots == null) || (Annots.Length == 0))
        return;

    foreach (PdfObject oAnnot in Annots.ArrayList)
    {
        PdfDictionary AnnotationDictionary = (PdfDictionary)PdfReader.GetPdfObject(oAnnot);

        if (AnnotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.RICHMEDIA))
        {
            if (AnnotationDictionary.Keys.Contains(PdfName.RICHMEDIACONTENT))
            {
                PdfDictionary oRICHContent = AnnotationDictionary.GetAsDict(PdfName.RICHMEDIACONTENT); // here i could see the video embeded but it is in annotation, how do i save this file?
            }
        }

    }

score 1 · Accepted Answer

これについては、 Adobe Supplement to ISO 32000、BaseVersion 1.7、ExtensionLevel 3公式仕様を参照することをお勧めします。null以下は基本的なコードですが、おそらくさらにチェックを入れたいと思うでしょう。質問がある場合は、コメントを参照してください。すべての埋め込み動画が RichMedia 形式を使用しているわけではありません。一部の動画は特別な添付ファイルであるため、すべてを取得できるわけではありません。

PdfReader pdfreader = new PdfReader(FileName);
PdfDictionary PageDictionary = pdfreader.GetPageN(1);
PdfArray Annots = PageDictionary.GetAsArray(PdfName.ANNOTS);
if ((Annots == null) || (Annots.Length == 0))
    return;

foreach (PdfObject oAnnot in Annots.ArrayList) {
    PdfDictionary AnnotationDictionary = (PdfDictionary)PdfReader.GetPdfObject(oAnnot);

    //See if the annotation is a rich media annotation
    if (AnnotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.RICHMEDIA)) {
        //See if it has content
        if (AnnotationDictionary.Contains(PdfName.RICHMEDIACONTENT)) {
            //Get the content dictionary
            PdfDictionary RMC = AnnotationDictionary.GetAsDict(PdfName.RICHMEDIACONTENT);
            if (RMC.Contains(PdfName.ASSETS)) {
                //Get the assset sub dictionary if it exists
                PdfDictionary Assets = RMC.GetAsDict(PdfName.ASSETS);
                //Get the names sub array.
                PdfArray names = Assets.GetAsArray(PdfName.NAMES);
                //Make sure it has values
                if (names.ArrayList.Count > 0) {
                    //A single piece of content can have multiple assets. The array returned is in the form {name, IR, name, IR, name, IR...}
                    for (int i = 0; i < names.ArrayList.Count; i++) {
                        //Get the IndirectReference for the current asset
                        PdfIndirectReference ir = (PdfIndirectReference)names.ArrayList[++i];
                        //Get the true object from the main PDF
                        PdfDictionary obj = (PdfDictionary)PdfReader.GetPdfObject(ir);
                        //Get the sub Embedded File object
                        PdfDictionary ef = obj.GetAsDict(PdfName.EF);
                        //Get the filespec sub object
                        PdfIndirectReference fir = (PdfIndirectReference)ef.Get(PdfName.F);
                        //Get the true file stream of the filespec
                        PRStream objStream = (PRStream)PdfReader.GetPdfObject(fir);
                        //Get the raw bytes for the given object
                        byte[] bytes = PdfReader.GetStreamBytes(objStream);
                        //Do something with the bytes here
                    }
                }
            }
        }
    }
}

pdf - how to extract RichMediaContent from PDF using iTextSharp

1 に答える 1

Related

Reference