java - Reading text from an swf using StuartMacKay's transform-swf library

Question

I need to extract all the text from some swf files. I'm using Java because I have a lot of modules developed in this language, so I searched the web for every free Java library dedicated to handling SWF files. Eventually I found a library developed by StuartMacKay. The library is called transform-swf and is available on GitHub.

The problem is: once I have extracted the GlyphIndexes from a TextSpan, how can I convert the glyphs into characters?

Please provide a fully working and tested example. No theoretical answers, and no answers like "it can't be done" or "it's not possible", will be accepted.

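What I know and what I did: I knew that GlyphIndexes are built using a TextTable, which is constructed from an integer representing the font size and the font description provided by a DefineFont2 object. However, when I decode all the DefineFont2 objects, they all have a zero-length list of advances.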

This is what I did:

//Creating a Movie object from an swf file.
Movie movie = new Movie();
movie.decodeFromFile(new File(out)); // out holds the path to the swf file
List<MovieTag> list = movie.getObjects();

//Saving all the decoded DefineFont2 objects.
Map<Integer, DefineFont2> fonts = new HashMap<>();
for (MovieTag object : list) {
    if (object instanceof DefineFont2) {
        DefineFont2 df2 = (DefineFont2) object;
        fonts.put(df2.getIdentifier(), df2);
    }
}
//Now I retrieve all the texts
for (MovieTag object : list) {
    if (object instanceof DefineText2) {
        DefineText2 dt2 = (DefineText2) object;
        for (TextSpan ts : dt2.getSpans()) {
            Integer fontIdentifier = ts.getIdentifier();
            if (fontIdentifier != null) {
                int fontSize = ts.getHeight();
                // Here I try to create an object that should
                // reverse the process done by a TextTable
                ReverseTextTable rtt =
                    new ReverseTextTable(fonts.get(fontIdentifier), fontSize);
                System.out.println(rtt.charactersForText(ts.getCharacters()));
            }
        }
    }
}

The ReverseTextTable class is as follows:

public final class ReverseTextTable {

    // DefineFont2 glyph outlines are defined on a 1024-unit EM square.
    private static final float EMSQUARE = 1024.0f;

    private final transient Map<Character, GlyphIndex> characters;
    private final transient Map<GlyphIndex, Character> glyphs;

    public ReverseTextTable(final DefineFont2 font, final int fontSize) {
        characters = new LinkedHashMap<>();
        glyphs = new LinkedHashMap<>();

        final List<Integer> codes = font.getCodes();
        final List<Integer> advances = font.getAdvances();
        final float scale = fontSize / EMSQUARE;
        final int count = codes.size();

        for (int i = 0; i < count; i++) {
            final char c = (char) codes.get(i).intValue();
            final GlyphIndex index = new GlyphIndex(i, (int) (advances.get(i) * scale));
            characters.put(c, index);
            glyphs.put(index, c);
        }
    }

    //This method should reverse from a list of GlyphIndexes to a String
    public String charactersForText(final List<GlyphIndex> list) {
        final StringBuilder text = new StringBuilder();
        for (GlyphIndex gi : list) {
            text.append(glyphs.get(gi));
        }
        return text.toString();
    }
}

Unfortunately, the list of advances from DefineFont2 is empty, so the ReverseTextTable constructor throws an ArrayIndexOutOfBoundsException.

score 1 · Accepted Answer

I happened to be working on decompiling SWFs in Java and came across this question while figuring out how to reverse engineer the original text.

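After looking through the source code, I realized it is actually quite simple. Each font is assigned a sequence of characters, which you can retrieve by calling DefineFont2.getCodes(). The glyphIndex is the index of the matching character in DefineFont2.getCodes().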
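As a self-contained sketch of that lookup, the helper below takes the font's code table and the glyph indices as plain lists so it runs without transform-swf. In real code the codes would come from DefineFont2.getCodes() and the indices from GlyphIndex.getGlyphIndex() on each element of TextSpan.getCharacters(); the class name `GlyphDecoding` and the '?' placeholder for out-of-range indices are hypothetical choices, not part of the library.

```java
import java.util.List;

public final class GlyphDecoding {

    /**
     * Hypothetical helper: maps each glyph index to the character code stored
     * at that position in the font's code table (the list DefineFont2.getCodes()
     * returns). Indices outside the table are rendered as '?'.
     */
    static String decode(List<Integer> codes, List<Integer> glyphIndices) {
        StringBuilder sb = new StringBuilder();
        for (int glyphIndex : glyphIndices) {
            if (glyphIndex >= 0 && glyphIndex < codes.size()) {
                sb.append((char) codes.get(glyphIndex).intValue());
            } else {
                sb.append('?'); // glyph not present in this font's code table
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Example code table: a font that embeds only 'H', 'e', 'l', 'o'
        List<Integer> codes = List.of((int) 'H', (int) 'e', (int) 'l', (int) 'o');
        // Glyph indices as they might appear in a TextSpan: H e l l o
        List<Integer> glyphs = List.of(0, 1, 2, 2, 3);
        System.out.println(decode(codes, glyphs)); // prints "Hello"
    }
}
```

Because the font only stores the characters it actually embeds, the same glyph index can mean different characters in different fonts, which is why picking the right DefineFont2 matters.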

However, when multiple fonts are used in a single SWF file, it is difficult to match each DefineText with its corresponding DefineFont2, because there is no attribute identifying which DefineFont2 each DefineText uses.

To work around this, I came up with a self-learning algorithm that tries to guess the correct DefineFont2 for each DefineText and thereby derive the original text correctly.
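The matching step can be illustrated on its own. This is a simplified, hypothetical version of the scoring that FontLearner.getString performs in this answer: a glyph counts as a miss when its index is outside the candidate font's code table, or when the decoded character was seen before with an advance differing by more than ADVANCE_THRESHOLD; the font is accepted when the hit ratio exceeds ACCURACY_THRESHOLD and at least one character was recognised. Unlike the class, it scores glyph by glyph instead of penalising a whole span at once, and it takes plain lists and arrays instead of transform-swf objects; `FontMatchScore` and `matches` are illustrative names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class FontMatchScore {

    // Same threshold values as the FontLearner class in this answer
    static final int ADVANCE_THRESHOLD = 10;
    static final double ACCURACY_THRESHOLD = 0.9;

    /**
     * Returns true when a candidate font plausibly explains a run of glyphs.
     * codes:        the font's code table (as from DefineFont2.getCodes())
     * learned:      character -> advance already observed for this font
     * glyphIndices: glyph index of each character in the text
     * advances:     advance of each character in the text
     */
    static boolean matches(List<Integer> codes, Map<Character, Integer> learned,
                           int[] glyphIndices, int[] advances) {
        int total = glyphIndices.length;
        if (total == 0) {
            return false;
        }
        int misses = 0;
        boolean anyKnown = false; // at least one character seen before
        for (int i = 0; i < total; i++) {
            int g = glyphIndices[i];
            if (g < 0 || g >= codes.size()) {
                misses++; // index not in this font's table
                continue;
            }
            Integer known = learned.get((char) codes.get(g).intValue());
            if (known != null) {
                anyKnown = true;
                if (Math.abs(advances[i] - known) > ADVANCE_THRESHOLD) {
                    misses++; // same character, very different width
                }
            }
        }
        double accuracy = (total - misses) / (double) total;
        return anyKnown && accuracy > ACCURACY_THRESHOLD;
    }

    public static void main(String[] args) {
        List<Integer> codes = List.of((int) 'a', (int) 'b');
        Map<Character, Integer> learned = new HashMap<>();
        learned.put('a', 500);
        learned.put('b', 520);
        // Advances within 10 units of the learned ones: accepted
        System.out.println(matches(codes, learned, new int[]{0, 1, 0}, new int[]{505, 515, 498}));
        // 'b' is 80 units wider than learned, so only 2 of 3 glyphs hit: rejected
        System.out.println(matches(codes, learned, new int[]{0, 1, 0}, new int[]{505, 600, 498}));
    }
}
```

The advance acts as a fingerprint: two fonts may both contain an 'a', but the widths they assign to it usually differ by more than the threshold.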

To reverse engineer the original text, I wrote the following class, FontLearner:

public class FontLearner {

    private final ArrayList<DefineFont2> fonts = new ArrayList<DefineFont2>();
    private final HashMap<Integer, HashMap<Character, Integer>> advancesMap = new HashMap<Integer, HashMap<Character, Integer>>();

    /**
     * The same characters from the same font will have similar advance values.
     * This constant defines the maximum difference between two advance values
     * for them to still be treated as the same character.
     */
    private static final int ADVANCE_THRESHOLD = 10;

    /**
     * Some characters have outlier advance values even when compared
     * to the same character.
     * This constant defines the minimum accuracy a String must reach
     * before it is associated with the given font.
     */
    private static final double ACCURACY_THRESHOLD = 0.9;

    /**
     * This method adds a DefineFont2 to the learner, and a DefineText
     * associated with the font to teach the learner about the given font.
     * 
     * @param font The font to add to the learner
     * @param text The text associated with the font
     */
    public void addFont(DefineFont2 font, DefineText text) {
        fonts.add(font);
        HashMap<Character, Integer> advances = new HashMap<Character, Integer>();
        advancesMap.put(font.getIdentifier(), advances);

        List<Integer> codes = font.getCodes();

        List<TextSpan> spans = text.getSpans();
        for (TextSpan span : spans) {
            List<GlyphIndex> characters = span.getCharacters();
            for (GlyphIndex character : characters) {
                int glyphIndex = character.getGlyphIndex();
                char c = (char) (int) codes.get(glyphIndex);

                int advance = character.getAdvance();
                advances.put(c, advance);
            }
        }
    }

    /**
     * 
     * @param text The DefineText to retrieve the original String from
     * @return The String retrieved from the given DefineText
     */
    public String getString(DefineText text) {
        StringBuilder sb = new StringBuilder();

        List<TextSpan> spans = text.getSpans();

        DefineFont2 font = null;
        for (DefineFont2 getFont : fonts) {
            List<Integer> codes = getFont.getCodes();
            HashMap<Character, Integer> advances = advancesMap.get(getFont.getIdentifier());
            if (advances == null) {
                advances = new HashMap<Character, Integer>();
                advancesMap.put(getFont.getIdentifier(), advances);
            }

            boolean notFound = true;
            int totalMisses = 0;
            int totalCount = 0;

            for (TextSpan span : spans) {
                List<GlyphIndex> characters = span.getCharacters();
                totalCount += characters.size();

                int misses = 0;
                for (GlyphIndex character : characters) {
                    int glyphIndex = character.getGlyphIndex();
                    if (codes.size() > glyphIndex) {
                        char c = (char) (int) codes.get(glyphIndex);

                        Integer getAdvance = advances.get(c);
                        if (getAdvance != null) {
                            notFound = false;

                            if (Math.abs(character.getAdvance() - getAdvance) > ADVANCE_THRESHOLD) {
                                misses += 1;
                            }
                        }
                    } else {
                        notFound = false;
                        misses = characters.size();

                        break;
                    }
                }

                totalMisses += misses;
            }

            double accuracy = (totalCount - totalMisses) * 1.0 / totalCount;

            if (accuracy > ACCURACY_THRESHOLD && !notFound) {
                font = getFont;

                // teach this DefineText to the FontLearner if there are
                // any new characters
                for (TextSpan span : spans) {
                    List<GlyphIndex> characters = span.getCharacters();
                    for (GlyphIndex character : characters) {
                        int glyphIndex = character.getGlyphIndex();
                        if (glyphIndex >= codes.size()) {
                            continue; // glyph not in this font's code table
                        }
                        char c = (char) (int) codes.get(glyphIndex);

                        int advance = character.getAdvance();
                        if (advances.get(c) == null) {
                            advances.put(c, advance);
                        }
                    }
                }
                break;
            }
        }

        if (font != null) {
            List<Integer> codes = font.getCodes();

            for (TextSpan span : spans) {
                List<GlyphIndex> characters = span.getCharacters();
                for (GlyphIndex character : characters) {
                    int glyphIndex = character.getGlyphIndex();
                    if (glyphIndex >= codes.size()) {
                        continue; // skip glyphs missing from the matched font
                    }
                    char c = (char) (int) codes.get(glyphIndex);
                    sb.append(c);
                }
                sb = new StringBuilder(sb.toString().trim());
                sb.append(" ");
            }
        }

        return sb.toString().trim();
    }
}

Usage:

Movie movie = new Movie();
movie.decodeFromStream(response.getEntity().getContent());

FontLearner learner = new FontLearner();
DefineFont2 font = null;

List<MovieTag> objects = movie.getObjects();
for (MovieTag object : objects) {
    if (object instanceof DefineFont2) {
        font = (DefineFont2) object;
    } else if (object instanceof DefineText) {
        DefineText text = (DefineText) object;
        if (font != null) {
            // Teach the learner using the first DefineText that follows a font
            learner.addFont(font, text);
            font = null;
        }
        String line = learner.getString(text); // reverse engineers the line
        System.out.println(line);
    }
}

With this approach I was able to reverse engineer the original Strings with 100% accuracy using StuartMacKay's transform-swf library.

score 1 · Accepted Answer

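Honestly, I don't know how to do it in Java. I'm not claiming it's impossible; I believe there is a way. However, you mentioned that there are many libraries that do this, and you also suggested one, namely swftools. So I recommend going back to that library to extract the text from the flash file. To do so, you only need Runtime.exec() to run the tool from the command line.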

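Personally, I prefer Apache Commons Exec over the standard classes shipped with the JDK, so let me show you how to do it with that. The executable to use is "swfstrings.exe". Suppose you put it in "C:\", and that the flash file, page.swf, is in the same folder. I then tried the following code (it works fine):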

    // Needs Apache Commons Exec (org.apache.commons.exec.*), java.io.ByteArrayOutputStream
    // and java.nio.file.Path/Paths; the enclosing method must declare throws IOException.
    Path pathToSwfFile = Paths.get("C:\\", "page.swf");
    CommandLine commandLine = new CommandLine(Paths.get("C:\\", "swfstrings.exe").toString());
    commandLine.addArgument(pathToSwfFile.toString()); // Commons Exec quotes it if it contains spaces
    DefaultExecutor executor = new DefaultExecutor();
    executor.setExitValues(new int[]{0, 1}); //Notice that swfstrings.exe returns 1 for success,
                                             //0 for file not found, -1 for error

    ByteArrayOutputStream stdout = new ByteArrayOutputStream();
    PumpStreamHandler psh = new PumpStreamHandler(stdout);
    executor.setStreamHandler(psh);
    int exitValue;
    try {
        exitValue = executor.execute(commandLine);
    } catch (ExecuteException ex) {
        // Thrown for exit values not listed in setExitValues
        exitValue = ex.getExitValue();
    }
    if (!executor.isFailure(exitValue)) {
        String out = stdout.toString("UTF-8"); // here you have the extracted text
    }

This is not exactly the answer you asked for, but it works.
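For comparison, the same call can be made with the JDK alone. The answer mentions Runtime.exec(); this sketch uses java.lang.ProcessBuilder instead (Java 10 or later), assuming, as the Commons Exec code does, that swfstrings takes the SWF path as its single argument and prints the extracted text on standard output. The C:\ paths and the accepted exit codes 0 and 1 are carried over from that code; `SwfStringsRunner` and `runSwfStrings` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

public final class SwfStringsRunner {

    /**
     * Runs the swfstrings executable on one SWF file and returns what it
     * prints to standard output. Standard error is discarded.
     */
    static String runSwfStrings(Path executable, Path swfFile)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(executable.toString(), swfFile.toString());
        pb.redirectError(ProcessBuilder.Redirect.DISCARD); // keep only stdout
        Process process = pb.start(); // IOException if the executable is missing
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = process.getInputStream()) {
            in.transferTo(out); // read everything before waiting for exit
        }
        int exitCode = process.waitFor();
        // The answer above reports 1 for success, 0 for file not found and
        // -1 for error; like its setExitValues call, accept 0 and 1.
        if (exitCode != 0 && exitCode != 1) {
            throw new IOException("swfstrings exited with code " + exitCode);
        }
        return out.toString(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        Path exe = Paths.get("C:\\", "swfstrings.exe");
        Path swf = Paths.get("C:\\", "page.swf");
        System.out.println(runSwfStrings(exe, swf));
    }
}
```

Reading standard output to the end before calling waitFor() avoids the process blocking on a full pipe, which is the main pitfall with both Runtime.exec() and ProcessBuilder.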

score 0 · Accepted Answer

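What you are trying to achieve seems hard: you are effectively trying to decompile the file, and I'm sorry to say that it's impossible. As an alternative, try reading the characters with OCR.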

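There is software that does this, and you can also check forums on the subject, because working with a compiled swf is very difficult (impossible, as far as I know). Check out this decompiler if you need to, or try using another language, as this project does.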

score 0 · Accepted Answer

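I had a similar problem with long strings using the transform-swf library.

I got the source code and debugged it.
I think there is a small bug in the class com.flagstone.transform.coder.SWFDecoder.

At line 540 (applies to version 3.0.2), change

dest += length;

to

dest += count;

That should do it for you (that is, it should extract the strings). I have also let Stuart know. The problem only occurs when the strings are very large.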
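The SWFDecoder source is not reproduced here, but the fix is consistent with a common chunked-copy bug: when a buffer is filled in several steps, the destination offset has to advance by the number of bytes actually copied in each step (count), not by the total length. In a loop like the one sketched below, advancing by length would end the loop after the first chunk and leave the rest of the buffer unfilled, which would also explain why only very large strings are affected. `readFully` and the `TrickleStream` test stream are hypothetical names, not part of transform-swf.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public final class ChunkedCopy {

    /** Reads exactly length bytes, in as many chunks as the stream needs. */
    static byte[] readFully(InputStream in, int length) throws IOException {
        byte[] buffer = new byte[length];
        int dest = 0;
        while (dest < length) {
            int count = in.read(buffer, dest, length - dest);
            if (count < 0) {
                throw new EOFException("Stream ended after " + dest + " bytes");
            }
            dest += count; // advance by what was actually read, not by length
        }
        return buffer;
    }

    /** Test stream that returns at most 3 bytes per read, like a slow source. */
    static final class TrickleStream extends ByteArrayInputStream {
        TrickleStream(byte[] data) {
            super(data);
        }

        @Override
        public synchronized int read(byte[] b, int off, int len) {
            return super.read(b, off, Math.min(len, 3));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "a fairly long string".getBytes(StandardCharsets.US_ASCII);
        byte[] copy = readFully(new TrickleStream(data), data.length);
        System.out.println(new String(copy, StandardCharsets.US_ASCII));
    }
}
```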
