java - Parsing a large XML file with a SAX parser makes my class bloated and unreadable - how can I fix this?

Question

This is purely a question about code readability; the performance of the class is not an issue.

Here is how I build this XMLHandler:

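For each element relevant to the application I have a boolean named after the element ("ElementName"), which I set to true or false depending on where I am in the parse. The problem: there are now 10+ boolean declarations at the top of the class, and the list keeps growing.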

My startElement and endElement methods contain hundreds of lines of

if (qName.equals("elementName")) {
   ...
} else if (qName.equals("anotherElementName")) {
   ...
}

with various parsing rules inside them (if I am at this position in the XML file, do this; otherwise do that...).

Coding new parsing rules and debugging is becoming more and more difficult.

What are the best practices for coding a SAX parser, and what can I do to make my code more readable?

score 2 · Accepted Answer

What do you use the boolean variables for? To keep track of nesting?

I recently implemented this using an enum for all the elements. That code works; what follows is only a rough approximation of it, off the top of my head.

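enum Element {
   // special markers (they have no tag of their own):
   ROOT,
   DONT_CARE,

   // Element               tag                  parents
   RootElement(             "root",              ROOT),
   AnElement(               "anelement"),     // DONT_CARE
   AnotherElement(          "anotherelement"),// DONT_CARE
   AChild(                  "child",             AnElement),
   AnotherChild(            "child",             AnotherElement);

   final String tag;        // null for the special markers
   final Element[] parents; // empty means "any parent" (DONT_CARE)

   Element() { this(null); }
   Element(String tag, Element ... parents) {
      this.tag = tag;
      this.parents = parents;
   }
}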

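// Imports needed by the handler (it would normally live in its own file).
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class MySaxParser extends DefaultHandler {
    Map<Pair<Element, String>, Element> elementMap = buildElementMap();
    LinkedList<Element> nestingStack = new LinkedList<Element>();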

    @Override
    public void startElement(String namespaceURI, String sName, String qName, Attributes attrs) {
        // sName is the local name; SAX only fills it in when the parser is namespace-aware.
        Element parent = nestingStack.isEmpty() ? Element.ROOT : nestingStack.getLast();
        Element element = elementMap.get(pair(parent, sName));
        if (element == null)
            element = elementMap.get(pair(Element.DONT_CARE, sName));
        if (element == null)
            throw new IllegalStateException("I did not expect <" + sName + "> in this context");

        nestingStack.addLast(element);

        switch (element) {
        case RootElement: // ... Probably don't need cases for many elements at start unless we have attributes
            break;
        case AnElement: // ...
            break;
        case AnotherElement: // ...
            break;
        case AChild: // ...
            break;
        case AnotherChild: // ...
            break;
        default: // Most cases here. Generally nothing to do on startElement
            break;
        }
    }
    @Override
    public void endElement(String namespaceURI, String sName, String qName) {
        // Similar to startElement() but most switch cases do something with the data.
        Element element = nestingStack.removeLast();
        if (!element.tag.equals(sName))
            throw new IllegalStateException("Mismatched end tag </" + sName + ">");
        switch (element) {
           ...
        }
    }

    // Construct the structure map from the parent information.
    private Map<Pair<Element, String>, Element> buildElementMap() {
        Map<Pair<Element, String>, Element> result = new LinkedHashMap<Pair<Element, String>, Element>();
        for (Element element: Element.values()) {
            if (element.tag == null) continue;
            if (element.parents.length == 0)
                result.put(pair(DONT_CARE, element.tag), element);
            else for (Element parent: element.parents) {
                result.put(pair(parent, element.tag), element);
            }
        }
        return result;
    }
    // Convenience method to avoid the need for using "new Pair()" with verbose Type parameters 
    private <A,B> Pair<A,B> pair(A a, B b) {
        return new Pair<A, B>(a, b);
    }
    // A simple Pair class, just for completeness.  Better to use an existing implementation.
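    private static class Pair<A,B> {
        final A a;
        final B b;
        Pair(A a, B b){ this.a = a; this.b = b;}
        // Value-based equals/hashCode are required for Pair to work as a map key.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Pair)) return false;
            Pair<?, ?> other = (Pair<?, ?>) o;
            return java.util.Objects.equals(a, other.a) && java.util.Objects.equals(b, other.b);
        }
        @Override
        public int hashCode() { return java.util.Objects.hash(a, b); }
    }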
}

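Edit:
The position within the XML structure is tracked by a stack of elements. When startElement is called, the appropriate Element enum value is determined from two things: 1) the parent element on top of the tracking stack and 2) the element tag passed in as the sName parameter. Together they form the key into a map that is generated from the parent information declared in the Element enum. The Pair class is simply a holder for that two-part key.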
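As a rough illustration of that two-part lookup (a sketch, not the code from this answer), the snippet below uses the JDK's `AbstractMap.SimpleImmutableEntry` as the key holder in place of a hand-written Pair. The names `LookupSketch`, `Ctx`, `key` and `resolve` are made up for this example.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: resolving (parent, tag) to a context-specific enum value.
class LookupSketch {
    enum Ctx { ROOT, DONT_CARE, AN_ELEMENT, ANOTHER_ELEMENT, A_CHILD, ANOTHER_CHILD }

    // SimpleImmutableEntry already implements equals() and hashCode(),
    // so it can serve as a two-part map key without a custom Pair class.
    private static final Map<Map.Entry<Ctx, String>, Ctx> MAP = new HashMap<>();

    private static Map.Entry<Ctx, String> key(Ctx parent, String tag) {
        return new SimpleImmutableEntry<>(parent, tag);
    }

    static {
        MAP.put(key(Ctx.DONT_CARE, "anelement"), Ctx.AN_ELEMENT);
        MAP.put(key(Ctx.DONT_CARE, "anotherelement"), Ctx.ANOTHER_ELEMENT);
        MAP.put(key(Ctx.AN_ELEMENT, "child"), Ctx.A_CHILD);
        MAP.put(key(Ctx.ANOTHER_ELEMENT, "child"), Ctx.ANOTHER_CHILD);
    }

    // Try the exact (parent, tag) key first, then fall back to "any parent".
    // Returns null when the tag is not expected in this context.
    static Ctx resolve(Ctx parent, String tag) {
        Ctx found = MAP.get(key(parent, tag));
        return found != null ? found : MAP.get(key(Ctx.DONT_CARE, tag));
    }
}
```

Here `resolve(Ctx.AN_ELEMENT, "child")` and `resolve(Ctx.ANOTHER_ELEMENT, "child")` yield different enum values for the same tag, while `resolve(Ctx.ROOT, "child")` yields null because no entry allows a child there.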

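This approach allows the same element tag, appearing in different parts of the XML structure with different semantics, to be represented by different Element enum values. For example: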

<root>
  <anelement>
    <child>Data pertaining to child of anelement</child>
  </anelement>      
  <anotherelement>
    <child>Data pertaining to child of anotherelement</child>
  </anotherelement>
</root>

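With this technique there is no need for flags to track context in order to know which <child> element is being processed. The context is declared as part of the Element enum definition, which reduces confusion by eliminating the assorted state variables.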
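To make the idea concrete, here is a small self-contained sketch of a handler that tells the two kinds of <child> apart using only a stack. It is deliberately simplified and rests on a few assumptions: every element declares a single parent instead of a varargs list, lookup is a linear scan rather than a prebuilt map, and qName is used because the default SAXParserFactory is not namespace-aware. The names `ContextDemo`, `Elem`, `results` and `parse` are invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ContextDemo extends DefaultHandler {
    // Each constant declares its tag and the one parent it is valid under
    // (a null parent means "top of the document").
    enum Elem {
        ROOT_EL("root", null),
        AN_ELEMENT("anelement", ROOT_EL),
        ANOTHER_ELEMENT("anotherelement", ROOT_EL),
        A_CHILD("child", AN_ELEMENT),
        ANOTHER_CHILD("child", ANOTHER_ELEMENT);

        final String tag;
        final Elem parent;
        Elem(String tag, Elem parent) { this.tag = tag; this.parent = parent; }

        // Linear scan keeps the sketch short; a prebuilt map would avoid it.
        static Elem resolve(Elem parent, String tag) {
            for (Elem e : values()) {
                if (e.tag.equals(tag) && e.parent == parent) return e;
            }
            throw new IllegalStateException("Unexpected <" + tag + "> under " + parent);
        }
    }

    private final Deque<Elem> stack = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    final List<String> results = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        Elem parent = stack.peekLast(); // null while the stack is empty
        stack.addLast(Elem.resolve(parent, qName));
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        Elem element = stack.removeLast();
        switch (element) {
            case A_CHILD:
                results.add("anelement child: " + text.toString().trim());
                break;
            case ANOTHER_CHILD:
                results.add("anotherelement child: " + text.toString().trim());
                break;
            default:
                break; // nothing to collect for container elements
        }
        text.setLength(0);
    }

    // Parses the given XML string and returns what the handler collected.
    static List<String> parse(String xml) {
        try {
            ContextDemo handler = new ContextDemo();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.results;
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><anelement><child>first</child></anelement>"
                   + "<anotherelement><child>second</child></anotherelement></root>";
        System.out.println(parse(xml));
    }
}
```

Note that the handler holds no boolean flags at all: the only state is the stack and a text buffer, and adding a new context-dependent element means adding one enum constant and one switch case.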

score 0 · Accepted Answer

Fall back to JAXB or an equivalent and let the framework do the work.

score 0 · Accepted Answer

It depends on the XML structure. If the actions for the different cases are simple or (more or less) "independent", you could try using a map:

interface Command {
   public void assemble(Attributes attr, MyStructure myStructure);
}
...

Map<String, Command> commands= new HashMap<String, Command>();
...
if (commands.containsKey(qName)) {
   commands.get(qName).assemble(attr, myStructure);
} else {
   //unknown qName
}
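A minimal runnable sketch of this dispatch style might look as follows. The element names ("book", "author"), the attribute names, and the use of a plain List<String> as a stand-in for MyStructure are all assumptions made for the example; only the Command/map idea comes from the snippet above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CommandDispatchDemo extends DefaultHandler {
    // One small handler per element name; mirrors the Command interface above.
    interface Command {
        void assemble(Attributes attr, List<String> target);
    }

    private final Map<String, Command> commands = new HashMap<>();
    final List<String> collected = new ArrayList<>(); // stand-in for MyStructure

    CommandDispatchDemo() {
        // Hypothetical element and attribute names, chosen only for this example.
        commands.put("book", (attr, target) -> target.add("book:" + attr.getValue("title")));
        commands.put("author", (attr, target) -> target.add("author:" + attr.getValue("name")));
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attr) {
        Command command = commands.get(qName);
        if (command != null) {
            command.assemble(attr, collected);
        }
        // Unknown element names are silently skipped in this sketch.
    }

    // Parses the given XML string and returns what the commands collected.
    static List<String> parse(String xml) {
        try {
            CommandDispatchDemo handler = new CommandDispatchDemo();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.collected;
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }
}
```

Each rule lives in its own small Command, so startElement stays a few lines long no matter how many element names are registered. The trade-off is that a plain name-to-command map knows nothing about nesting, which is why it suits cases whose actions are independent of context.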
