xml - Tokenizing the content of siblings in a DOM tree

Question

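I plan to traverse a DOM tree in post-order. While visiting the siblings at each depth, I want to get the number of tokens in the text content of each group of siblings. To make this clear, here is an example: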

<?xml version="1.0" encoding="UTF-8"?> 
       <title text="title1"> 
           <comment1 id="comment1">
               <data1> this is an example</data1>
               <data2> this example tries to do a demo over a dom tree</data2>
           </comment1>
           <comment2 id="comment2">
               <data3> while it' beeing traversing in postorder fashion </data3>
               <data4> hope it works! </data4>
               <data5> :) </data5>
           </comment2>
       </title>

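For example, I want to count the words in data1 and data2 together, and in data3 through data5 together. Below is the code I have written so far to traverse the tree and compute the TF-IDF value, but as mentioned above, I want to find the TF separately for each group of siblings. Any clues? Thanks in advance.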

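import java.io.File;
import java.io.IOException;
import java.util.StringTokenizer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

class Tree {

    // Number of documents in the collection; assumed by hand (see the P.S.).
    private static final int DOCUMENT_COUNT = 4;

    // Running counters shared by every node in the tree.
    private static int total = 0;      // tokens seen so far
    private static int wordcount = 0;  // tokens equal to the keyword so far
    private static double tf = 0;
    private static double result = 0;
    private static double TFIDFresult = 0;

    // Adds the tokens of one text segment to the running counters and
    // returns the TF-IDF value over everything seen so far.
    static double TFIDF(String segment, String keyword) {
        if (segment == null)
            return TFIDFresult;

        StringTokenizer tokenizer = new StringTokenizer(segment);
        while (tokenizer.hasMoreTokens()) {
            total++;
            if (tokenizer.nextToken().equals(keyword))
                wordcount++;
        }
        if (wordcount == 0)
            return TFIDFresult; // no match yet; avoids log10(0)

        tf = (double) wordcount / total;
        // Kept as written in the question: log10(keyword count / document count).
        // The usual IDF is log10(document count / documents containing the keyword).
        double inverseTF = Math.log10((double) wordcount / DOCUMENT_COUNT);
        TFIDFresult = tf * inverseTF;
        return TFIDFresult;
    }

    // Counts this node's text, recurses into its children, prints the node
    // after its children (post-order), then moves on to the next sibling.
    public static void check(Node node) {
        if (node == null || node.getNodeName() == null)
            return;

        result = TFIDF(node.getNodeValue(), "this");

        check(node.getFirstChild());

        String value = node.getNodeValue();
        System.out.println(value != null && value.trim().length() == 0 ? "" : node);

        check(node.getNextSibling());
    }

    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        File file = new File("d:\\a.xml");
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document document = db.parse(file);
        document.getDocumentElement().normalize();

        Node b = document.getFirstChild();
        check(b);
        System.out.println(result);
    }
}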

P.S. For the calculation I manually assumed, for certain reasons, that the number of documents is 4.
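
One possible way to get a separate TF per sibling group is sketched below. It is only a minimal sketch under stated assumptions, not a confirmed answer: it assumes every text-bearing element (data1, data2, ...) sits directly under its group element (comment1, comment2), and that group element names are unique, so the name can serve as the map key. `SiblingTf`, `Stats`, `collect`, and `tfBySiblingGroup` are hypothetical names introduced for illustration, and the XML in `main` is a shortened version of the example above.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class SiblingTf {

    // Token counters for one sibling group (hypothetical helper type).
    static final class Stats {
        int total;   // tokens seen in the group
        int matches; // tokens equal to the keyword

        double tf() {
            return total == 0 ? 0.0 : (double) matches / total;
        }
    }

    // Post-order walk: children first, then the node itself.
    // Non-blank text is credited to the grandparent of the text node,
    // i.e. the parent shared by the sibling elements.
    static void collect(Node node, String keyword, Map<String, Stats> groups) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, keyword, groups);
        }
        if (node.getNodeType() != Node.TEXT_NODE || node.getNodeValue().trim().isEmpty()) {
            return; // elements and whitespace-only text carry no tokens
        }
        Node group = node.getParentNode().getParentNode();
        // Assumption: group element names are unique, so the name is the key.
        Stats stats = groups.computeIfAbsent(group.getNodeName(), k -> new Stats());
        StringTokenizer tokenizer = new StringTokenizer(node.getNodeValue());
        while (tokenizer.hasMoreTokens()) {
            stats.total++;
            if (tokenizer.nextToken().equals(keyword)) {
                stats.matches++;
            }
        }
    }

    // Parses the XML string and returns one Stats entry per sibling group.
    static Map<String, Stats> tfBySiblingGroup(String xml, String keyword) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, Stats> groups = new LinkedHashMap<>();
            collect(doc.getDocumentElement(), keyword, groups);
            return groups;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<title text=\"title1\">"
                + "<comment1 id=\"comment1\"><data1> this is an example</data1>"
                + "<data2> this example tries to do a demo over a dom tree</data2></comment1>"
                + "<comment2 id=\"comment2\"><data4> hope it works! </data4>"
                + "<data5> :) </data5></comment2>"
                + "</title>";
        tfBySiblingGroup(xml, "this").forEach((name, s) ->
                System.out.println(name + ": " + s.matches + "/" + s.total + " tf=" + s.tf()));
    }
}
```

Keeping the counters in a map instead of static fields means each group starts from zero, which is the difference from the code in the question. If group names can repeat, keying the map by the `Node` itself rather than by its name would be a safer choice.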
