ruby - How do I traverse inner nodes using SAX in Nokogiri?

Question

I'm completely new to Nokogiri and Ruby, and I'm looking for a little help.

I'm parsing a very large XML file using class MyDoc < Nokogiri::XML::SAX::Document. Now I want to traverse the inside of a block.

Here is the format of my XML file:

<Content id="83087">
    <Title></Title>
    <PublisherEntity id="1067">eBooksLib</PublisherEntity>
    <Publisher>eBooksLib</Publisher>
    ......
</Content>

I can already tell when the "Content" tag has been found, but I'd like to know how to traverse inside it. Here is my shortened code:

class MyDoc < Nokogiri::XML::SAX::Document
  #check the start element. set flag for each element
  def start_element name, attrs = []
    if(name == 'Content')
      #get the <Title>
      #get the <PublisherEntity>
      #get the Publisher
    end
  end


  def cdata_block(string)
    characters(string)
  end 

  def characters(str)
    puts str
  end
end

score 2 · Accepted Answer

Purists may disagree with me, but the way I've been doing it is to use Nokogiri to traverse the huge file and then use XmlSimple to work with the smaller objects inside it. Here is an excerpt of my code:

require 'nokogiri'
require 'xmlsimple'

def isend(node)
   return (node.node_type == Nokogiri::XML::Reader::TYPE_END_ELEMENT)
end

reader = Nokogiri::XML::Reader(File.open('database.xml', 'r'))

# traverse the file looking for tag "content"
reader.each do |node|
   next if node.name != 'content' || isend(node)
   # if we get here, then we found the start of a 'content' node,
   # so parse it into a hash and work with the hash:
   content = XmlSimple.xml_in(node.outer_xml())
   title = content['title'][0]
   # ...etc.
end

This works very well for me. Some may object to mixing SAX and non-SAX (nokogiri and XmlSimple) in the same code, but for my purposes it gets the job done with minimal hassle.

score 0 · Accepted Answer

SAX is tricky to use. I think the solution needs to look something like this:

class MyDoc < Nokogiri::XML::SAX::Document
  def start_element name, attrs = []
    @inside_content = true if name == 'Content'
    @current_element = name
  end

  def end_element name
    @inside_content = false if name == 'Content'
    @current_element = nil
  end

  def characters str
    puts "#{@current_element} - #{str}" if @inside_content && %w{Title PublisherEntity Publisher}.include?(@current_element)
  end
end
