ruby - Parsing an XML file with Nokogiri to determine the path (Ruby)

Question

My code is supposed to "guess" the path that precedes the relevant text nodes in my XML file. Relevant in this case means text nodes nested inside a recurring product/person/something tag, not text nodes used outside of it.

This code:

    require "nokogiri"

    @doc, items = Nokogiri.XML(@file), []

    path = []
    # traverse yields children before their parent, so names are collected bottom-up
    @doc.traverse do |node|
      next unless node.element?
      # an element belongs to the path only if it has at least one element child
      is_path_element = node.children.any?(&:element?)
      path.push(node.name) if is_path_element && !path.include?(node.name)
    end
    # reverse the bottom-up list to get a root-first path
    final_path = "/" + path.reverse.join("/")

It works for simple XML files, for example:

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Some XML file title</title>
    <description>Some XML file description</description>
    <item>
      <title>Some product title</title>
      <brand>Some product brand</brand>
    </item>
    <item>
      <title>Some product title</title>
      <brand>Some product brand</brand>
    </item>
  </channel>
</rss>

puts final_path # => "/rss/channel/item"

But how should I tackle the problem when the file gets more complex? For example:

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Some XML file title</title>
    <description>Some XML file description</description>
    <item>
      <titles>
        <title>Some product title</title>
      </titles>
      <brands>
        <brand>Some product brand</brand>
      </brands>
    </item>
    <item>
      <titles>
        <title>Some product title</title>
      </titles>
      <brands>
        <brand>Some product brand</brand>
      </brands>
    </item>
  </channel>
</rss>

score 3 · Accepted Answer

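If you are looking for a list of the deepest "parent" paths in the XML, there is more than one way to get it.

I think you could adjust your own code to produce the same output, but I was convinced the same thing could be done with XPath. My motivation is to keep my XML skills from getting rusty (I have not used Nokogiri yet, but I will soon need to professionally). Here is how to use XPath to get every parent path that has exactly one level of children beneath it (xml is the parsed Nokogiri document):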

xml.xpath('//*[child::* and not(child::*/*)]').each { |node| puts node.path }

The output for your second sample file is:

/rss/channel/item[1]/titles
/rss/channel/item[1]/brands
/rss/channel/item[2]/titles
/rss/channel/item[2]/brands

. . . and if you take this list, gsub out the indexes, and make the array unique, it looks a lot like the output of your loop . . .

paths = xml.xpath('//*[child::* and not(child::*/*)]').map { |node| node.path }
paths.map! { |path| path.gsub(/\[[0-9]+\]/,'') }.uniq!
=> ["/rss/channel/item/titles", "/rss/channel/item/brands"]

Or in one line:

paths = xml.xpath('//*[* and not(*/*)]').map { |node| node.path.gsub(/\[[0-9]+\]/,'') }.uniq
=> ["/rss/channel/item/titles", "/rss/channel/item/brands"]
