
I have the following code:

require 'nokogiri'

file = Nokogiri::XML(File.read('file.xml'))
test = file.xpath("//title") # all <title> elements in the XML file

Then, when I try:

puts test.uniq

I get the following error:

 undefined method `uniq' for #<Nokogiri::XML::NodeSet:0x000000011b8bf8> 

Is test not an array? If it's not, how do I make it one?

Otherwise, how do I get only the unique values from test?


2 Answers


Is test not an array? If it's not, how do I make it one?

test will be a NodeSet:

Nokogiri::XML('<xml><foo/></xml>').xpath('//foo').class
=> Nokogiri::XML::NodeSet

foo = Nokogiri::XML('<xml><foo/></xml>').xpath('//foo')
=> [#<Nokogiri::XML::Element:0x8109a674 name="foo">]

foo.is_a? Array
=> false

foo.is_a? Enumerable
=> true

To turn it into an array use to_a:

foo.respond_to? :to_a
=> true

However, that's usually unnecessary: NodeSet includes Enumerable, so it also responds to map, each, and the other methods we'd expect when iterating over an Array. map always returns an Array, so that gives you the conversion you wondered about in your comments and your question.

foo.methods.sort - Object.methods
=> [:%, :&, :+, :-, :/, :<<, :[], :add_class, :after, :all?, :any?, :at, :at_css, :at_xpath, :attr, :attribute, :before, :children, :chunk, :collect, :collect_concat, :count, :css, :cycle, :delete, :detect, :document, :document=, :drop, :drop_while, :each, :each_cons, :each_entry, :each_slice, :each_with_index, :each_with_object, :empty?, :entries, :filter, :find, :find_all, :find_index, :first, :flat_map, :grep, :group_by, :index, :inject, :inner_html, :inner_text, :last, :length, :map, :max, :max_by, :member?, :min, :min_by, :minmax, :minmax_by, :none?, :one?, :partition, :pop, :push, :reduce, :reject, :remove, :remove_attr, :remove_class, :reverse, :reverse_each, :search, :select, :set, :shift, :size, :slice, :slice_before, :sort, :sort_by, :take, :take_while, :text, :to_a, :to_ary, :to_html, :to_xhtml, :to_xml, :unlink, :wrap, :xpath, :zip, :|]

I suspect uniq isn't implemented because it's very difficult to decide what uniqueness should mean for nodes. A very simple tag, like:

<div class="foo" id="bar">

is functionally the same as:

<div id="bar" class="foo">

but the obvious to_s comparison will fail, because the two strings are not equal.

The tags would have to be normalized on the fly to put their attributes into the same order, then converted to strings, but what if the class attribute was "foo1 foo2" in the first tag and "foo2 foo1" in the second? Does the uniq code have to dive into specific attributes and reorder their values? And what if the tag is a container, like div is? Should the children of the node also be considered in the uniq test?

I think that's a can of worms most of us would back away from quickly, and those who'd jump into trying to define uniq would learn a very valuable lesson about rabbit holes. Instead, you are free to define uniqueness in whatever way fits your particular application. I think that's a great design decision by Nokogiri's authors.

answered 2013-06-06T20:22:35.023

Try this:

puts test.map(&:text).uniq

Here is a sample that shows how it works:

require "nokogiri"

doc = Nokogiri::HTML(<<-EOF) 
<a class = "foo" href = "https://example.com"> Click here </a>
EOF

# Build two <title> nodes with identical content.
node = 2.times.map { Nokogiri::XML::Node.new('title', doc).tap { |t| t.content = "xxx" } }
node # => [#<Nokogiri::XML::Element:0x4637712 name="title" children=[#<Nokogiri::XML::Text:0x4636efc "xxx">]>, #<Nokogiri::XML::Element:0x4637690 name="title" children=[#<Nokogiri::XML::Text:0x4636218 "xxx">]>]


nodeset = Nokogiri::XML::NodeSet.new(doc,node)
nodeset # => [#<Nokogiri::XML::Element:0x4637712 name="title" children=[#<Nokogiri::XML::Text:0x4636efc "xxx">]>, #<Nokogiri::XML::Element:0x4637690 name="title" children=[#<Nokogiri::XML::Text:0x4636218 "xxx">]>]

nodeset.map{|i| i.text }.uniq # => ["xxx"]
answered 2013-06-06T19:44:18.340