
This is the HTML I am parsing:

<div class="audio" id="audio59779184_153635497_-28469067_16663">
  <table width="100%" cellspacing="0" cellpadding="0"><tbody><tr>
<td>
        <a onclick="playAudioNew('59779184_153635497_-28469067_16663')"><div class="play_new" id="play59779184_153635497_-28469067_16663"></div></a>
        <input id="audio_info59779184_153635497_-28469067_16663" type="hidden" value="http://cs5888.userapi.com/u59779184/audio/0fc0fc5d8799.mp3,245">
</td>
      <td class="info">
        <div class="duration fl_r" onmousedown="if (window.audioPlayer) audioPlayer.switchTimeFormat('59779184_153635497_-28469067_16663', event);">4:05</div>
        <div class="audio_title_wrap">
<b><a href="/search?c%5Bsection%5D=audio&amp;c%5Bq%5D=Don+Omar+feat.+Lucenzo+and+Pallada">Don Omar feat. Lucenzo and Pallada</a></b> – <span id="title59779184_153635497_-28469067_16663"> Danza Kuduro (Dj Fleep Mashup)(21.05.12).ılııllı.♫♪Новая Клубная Музыка♫♪.ıllıılı.http://vkontakte.ru/public28469067 </span>
</div>
      </td>

    </tr></tbody></table>
<div class="player_wrap">
    <div class="playline" id="line59779184_153635497_-28469067_16663"><div></div></div>
    <div class="player" id="player59779184_153635497_-28469067_16663" ondragstart="return false;" onselectstart="return false;">
      <table width="100%" border="0" cellspacing="0" cellpadding="0"><tbody><tr id="audio_tr59779184_153635497_-28469067_16663" valign="top">
<td style="padding: 0px; width: 100%; position: relative;">
            <div class="audio_white_line" id="audio_white_line59779184_153635497_-28469067_16663" onmousedown="audioPlayer.prClick(event);"></div>
            <div class="audio_load_line" id="audio_load_line59779184_153635497_-28469067_16663" onmousedown="audioPlayer.prClick(event);"><!-- --></div>
            <div class="audio_progress_line" id="audio_progress_line59779184_153635497_-28469067_16663" onmousedown="audioPlayer.prClick(event);">
              <div class="audio_pr_slider" id="audio_pr_slider59779184_153635497_-28469067_16663"><!-- --></div>
            </div>
          </td>
          <td id="audio_vol59779184_153635497_-28469067_16663" style="position: relative;"></td>
        </tr></tbody></table>
</div>
  </div>
</div>

And the code I'm using:

require 'watir'
require 'nokogiri'
require 'open-uri'


ff = Watir::Browser.new
ff.goto 'http://vk.com/wall-28469067_16663'
htmlSource = ff.html


doc = Nokogiri::HTML(htmlSource, nil, 'UTF-8')

doc.xpath('//div[@class="audio"]/@id').each do |idSongs|
  divSong = doc.css('div#'+idSongs)
  aa = idSongs.text

  link = doc.xpath("//input[@id='#{aa}']//@value")
  puts link
  puts '========================='
end

ff.close

If I write:

aa = 'audio_info59779184_153625626_-28469067_16663'

then puts link prints the expected result, "http://cs5333.userapi.com/u14251690/audio/bcf80f297520.mp3,217".

Why does puts link print an empty string when I use aa = idSongs.text instead?


2 Answers


To answer the question as asked: puts link prints an empty line because link is an empty NodeSet. In other words, Nokogiri didn't find what you were looking for. A NodeSet behaves like an Array, and an empty one has nothing to print.

Because it's a NodeSet, you should iterate over it as you would an array. (The same is true of your doc.css call, which also returns a NodeSet.)

It's empty because Nokogiri can't find what you asked for. You're searching with the contents of aa, which are:

"audio59779184_153635497_-28469067_16663"

Substituting that into "//input[@id='#{aa}']" gives:

"//input[@id='audio59779184_153635497_-28469067_16663']"

but it should be:

"//input[@id='audio_info59779184_153635497_-28469067_16663']"

Searching for that finds content:

 doc.search("//input[@id='audio_info59779184_153635497_-28469067_16663']").size => 1
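One way to get from the div's id to the input's id is a plain string substitution. This is only a sketch: audio_info_id is a hypothetical helper name, not part of Nokogiri, and it assumes every div id starts with "audio" as in the HTML shown in the question.

```ruby
# Hypothetical helper (not a Nokogiri method): turn a div id such as
# "audio59779184_..." into the id of its hidden input, "audio_info59779184_...".
# The negative lookahead leaves ids that already start with "audio_info" untouched.
def audio_info_id(div_id)
  div_id.sub(/\Aaudio(?!_info)/, 'audio_info')
end

div_id = 'audio59779184_153635497_-28469067_16663'
xpath  = "//input[@id='#{audio_info_id(div_id)}']/@value"
puts xpath
```

Inside the loop from the question, aa = audio_info_id(idSongs.text) would then produce the id the XPath needs.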
Answered 2012-05-22T20:09:14.127

Short answer to "Why is it, if aa = idSongs.text does puts link return " " ?": you're looking for an input element with the same DOM id as the div you've already matched. No such input exists, so Nokogiri returns an empty NodeSet, which prints as an empty string.

The page appears to reuse the audio identifier in several ids, so to make your code more versatile, extract that identifier once and prefix it with whatever you need to access, like this:

doc.xpath('//div[@class="audio"]/@id').each do |idSongs|
  aa = idSongs.text

  # Strip the leading "audio" to get the identifier shared by the related elements
  identifier = (match = aa.match(/^audio(.*)$/)) ? match[1] : ""

  # The hidden input holds "url,duration" in its value attribute
  link = doc.xpath("//input[@id='audio_info#{identifier}']/@value")
  puts link
  puts '========================='

  ## now if you want:
  # The title is the text of a span, not the value of an input
  title = doc.xpath("//span[@id='title#{identifier}']").text.strip
  puts title
end
Answered 2012-07-16T23:41:04.093