ruby - Grabbing text between all tags in Nokogiri?

Question

What is the most efficient way to grab all the text between HTML tags?

<div>
<a> hi </a>
....

A bunch of text enclosed in HTML tags.

score 26 · Accepted Answer


require "nokogiri"
doc = Nokogiri::HTML(your_html)
# select every text node in the document and join them into one string
doc.xpath("//text()").to_s

Answered 2009-10-03T05:38:39.810

score 5 · Accepted Answer

Use a SAX parser. It is much faster than the XPath approach.

require "nokogiri"

some_html = <<-HTML
<html>
  <head>
    <title>Title!</title>
  </head>
  <body>
    This is the body!
  </body>
</html>
HTML

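# Collects the non-blank text reported by the SAX callbacks
class TextHandler < Nokogiri::XML::SAX::Document
  def initialize
    @chunks = []
  end

  attr_reader :chunks

  # treat CDATA blocks the same as ordinary character data
  def cdata_block(string)
    characters(string)
  end

  # called for each run of text; keep it only if it is not just whitespace
  def characters(string)
    @chunks << string.strip if string.strip != ""
  end
end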
th = TextHandler.new
parser = Nokogiri::HTML::SAX::Parser.new(th)
parser.parse(some_html)
puts th.chunks.inspect

score 2 · Accepted Answer


Just do:

require "nokogiri"
doc = Nokogiri::HTML(your_html)
doc.xpath("//text()").text

Answered 2013-01-06T21:02:10.973

score 1 · Accepted Answer

Here is how to get all the text in the question div on this page:

require 'nokogiri'
require 'open-uri'

# Ruby 3.0+ needs URI.open; plain Kernel#open no longer fetches URLs
doc = Nokogiri::HTML(URI.open("http://stackoverflow.com/questions/1512850/grabbing-text-between-all-tags-in-nokogiri"))
puts doc.css("#question").text
