ruby - Replace markup containing certain inline elements (as a string)

Question

My intent is to modify the sentences inside a tag.

For example, I want to change this:

<div id="1">
  This is text in the TD with <strong> strong </strong> tags
  <p>This is a child node. with <b> bold </b> tags</p>
  <div id=2>
      "another line of text to a <a href="link.html"> link </a>"
     <p> This is text inside a div <em>inside<em> another div inside a paragraph tag</p>
  </div>
</div>

into this:

<div id="1">
  This is modified text in the TD with <strong> strong </strong> tags
  <p>This is a child node. with <b> bold </b> tags</p>
  <div id=2>
      "another line of text to a <a href="link.html"> link </a>"
      <p> This is text inside a div <em>inside<em> another div inside a paragraph tag</p>
   </div>
</div>

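That is, I need to traverse the nodes, pick up a tag, and get all of its text and styling nodes (but not its child tags), modify the sentence, and put it back. I need to do this for every tag that contains a full sentence, until all of the content has been modified.

For example, getting the text and styling nodes of div#1 would give "This is text in the TD with <strong> strong </strong> tags" but, as you can see, none of the other text beneath it. It should be accessible and modifiable through a variable: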

div#1.text_with_formating= "This is modified text in the TD with <strong> strong </strong> tags"

The code below removes all of the content of div#1, not just the child tags, so I am not sure how to proceed.

Sanitize.clean(h,{:elements => %w[b em i strong u],:remove_contents=>'true'})

How would you recommend solving this?

score 1 · Accepted Answer

If you want to find all of the text nodes beneath an element, use:

text_pieces = div.xpath('.//text()')

If you only want to find text that is a direct child of the element, use:

text_pieces = div.xpath('text()')

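For each text node you can change its content however you like. Just be sure to use my_text_node.content = ... instead of my_text_node.content.gsub!(...), because content returns a new string each time, so mutating that string in place does not change the node.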

# Replace text that is a direct child of an element
def gsub_my_text!( el, find, replace=nil, &block )
    el.xpath('text()').each do |text|
        next if text.content.strip.empty?
        text.content = replace ? text.content.gsub(find,replace,&block) : text.content.gsub(find,&block)
    end
end

# Replace text beneath an element.
def gsub_text!( el, find, replace=nil, &block )
    el.xpath('.//text()').each do |text|
        next if text.content.strip.empty?
        text.content = replace ? text.content.gsub(find,replace,&block) : text.content.gsub(find,&block)
    end
end


d1 = doc.at('#d1')
gsub_my_text!( d1, /[aeiou]+/ ){ |found| found.upcase }

puts d1
#=> <div id="d1">
#=>   ThIs Is tExt In thE TD wIth <strong> strong </strong> tAgs
#=>   <p>This is a child node. with <b> bold </b> tags</p>
#=>   <div id="d2">
#=>       "another line of text to a <a href="link.html"> link </a>"
#=>      <p> This is text inside a div <em>inside<em> another div inside a paragraph tag</em></em></p>
#=>   </div>
#=> </div>


gsub_text!( d1, /\w+/, '(\\0)' )
puts d1
#=> <div id="d1">
#=>   (ThIs) (Is) (tExt) (In) (thE) (TD) (wIth) <strong> (strong) </strong> (tAgs)
#=>   <p>(This) (is) (a) (child) (node). (with) <b> (bold) </b> (tags)</p>
#=>   <div id="d2">
#=>       "(another) (line) (of) (text) (to) (a) <a href="link.html"> (link) </a>"
#=>      <p> (This) (is) (text) (inside) (a) (div) <em>(inside)<em> (another) (div) (inside) (a) (paragraph) (tag)</em></em></p>
#=>   </div>
#=> </div>
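
The two calls above use the two forms of String#gsub: one takes a block, the other a replacement string with a backreference. A minimal plain-Ruby sketch of both forms, using a made-up sample string:

```ruby
sample = 'This is text'

# Block form: the block receives each match and returns its replacement
upcased = sample.gsub(/[aeiou]+/) { |found| found.upcase }

# String form: \0 in the replacement string stands for the whole match
wrapped = sample.gsub(/\w+/, '(\\0)')

puts upcased  # ThIs Is tExt
puts wrapped  # (This) (is) (text)
```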

Edit: Here is code that lets you extract runs of text plus inline markup as a string, run gsub on them, and replace the result with new markup.

require 'nokogiri'

doc = Nokogiri.HTML '<div id="d1">
  Text with <strong>strong</strong> tag.
  <p>This is a child node. with <b>bold</b> tags.</p>
  <div id=d2>And now we are in <a href="foo">another</a> div.</div>
  Hooray for <em>me!</em>
</div>'

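module Enumerable
  # http://stackoverflow.com/q/4800337/405017
  # Split into runs of consecutive items, using items for which the block is
  # truthy as separators. chunk drops items whose key is nil, so separators
  # must map to nil and everything else to a non-nil key.
  def split_on() chunk{|o| yield(o) ? nil : true }.map{|_,run| run } end
end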

require 'set'
# Given a node, call gsub on the `inner_html` 
def gsub_markup!( node, find, replace=nil, &replace_block )
  allowed = Set.new(%w[strong b em i u strike])
  runs  = node.children.split_on{ |el| el.node_type==1 && !allowed.include?(el.name) }
  runs.each do |nodes|
    orig   = nodes.map{ |n| n.node_type==3 ? n.content : n.to_html }.join
    next if orig.strip.empty? # Skip whitespace-only nodes
    result = replace ? orig.gsub(find,replace) : orig.gsub(find,&replace_block)
    puts "I'm replacing #{orig.inspect} with #{result.inspect}" if $DEBUG
    nodes[1..-1].each(&:remove)
    nodes.first.replace(result)
  end
end

d1 = doc.at('#d1')

$DEBUG = true
gsub_markup!( d1, /[aeiou]+/, &:upcase )
#=> I'm replacing "\n  Text with <strong>strong</strong> tag.\n  " with "\n  TExt wIth <strOng>strOng</strOng> tAg.\n  "
#=> I'm replacing "\n  Hooray for <em>me!</em>\n" with "\n  HOOrAy fOr <Em>mE!</Em>\n"

puts doc
#=> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
#=> <html><body><div id="d1">
#=>   TExt wIth <strong>strOng</strong> tAg.
#=>   <p>This is a child node. with <b>bold</b> tags.</p>
#=>   <div id="d2">And now we are in <a href="foo">another</a> div.</div>
#=>   HOOrAy fOr <em>mE!</em>
#=> </div></body></html>
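
The run-splitting step depends on how Enumerable#chunk treats nil keys: items whose key is nil are dropped and also end the current run. A minimal plain-Ruby sketch of that behaviour follows. The helper name split_runs and the sample tokens are hypothetical and are not part of Ruby or Nokogiri.

```ruby
# split_runs is a hypothetical helper name used only for this sketch.
# Items for which the block is truthy act as separators and are discarded.
def split_runs(items)
  items.chunk { |item| yield(item) ? nil : true }.map { |_key, run| run }
end

tokens = ['a', 'b', :sep, 'c', :sep, :sep, 'd']
runs   = split_runs(tokens) { |t| t == :sep }

p runs  # [["a", "b"], ["c"], ["d"]]
```

In the Nokogiri code the separators would be the block-level child elements, so each run is a stretch of text and allowed inline elements.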

score 0 · Accepted Answer

The easiest way is:

div = doc.at('div#1') 
div.replace div.to_s.sub('text', 'modified text')
