ruby -

How do I break apart a string on its tags?

Question

I have a string with a lot of break tags in it.

Unfortunately, they are irregular.

 etc...

I'm using nokogiri, but I can't figure out how to tell it to split the string on each break tag....

Thanks.

score 3 · Accepted Answer

If you can split on a regular expression, use the following delimiter:

<\s*[Bb][Rr]\s*\/*>

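Explanation:

One left angle bracket, zero or more whitespace characters, B or b, R or r, zero or more whitespace characters, zero or more slashes, and one right angle bracket.

For more on using regular expressions in Ruby, see:
http://www.regular-expressions.info/ruby.html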
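To see what the pattern accepts and what it rejects, here is a minimal sketch. The helper name `break_tag?` and the sample strings are made up for illustration, and the pattern is anchored with `\A` and `\z` so that it tests a whole string rather than searching inside one.

```ruby
# Hypothetical helper: true when the whole string is a single break tag
# according to the delimiter pattern above (anchored with \A and \z).
def break_tag?(str)
  str.match?(/\A<\s*[Bb][Rr]\s*\/*>\z/)
end

# Sample inputs chosen only for illustration.
['<br>', '<BR>', '<br/>', '<BR/>', '<br />', '< br >', '<b>', '<br class="x">'].each do |tag|
  puts format('%-16s %s', tag, break_tag?(tag) ? 'matches' : 'no match')
end
```

The last two samples do not match: `<b>` has no R, and the pattern allows nothing but whitespace and slashes between `br` and `>`, so a break tag carrying attributes is not treated as a delimiter.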

score 2 · Accepted Answer

So, to implement iftrue's answer:

a = 'a<Br>b<BR>c<br/>d<BR/>e<br />f'
a.split(/<\s*[Bb][Rr]\s*\/*>/)
=> ["a", "b", "c", "d", "e", "f"]

...which leaves you with an array of the bits of string between the HTML breaks.
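If the string starts with a break or has two breaks in a row, a plain `split` leaves empty strings in the result. A minimal sketch of one way to tidy that up, assuming you want blank pieces dropped and surrounding whitespace trimmed (the helper name `split_on_breaks` is hypothetical):

```ruby
# Hypothetical helper built on the split shown above.
# Splits on break tags, trims each piece, and drops pieces that are empty.
def split_on_breaks(html)
  html.split(/<\s*[Bb][Rr]\s*\/*>/).map(&:strip).reject(&:empty?)
end

p split_on_breaks('<br>a<br><br> b <BR/>c<br />')
# => ["a", "b", "c"]
```

Without the `map` and `reject` steps, the same input would give `["", "a", "", " b ", "c"]`, since `split` keeps leading and interior empty strings and only drops trailing ones.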

score 1 · Accepted Answer

Pesto is 99% of the way there, but Nokogiri supports creating a document fragment, which does not wrap the text in declarations.

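require 'nokogiri'

# Parse as a fragment so Nokogiri does not wrap the text in <html><body>.
fragment = Nokogiri::HTML::DocumentFragment.parse('<Br>this<BR>is<br/>a<BR/>text<br />string')
# Keep only the text nodes and take their string content.
text = fragment.children.select(&:text?).map(&:content)
puts text
# >> this
# >> is
# >> a
# >> text
# >> string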

score 0 · Accepted Answer

If you parse the string with Nokogiri, you can then scan through it and ignore anything that isn't a text node.

require 'nokogiri'
doc = Nokogiri::HTML.parse('a<Br>b<BR>c<br/>d<BR/>e<br />f')
text = []
doc.search('p').first.children.each do |node|
  text << node.content if node.text?
end
p text  # => ["a", "b", "c", "d", "e", "f"]

Note that you have to search for the first p tag because Nokogiri wraps the whole thing in <!DOCTYPE blah blah><html><body><p>YOUR TEXT</p></body></html>.
