ruby-on-rails - Ruby/Rails: scan/match with a regex to pull different tags out of text

Question

In the content samples below I have wrapped the lines so they are easier to read on Stack Overflow (no need to scroll right to see the examples).

Content A:

"Lorem Ipsum\r\n
[img]http://example.org/first.jpg[/img]\r\n
[img]http://example.org/second.jpg[/img]\r\n
more lorem ipsum ..."

Content B:

"Lorem Ipsum\r\n
[img caption="Sample caption"]http://example.org/third.jpg[/img]
[img]http://example.org/fourth.jpg[/img]"

Content C:

"Lorem Ipsum [img]http://example.org/fifth.jpg[/img]\r\n
more lorem ipsum\r\n\r\n
[img caption="Some other caption"]http://example.org[/img]"

What I have tried:

content.match(/\[img\]([^<>]*)\[\/img\]/imu)
return example: "[img]...[/img]\r\n[img]...[/img]"
content.scan(/\[img\]([^<>]*)\[\/img\]/imu)
return example: "...[/img]\r\n[img]..."

What I want to achieve when running a scan/match/regex solution over the three content examples above is to get every occurrence of [img]...[/img] and [img caption="?"]...[/img] and put them in an array for later use:

Array
  1 : A : [img]http://example.org/first.jpg[/img]
  2 : A : [img]http://example.org/second.jpg[/img]
  3 : B : [img caption="Sample caption"]http://example.org/third.jpg[/img]
  4 : B : [img]http://example.org/fourth.jpg[/img]
  5 : C : [img]http://example.org/fifth.jpg[/img]
  6 : C : [img caption="Some other caption"]http://example.org[/img]

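It would also help to limit the "stripped content" to cases where both an opening and a closing tag are present. That is, if there is an [img] or [img caption="?"] with no [/img] after it, ignore it.

I have read http://www.ruby-doc.org/core-1.9.3/String.html up and down, but I can't find anything that looks like it would help with this.

Update:

So I came up with this: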

\[img([^<>]*)\]([^<>]*)\[\/img\]

It will find either:

[img]something[/img]

and:

[img caption="something"]something[/img]

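Now I need to know how to catch every occurrence within the various content. As it stands, the match always runs from the first [img] to the last [/img] tag, so any other Lorem Ipsum in between gets captured as well.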

score 2 · Accepted Answer

You can use /\[img(?:\s+caption=".+")?\].+?\[\/img\]/ to scan the document.

regex = /\[img(?:\s+caption=".+")?\].+?\[\/img\]/

text = <<EOT
Lorem Ipsum
[img]http://example.org/first.jpg[/img]
[img]http://example.org/second.jpg[/img]
more lorem ipsum ...

Content B:

Lorem Ipsum
[img caption="Sample caption"]http://example.org/third.jpg[/img]
[img]http://example.org/fourth.jpg[/img]

Content C:

Lorem Ipsum [img]http://example.org/fifth.jpg[/img]
more lorem ipsum

[img caption="Some other caption"]http://example.org[/img]
EOT

array = text.scan(regex)
puts array

Which produces:

[img]http://example.org/first.jpg[/img]
[img]http://example.org/second.jpg[/img]
[img caption="Sample caption"]http://example.org/third.jpg[/img]
[img]http://example.org/fourth.jpg[/img]
[img]http://example.org/fifth.jpg[/img]
[img caption="Some other caption"]http://example.org[/img]
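The reason this works where the pattern in the question did not is the lazy quantifier. A negated class such as [^<>]* also matches line breaks and is greedy, so it runs on to the last [/img] in the text, whereas .+? stops at the first [/img] it can reach. A minimal sketch of the difference, using a made-up two-tag string rather than the question's content:

```ruby
# Hypothetical input: two tags separated by filler text and a line break.
text = "[img]a.jpg[/img]\r\nlorem [img]b.jpg[/img]"

# Greedy: [^<>]* matches "\r\n" too and backtracks only to the LAST [/img].
greedy = text.scan(/\[img\][^<>]*\[\/img\]/)

# Lazy: .+? expands one character at a time and stops at the FIRST [/img].
lazy = text.scan(/\[img\].+?\[\/img\]/)

p greedy.size  # one match spanning both tags
p lazy         # two separate matches
```

Because "." does not match a newline unless the /m flag is set, the lazy version also keeps each match on a single line.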

If you want to ignore the tags and grab only their content, change the regex to:

regex = /\[img(?:\s+caption=".+")?\](.+?)\[\/img\]/

Running it again with that change gives:

http://example.org/first.jpg
http://example.org/second.jpg
http://example.org/third.jpg
http://example.org/fourth.jpg
http://example.org/fifth.jpg
http://example.org

(Rubular proof)

If you need to look for different tags, you can easily generate an "OR" list:
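One caveat worth sketching: the caption part of the pattern uses a greedy .+, so when two captioned tags share a line it can run on to a later quote and merge them. The variant below is an assumption on my part, not part of the answer above. It swaps in [^"]* for the caption, captures both caption and URL, and shows that a tag with no closing [/img] is skipped, as the question asked. The sample text and the name tag_re are made up for illustration.

```ruby
# Hypothetical sample: two captioned tags on one line, a plain tag,
# and a final tag that is never closed.
sample = <<~TEXT
  [img caption="One"]http://example.org/a.jpg[/img] [img caption="Two"]http://example.org/b.jpg[/img]
  [img]http://example.org/c.jpg[/img]
  [img]http://example.org/unclosed.jpg
TEXT

# [^"]* keeps the caption inside its own pair of quotes. Without the /m
# flag "." cannot cross a newline, so the unclosed tag never reaches [/img].
tag_re = /\[img(?:\s+caption="([^"]*)")?\](.+?)\[\/img\]/

# With two capture groups, scan returns [caption, url] pairs;
# caption is nil when the tag has no caption attribute.
pairs = sample.scan(tag_re)
pairs.each { |caption, url| puts "#{url} (caption: #{caption.inspect})" }
```

If line breaks inside a tag body need to be allowed, the /m flag would be required, at the cost of letting an unclosed tag pair up with a later [/img].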

Regexp.union(%w[foo img bar])
=> /foo|img|bar/

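Regexp.union already escapes "magic" characters in string arguments, so there is no need to call Regexp.escape on them first (doing so would escape them twice). Escape explicitly only when you build the alternation yourself:

Regexp.new(%w[foo img bar].map { |s| Regexp.escape(s) }.join('|'))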
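To connect the "OR" list back to the tag pattern, here is a rough sketch of one way to combine them. The tag list (including "url"), the sample string, and the variable names are assumptions for illustration only. It relies on two things I am confident of: Regexp.union escapes string arguments on its own, and interpolating a Regexp into a regex literal embeds its source.

```ruby
# Hypothetical tag list; "url" is not part of the original question.
tags = %w[img url]

# Regexp.union escapes string arguments itself, so the names go in as-is.
names = Regexp.union(tags)

# The backreference \1 requires the closing tag to repeat whichever
# name the opening tag matched.
tag_re = /\[(#{names})\](.+?)\[\/\1\]/

# The last tag opens as [img] but closes as [/url], so it is not matched.
text = "[img]a.jpg[/img] and [url]http://example.org[/url] but [img]b.jpg[/url]"
found = text.scan(tag_re)
p found
```

Each row of found is a [tag name, body] pair, which makes it easy to branch on the tag later.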

score 1 · Accepted Answer

Luckily, I have already solved this in my own app!

Assuming @tags is an array of tags (such as ["img"]):

regex = /\[(#{@tags.join("|")})\s*(.*?)?\/?\](?:(.*?)\[\/\1\])?/
matches = content.scan(regex)

A complete example:

require 'pp'

@tags = %w(img)
regex = /\[(#{@tags.join("|")})\s*(.*?)?\/?\](?:(.*?)\[\/\1\])?/

content = <<-EOF
  Lorem Ipsum\r\n
  [img]http://example.org/first.jpg[/img]\r\n
  [img]http://example.org/second.jpg[/img]\r\n
  more lorem ipsum ..."
  Content B:

  "Lorem Ipsum\r\n
  [img caption="Sample caption"]http://example.org/third.jpg[/img]
  [img]http://example.org/fourth.jpg[/img]"
  Content C:

  "Lorem Ipsum [img]http://example.org/fifth.jpg[/img]\r\n
  more lorem ipsum\r\n\r\n
  [img caption="Some other caption"]http://example.org[/img]"
EOF

matches = content.scan(regex)
pp matches

And the output:

[["img", "", "http://example.org/first.jpg"],
 ["img", "", "http://example.org/second.jpg"],
 ["img", "caption=\"Sample caption\"", "http://example.org/third.jpg"],
 ["img", "", "http://example.org/fourth.jpg"],
 ["img", "", "http://example.org/fifth.jpg"],
 ["img", "caption=\"Some other caption\"", "http://example.org"]]
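Since each row comes back as [tag, attributes, body], a small follow-up step can turn the rows into something easier to use later. This is only a sketch: the hash keys and the caption-extraction regex are my own choices, and the two rows are copied from the output above.

```ruby
# Rows in the shape shown above: [tag, attributes, body].
matches = [
  ["img", "", "http://example.org/first.jpg"],
  ["img", "caption=\"Sample caption\"", "http://example.org/third.jpg"]
]

images = matches.map do |tag, attrs, body|
  # String#[] with a regex and a group index returns that group,
  # or nil when the attribute string has no caption.
  caption = attrs[/caption="([^"]*)"/, 1]
  { tag: tag, caption: caption, url: body }
end

images.each { |img| p img }
```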
