ruby - Strange backslash substitution in Ruby

Question

I don't understand this Ruby code:

>> puts '\\ <- single backslash'
# \ <- single backslash

>> puts '\\ <- 2x a, because 2 backslashes get replaced'.sub(/\\/, 'aa')
# aa <- 2x a, because two backslashes get replaced

So far, everything is as expected. But if we search for 1 backslash with /\\/ and replace it with 2, encoded as '\\\\', why do we get this?

>> puts '\\ <- only 1 ... replace 1 with 2'.sub(/\\/, '\\\\')
# \ <- only 1 backslash, even though we replace 1 with 2

And if we encode 3 backslashes with '\\\\\\', we only get 2:

>> puts '\\ <- only 2 ... 1 with 3'.sub(/\\/, '\\\\\\')
# \\ <- 2 backslashes, even though we replace 1 with 3

Can anybody explain why a backslash gets swallowed in the replacement string? This happens on 1.8 and 1.9.

score 72 · Accepted Answer

Quick Answer

If you want to sidestep all this confusion, use the much less confusing block syntax. Here is an example that replaces each backslash with 2 backslashes:

"some\\path".gsub('\\') { '\\\\' }

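As a rough illustration of the block form, the sketch below wraps it in a helper and checks the result. The name `double_backslashes` is made up for this example and is not part of Ruby.

```ruby
# Hypothetical helper: doubles every backslash using the block form.
# The block's return value is inserted literally, with no escape processing.
def double_backslashes(str)
  str.gsub('\\') { '\\\\' }
end

path = 'some\\path'                 # contents: some\path (one backslash)
doubled = double_backslashes(path)  # contents: some\\path (two backslashes)

puts path
puts doubled
puts doubled.scan('\\').length      # number of backslashes in the result
```

Inside the block, the only escaping left to think about is that of the Ruby string literal itself.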
Gruesome Details

The problem is that when sub (or gsub) is called without a block, Ruby interprets special character sequences in the replacement parameter. Unfortunately, sub uses the backslash as the escape character for these:

\& (the entire match)
\+ (the last group)
\` (pre-match string)
\' (post-match string)
\0 (same as \&)
\1 (first captured group)
\2 (second captured group)
\\ (a backslash)
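To make the list concrete, here is a small sketch that exercises several of these sequences against one sample string. The sample text, the pattern, and the method name `special_sequence_demo` are all made up for illustration.

```ruby
# The pattern matches "lo wo" in "hello world", with group 1 = "lo" and
# group 2 = "wo"; the pre-match is "hel" and the post-match is "rld".
def special_sequence_demo
  text = 'hello world'
  pattern = /(lo) (wo)/
  {
    '\&'   => text.sub(pattern, '[\&]'),    # hel[lo wo]rld
    '\0'   => text.sub(pattern, '[\0]'),    # hel[lo wo]rld
    '\1'   => text.sub(pattern, '[\1]'),    # hel[lo]rld
    '\2'   => text.sub(pattern, '[\2]'),    # hel[wo]rld
    '\`'   => text.sub(pattern, '[\`]'),    # hel[hel]rld
    "\\'"  => text.sub(pattern, "[\\']"),   # hel[rld]rld
    '\\\\' => text.sub(pattern, '[\\\\]')   # hel[\]rld
  }
end

special_sequence_demo.each { |sequence, result| puts "#{sequence} -> #{result}" }
```

Single-quoted literals are used where possible because they leave sequences such as \& and \1 untouched; the post-match sequence needs a double-quoted literal since \' is itself an escape inside single quotes.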

Like any escaping, this creates an obvious problem. If you want to include the literal value of one of the above sequences (e.g. \1) in the output string, you have to escape it. So, to get Hello \1, you need the replacement string to be Hello \\1. And to represent this as a string literal in Ruby, you have to escape those backslashes again like this: "Hello \\\\1"

So, there are two different escaping passes. The first one takes the string literal and creates the internal string value. The second takes that internal string value and replaces the sequences above with the matching data.
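The two passes can be observed separately. In this sketch the subject string 'x' and the pattern /(x)/ are arbitrary choices made for the example.

```ruby
# Pass 1: the Ruby parser turns the literal into the string value Hello \\1
# (nine characters, two of them backslashes).
replacement = "Hello \\\\1"
puts replacement
puts replacement.length

# Pass 2: sub turns the \\ in that value into a single backslash, so the
# "1" that follows is plain text rather than a group reference.
puts 'x'.sub(/(x)/, replacement)   # prints Hello \1

# For contrast: with only one level of escaping, \1 refers to group 1.
puts 'x'.sub(/(x)/, "Hello \\1")   # prints Hello x
```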

If a backslash is not followed by a character that matches one of the above sequences, then the backslash (and the character that follows) passes through unaltered. This also affects a backslash at the end of the string: it passes through unaltered. It's easiest to see this logic in the Rubinius code; just look for the to_sub_replacement method in the String class.

Here are some examples of how String#sub is parsing the replacement string:

1 backslash \ (which has a string literal of "\\")

Passes through unaltered because the backslash is at the end of the string and has no characters after it.

Result: \

2 backslashes \\ (which have a string literal of "\\\\")

The pair of backslashes matches the escaped backslash sequence (see \\ above) and gets converted into a single backslash.

Result: \

3 backslashes \\\ (which have a string literal of "\\\\\\")

The first two backslashes match the \\ sequence and get converted to a single backslash. Then the final backslash is at the end of the string, so it passes through unaltered.

Result: \\

4 backslashes \\\\ (which have a string literal of "\\\\\\\\")

Two pairs of backslashes each match the \\ sequence and get converted to a single backslash.

Result: \\

2 backslashes with a character in the middle \a\ (which have a string literal of "\\a\\")

The \a does not match any of the escape sequences, so it is allowed to pass through unaltered. The trailing backslash is also allowed through.

Result: \a\

Note: The same result could be obtained from: \\a\\ (with the literal string: "\\\\a\\\\")
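The cases above can be checked mechanically. This sketch replaces a lone "x" with each replacement string, so the return value is exactly what sub made of that replacement. Both method names are invented for the example.

```ruby
# Hypothetical helper: returns what sub produces from the replacement string.
def expand_replacement(replacement)
  'x'.sub('x', replacement)
end

# Replacement string literal => literal of the expected result.
def backslash_cases
  {
    "\\"        => "\\",     # 1 backslash   -> 1 backslash
    "\\\\"      => "\\",     # 2 backslashes -> 1 backslash
    "\\\\\\"    => "\\\\",   # 3 backslashes -> 2 backslashes
    "\\\\\\\\"  => "\\\\",   # 4 backslashes -> 2 backslashes
    "\\a\\"     => "\\a\\",  # \a\           -> \a\
    "\\\\a\\\\" => "\\a\\"   # \\a\\         -> \a\
  }
end

backslash_cases.each do |replacement, expected|
  actual = expand_replacement(replacement)
  status = actual == expected ? 'ok' : 'MISMATCH'
  puts "#{replacement} -> #{actual} (#{status})"
end
```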

In hindsight, this could have been less confusing if String#sub had used a different escape character. Then there wouldn't be the need to double escape all the backslashes.

score 18 · Accepted Answer

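This is an issue because the backslash (\) serves as an escape character for both Regexps and Strings. You can use the special variable \& to reduce the number of backslashes in the gsub replacement string.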

foo.gsub(/\\/,'\&\&\&') #for some string foo replace each \ with \\\

EDIT: I should mention that the value of \& comes from the Regexp match, in this case a single backslash.
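A minimal runnable version of that idea, with a made-up helper name and sample input:

```ruby
# Hypothetical helper: replaces each backslash with three backslashes by
# repeating \& (the matched text) instead of writing escaped backslashes.
def triple_backslashes(str)
  str.gsub(/\\/, '\&\&\&')
end

puts triple_backslashes('a\\b')   # prints a\\\b
```

The single-quoted replacement keeps \& intact, so the replacement string contains no literal backslashes that would need doubling.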

Also, I thought there was a special way to create a string that disabled the escape character, but apparently not. None of these will produce two backslashes:

puts "\\"
puts '\\'
puts %q{\\}
puts %Q{\\}
puts """\\"""
puts '''\\'''
puts <<EOF
\\
EOF
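A quick way to confirm this is to compare lengths rather than printed output. The method name in this sketch is invented for the example.

```ruby
# Collects the string produced by each literal form shown above.
def single_backslash_literals
  heredoc = <<EOF
\\
EOF
  ["\\", '\\', %q{\\}, %Q{\\}, """\\""", '''\\''', heredoc.chomp]
end

puts single_backslash_literals.map(&:length).inspect   # [1, 1, 1, 1, 1, 1, 1]
puts '\\\\'.length                                     # 2
```

The triple-quoted forms are just three adjacent string literals that Ruby concatenates, which is why they behave like the plain quoted ones.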

score 4 · Accepted Answer

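Argh, right after I typed all this out, I realised that \ is used to refer to groups in the replacement string. I guess this means that you need a literal \\ in the replacement string to get one replaced \. To get a literal \\ you need four \s, so to replace one with two you actually need eight(!).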

# Double every occurrence of \. There's eight backslashes on the right there!
>> puts '\\'.sub(/\\/, '\\\\\\\\')

Am I missing anything? Are there any more efficient ways?
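One way to avoid counting backslashes by hand, sketched here under the assumption that the replacement text should always be inserted literally, is to double the backslashes programmatically before passing the string to sub. The helper name `escape_replacement` is hypothetical.

```ruby
# Hypothetical helper: doubles every backslash so that the escaping pass
# inside sub/gsub turns the string back into the original text.
def escape_replacement(text)
  text.gsub('\\') { '\\\\' }
end

puts 'x'.sub('x', escape_replacement('\\\\'))         # prints two backslashes
puts 'x'.sub(/(x)/, escape_replacement('Hello \1'))   # prints Hello \1
```

With this, the literal only has to spell out the text that should appear in the output.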

score 4 · Accepted Answer

Clearing up a little confusion about the author's second line of code.

You said:

>> puts '\\ <- 2x a, because 2 backslashes get replaced'.sub(/\\/, 'aa')
# aa <- 2x a, because two backslashes get replaced

Two backslashes are not being replaced here. You are replacing one escaped backslash with two a's ('aa'). That is, if you had used .sub(/\\/, 'a'), you would see only one 'a'.

'\\'.sub(/\\/, 'anything') #=> anything
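The point is easy to verify by checking lengths; the snippet below only uses literals already seen in this thread.

```ruby
source = '\\'                     # the literal has two backslashes...
puts source.length                # ...but the string holds one character

puts source.sub(/\\/, 'a')        # prints a
puts source.sub(/\\/, 'aa')       # prints aa

# The regexp /\\/ also matches exactly one backslash, so with two
# backslashes in the input, sub replaces only the first.
puts '\\\\'.sub(/\\/, 'a')        # prints a\
```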

score 2 · Accepted Answer

the pickaxe book mentions this exact problem, actually. here's another alternative (from page 130 of the latest edition)

str = 'a\b\c'               # contents: a\b\c
str.gsub(/\\/) { '\\\\' }   # contents: a\\b\\c
