python - Replace an underscore-delimited substring in the middle of a comma-delimited string

Question

I have a file with multiple lines like the following:

 'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}

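I want to replace 1371078139195 (in this case) with a different number. The value to replace is always in the first comma-delimited word, and it is always the second-to-last underscore-delimited value in that word. Below is how I did it; it works, but it seems ugly and clumsy.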

>>> line="'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}"
>>> l1=",".join(line.split(",")[1:])
>>> print l1
 {'cf:rv': '0'}
>>> l2=line.split(",")[0]
>>> print l2
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442'
>>> print "_".join(l2.split('_')[:-2])
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight
>>>
>>> print "_".join(l2.split('_')[:-2])+ "_1234567_"+(l2.split('_')[-1])
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1234567_+14155186442'
>>> print "_".join(l2.split('_')[:-2])+ "_1234567_"+(l2.split('_')[-1]) + "," + l1
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1234567_+14155186442', {'cf:rv': '0'}
>>>

Is there a simpler way to replace the value (perhaps using a regex)? I can't believe this is the best way.

I have received a few answers, but I have to emphasize that the target is the second-to-last underscore-delimited value. The following are all valid strings:

line = "'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}"
line = "'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_14155186442', {'cf:rv': '0'}"
line = "'AMS_Investigation|txtt.co_1371078139195_BigtittedBlondOtherNight_1371078139195_1371078139195', {'cf:rv': '0'}"

In the cases above, there are digit strings that do not come after the second-to-last underscore. Also, the last part may or may not be all digits (it could be +14155186442 or 14155186442). Sorry for not mentioning this earlier.
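For reference, the rule as stated (first comma-delimited field, second-to-last underscore-delimited value) can also be sketched without a regex by splitting from the right. This is only an illustration assuming Python 3; `replace_second_to_last` is a made-up name, and the sketch assumes the first field always contains at least two underscores.

```python
def replace_second_to_last(line, new_value):
    """Swap the second-to-last '_'-delimited value of the first ','-delimited field."""
    # Split off the first comma-delimited field; `rest` is everything after the first comma.
    first, sep, rest = line.partition(",")
    # Split from the right at most twice: head, old value, last part.
    # Raises ValueError if the field has fewer than two underscores.
    head, _old, last = first.rsplit("_", 2)
    return "_".join([head, new_value, last]) + sep + rest


line = "'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}"
print(replace_second_to_last(line, "1234567"))
```

Because the split starts from the right, earlier digit strings such as 23456 are left untouched, and the last part does not need to be numeric.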


score 0 · Accepted Answer

import re

# Raw string so that \d reaches the regex engine unchanged.
r = re.compile(r'([^,]*_)(\d+)(?=_[^_,]+,)(_.*)')

for line in ("'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}",
             "'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}"):
    print(line)
    print(r.sub(r'\1ABCDEFG\3', line))
    print(r.sub(r'\g<1>1234567\3', line))

Result

'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_ABCDEFG_+14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1234567_+14155186442', {'cf:rv': '0'}

'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_ABCDEFG_+14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1234567_+14155186442', {'cf:rv': '0'}
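As a rough check against the additional strings from the question, the same pattern can be wrapped in a small helper. This is a sketch assuming Python 3; `PATTERN` and `replace_penultimate` are names made up for illustration, and a function is used as the replacement so that `new_value` is inserted literally rather than parsed as a replacement template.

```python
import re

# Same pattern as in the answer, written as a raw string.
PATTERN = re.compile(r'([^,]*_)(\d+)(?=_[^_,]+,)(_.*)')


def replace_penultimate(line, new_value):
    """Replace the digits that form the second-to-last '_'-delimited value."""
    # Group 1 is everything up to the target, group 3 is everything after it.
    return PATTERN.sub(lambda m: m.group(1) + new_value + m.group(3), line)


lines = [
    "'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}",
    "'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_14155186442', {'cf:rv': '0'}",
    "'AMS_Investigation|txtt.co_1371078139195_BigtittedBlondOtherNight_1371078139195_1371078139195', {'cf:rv': '0'}",
]
results = [replace_penultimate(line, "1234567") for line in lines]
for result in results:
    print(result)
```

Note that the pattern only matches when the second-to-last value consists entirely of digits; a line that does not fit is returned unchanged.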

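\g<1> means "group 1". See the documentation:

In addition to character escapes and backreferences as described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>...) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn't ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character '0'. The backreference \g<0> substitutes in the entire substring matched by the RE.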
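The ambiguity described in that passage can be seen with a tiny pattern. This is a minimal sketch assuming Python 3; the two-group pattern and the variable names are invented purely for the demonstration.

```python
import re

pattern = re.compile(r'(a)(b)')

# \g<2>0 means "group 2, then a literal 0".
result = pattern.sub(r'\g<2>0', 'ab')
print(result)  # b0

# \20 is parsed as a reference to group 20, which this pattern does not have,
# so the substitution is rejected instead of producing "b0".
try:
    pattern.sub(r'\20', 'ab')
    raised = False
except re.error:
    raised = True
print(raised)
```

This is why the answer writes \g<1>1234567 when the replacement text starts with a digit: a bare \1 followed by more digits would not be read as "group 1".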
