python - Python replace function [replace once]

Question

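I need help with a program I'm writing in Python.

Assume I want to replace every instance of the word "steak" with "ghost" (just go with it...) and, at the same time, replace every instance of the word "ghost" with "steak". The following code does not work: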

 s="The scary ghost ordered an expensive steak"
 print(s)
 s=s.replace("steak","ghost")
 s=s.replace("ghost","steak")  # also turns the "ghost" just created back into "steak"
 print(s)

It prints: The scary steak ordered an expensive steak

What I'm trying to get is: The scary steak ordered an expensive ghost

score 24 · Accepted Answer

I'd probably use a regex here:

>>> import re
>>> s = "The scary ghost ordered an expensive steak"
>>> sub_dict = {'ghost':'steak','steak':'ghost'}
>>> regex = '|'.join(sub_dict)
>>> re.sub(regex, lambda m: sub_dict[m.group()], s)
'The scary steak ordered an expensive ghost'

Or, as a function which you can copy/paste:

import re
def word_replace(replace_dict,s):
    regex = '|'.join(replace_dict)
    return re.sub(regex, lambda m: replace_dict[m.group()], s)

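Basically, I create a mapping of the words that I want to replace with other words (sub_dict). I can build a regular expression from that mapping. In this case, the regular expression is "steak|ghost" (or "ghost|steak"; the order doesn't matter here), and the regex engine does the rest of the work of finding non-overlapping matches and replacing them accordingly.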

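Some possibly useful modifications

regex = '|'.join(map(re.escape,replace_dict)): allows the keys to contain special regular expression syntax (such as parentheses). This escapes the special characters so that the regular expression matches the literal text.
regex = '|'.join(r'\b{0}\b'.format(x) for x in replace_dict): makes sure that we don't match when one of our words is a substring of another word. In other words, change he to she, but not the to tshe.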
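The two modifications can be combined. The following is a minimal sketch, assuming Python 3; the name word_replace_safe is hypothetical and not part of the original answer. It escapes every key and wraps the whole alternation in word boundaries.

```python
import re

def word_replace_safe(replace_dict, s):
    # Hypothetical helper: escape each key so regex metacharacters match literally,
    # then require a word boundary (\b) on both sides of the match.
    # Assumes every key starts and ends with a word character; \b behaves
    # differently next to punctuation.
    pattern = r'\b(?:' + '|'.join(map(re.escape, replace_dict)) + r')\b'
    return re.sub(pattern, lambda m: replace_dict[m.group()], s)

print(word_replace_safe({'he': 'she'}, "he saw the ghost"))
# → she saw the ghost
print(word_replace_safe({'ghost': 'steak', 'steak': 'ghost'},
                        "The scary ghost ordered an expensive steak"))
# → The scary steak ordered an expensive ghost
```

With the boundaries in place, "ghosts" and "the" are left alone because the key would only match part of the word.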

score 12 · Accepted Answer

Split the string on one of the targets, do the replacement in each piece, and put the whole thing back together.

pieces = s.split('steak')
s = 'ghost'.join(piece.replace('ghost', 'steak') for piece in pieces)

This works exactly as .replace() would, including ignoring word boundaries, so it will turn "steak ghosts" into "ghost steaks".
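The same idea can be wrapped in a small function. This is a sketch assuming Python 3; swap_words and its parameters a and b are hypothetical names used only for illustration.

```python
def swap_words(s, a, b):
    # Split on `a`, turn every `b` into `a` inside the pieces,
    # then rejoin with `b` where `a` used to be.
    return b.join(piece.replace(b, a) for piece in s.split(a))

print(swap_words("The scary ghost ordered an expensive steak", "steak", "ghost"))
# → The scary steak ordered an expensive ghost
print(swap_words("steak ghosts", "steak", "ghost"))
# → ghost steaks
```

The second call shows the word-boundary behavior described above: "ghosts" becomes "steaks" because the match is purely textual.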

score 4 · Accepted Answer

Rename one of the words to a temporary value that doesn't occur in the text. Note that this wouldn't be the most efficient way for a very large text; for that, re.sub might be more appropriate.

 s="The scary ghost ordered an expensive steak"
 print(s)
 s=s.replace("steak","temp")
 s=s.replace("ghost","steak")
 s=s.replace("temp","ghost")
 print(s)
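The temporary-value approach only works when the placeholder really is absent from the text. A minimal sketch, assuming Python 3, that checks this first; swap_with_placeholder and its default placeholder "\x00" are hypothetical choices, not part of the original answer.

```python
def swap_with_placeholder(s, a, b, placeholder="\x00"):
    # Hypothetical helper: refuse to run if the placeholder already occurs
    # in the text, since the final step would then corrupt it.
    # Assumes neither `a` nor `b` contains the placeholder.
    if placeholder in s:
        raise ValueError("placeholder occurs in the text")
    # a -> placeholder, b -> a, placeholder -> b
    return s.replace(a, placeholder).replace(b, a).replace(placeholder, b)

print(swap_with_placeholder("The scary ghost ordered an expensive steak",
                            "steak", "ghost"))
# → The scary steak ordered an expensive ghost
```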

score 2 · Accepted Answer

Use the count argument of the str.replace() method. So, using your code, you would have:

s="The scary ghost ordered an expensive steak"
print(s)
s=s.replace("steak","ghost", 1)
s=s.replace("ghost","steak", 1)
print(s)

http://docs.python.org/2/library/stdtypes.html
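The count argument limits how many occurrences str.replace substitutes; it does not make the two replacements simultaneous. The sketch below (Python 3 syntax assumed, swap_once is a hypothetical name) applies the same two calls to two sentences and shows that the result depends on the order in which the words appear.

```python
def swap_once(s):
    # The same two calls as above, each limited to one replacement
    s = s.replace("steak", "ghost", 1)
    return s.replace("ghost", "steak", 1)

print(swap_once("The scary ghost ordered an expensive steak"))
# → The scary steak ordered an expensive ghost
print(swap_once("The expensive steak scared the ghost"))
# → The expensive steak scared the ghost
```

In the second sentence "steak" comes first, so the second call undoes the first and the text comes back unchanged.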

score 1 · Accepted Answer

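How about something like this? Store the original in a split list, then use a translation dict. This keeps your core code short, and you only adjust the dict when you need to adjust the translation. Plus, it is easy to port to a function: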

 def translate_line(s, translation_dict):
    line = []
    for i in s.split():
       # To take account for punctuation, strip all non-alnum from the
       # word before looking up the translation.
       i = ''.join(ch for ch in i if ch.isalnum())
       line.append(translation_dict.get(i, i))
    return ' '.join(line)


 >>> translate_line("The scary ghost ordered an expensive steak", {'steak': 'ghost', 'ghost': 'steak'})
 'The scary steak ordered an expensive ghost'

score 1 · Accepted Answer

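Note: Considering the viewership of this question, I undeleted this answer and rewrote it for different types of test cases.

I have considered four competing implementations from the answers: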

>>> import re
>>> from timeit import Timer
>>> from uuid import uuid4
>>> sub_dict = {'ghost': 'steak', 'steak': 'ghost'}
>>> regex = '|'.join(sub_dict)

>>> def sub_noregex(hay):
    """
    The join-and-replace routine, which outperforms the regex implementation. This
    version uses a generator expression
    """
    return 'steak'.join(e.replace('steak','ghost') for e in hay.split('ghost'))

>>> def sub_regex(hay):
    """
    This is a straightforward regex implementation as suggested by @mgilson.
    Note: so that the overhead doesn't add to the cumulative sum, the regex
    creation routine is placed outside the function
    """
    return re.sub(regex,lambda m:sub_dict[m.group()],hay)

>>> def sub_temp(hay, _uuid = str(uuid4())):
    """
    Similar to Mark Tolonen's implementation, but uses a uuid for the temporary string
    value to reduce the chance of a collision
    """
    hay = hay.replace("steak",_uuid).replace("ghost","steak").replace(_uuid,"ghost")
    return hay

>>> def sub_noregex_LC(hay):
    """
    The join-and-replace routine, which outperforms the regex implementation. This
    version uses a list comprehension
    """
    return 'steak'.join([e.replace('steak','ghost') for e in hay.split('ghost')])

A generalized timeit function

>>> def compare(n, hay):
    foo = {"sub_regex": "re",
           "sub_noregex":"",
           "sub_noregex_LC":"",
           "sub_temp":"",
           }
    stmt = "{}(hay)"
    setup = "from __main__ import hay,"
    for k, v in foo.items():
        t = Timer(stmt = stmt.format(k), setup = setup+ ','.join([k, v] if v else [k]))
        yield t.timeit(n)

And a generalized test routine

>>> def test(*args, **kwargs):
    n = kwargs['repeat']
    print "{:50}{:^15}{:^15}{:^15}{:^15}".format("Test Case", "sub_temp",
                             "sub_noregex ", "sub_regex",
                             "sub_noregex_LC ")
    for hay in args:
        hay, hay_str = hay
        print "{:50}{:15.10}{:15.10}{:15.10}{:15.10}".format(hay_str, *compare(n, hay))

And the test results are as follows:

>>> test((' '.join(['steak', 'ghost']*1000), "Multiple repeatation of search key"),
         ('garbage '*998 + 'steak ghost', "Single repeatation of search key at the end"),
         ('steak ' + 'garbage '*998 + 'ghost', "Single repeatation of at either end"),
         ("The scary ghost ordered an expensive steak", "Single repeatation for smaller string"),
         repeat = 100000)
Test Case                                            sub_temp     sub_noregex      sub_regex   sub_noregex_LC 
Multiple repeatation of search key                   0.2022748797   0.3517142003   0.4518992298   0.1812594258
Single repeatation of search key at the end          0.2026047957   0.3508259952   0.4399926194   0.1915298898
Single repeatation of at either end                  0.1877455356   0.3561734007   0.4228843986   0.2164233388
Single repeatation for smaller string                0.2061019057   0.3145984487   0.4252060592   0.1989413449
>>>

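Based on the test results:

The non-regex LC version and the temporary-variable substitution perform best, although the performance of the temporary-variable approach is not consistent.
The LC version performs better than the generator version (confirmed).
The regex version is roughly two to two-and-a-half times slower than the LC version in these runs (so if this piece of code is a bottleneck, the implementation choice can be reconsidered).
The regex and non-regex versions are equally robust and can scale.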
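To repeat this kind of comparison on a current interpreter, the following is a minimal sketch assuming Python 3, where timeit accepts a callable directly. It covers only two of the four variants, the name sub_noregex_lc and the run count are arbitrary choices, and the timings it prints will differ from the figures above.

```python
import re
from timeit import timeit

sub_dict = {'ghost': 'steak', 'steak': 'ghost'}
# Compile once, outside the timed functions, as in the original benchmark
pattern = re.compile('|'.join(map(re.escape, sub_dict)))

def sub_regex(hay):
    return pattern.sub(lambda m: sub_dict[m.group()], hay)

def sub_noregex_lc(hay):
    return 'steak'.join([e.replace('steak', 'ghost') for e in hay.split('ghost')])

hay = ' '.join(['steak', 'ghost'] * 1000)
# Both variants must agree before their timings are worth comparing
assert sub_regex(hay) == sub_noregex_lc(hay)

for func in (sub_regex, sub_noregex_lc):
    seconds = timeit(lambda: func(hay), number=100)
    print("{:<16}{:.4f}s".format(func.__name__, seconds))
```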
