python - Regular expression to replace a regular expression match

Question

I have this regular expression for finding string literals in Python code:

x1 = re.compile('''((?P<unicode>u?)(?P<c1>'|")(?P<data>.+?)(?P<c2>'|"))''')

I want to extract the data, c1 and c2 parts of this regex and build a replacement string (if c1 == c2),
something like:

repl = "u<c1><data><c2>"

How would I do this?
Is it possible in one line, or by using re.sub?

UPDATE:
My new code:

x1 = re.compile('''(?P<unicode>u?)(?P<c>'|")(?P<data>.*?)(?P=c)''')
def repl(match):
    if '#' in match.string:
        pass  ### Confused
    return "u%(c)s%(data)s%(c)s" % match.groupdict()

fcode = '\n'.join([re.sub(x1,repl,i) for i in scode.splitlines()])

My problem here is working out how to avoid changing strings inside comments. How can I ignore the comments?

score 1 · Accepted Answer

Say you have a pattern:

pattern = r'''(?P<unicode>u?)(?P<c>'|")(?P<data>.*?)(?P=c)''' # did a little tweak

Match it against a string:

m = re.search(pattern, 'print("hello")')

What you get:

>>> m.groups()
('', '"', 'hello')
>>> m.groupdict()
{'c': '"', 'unicode': '', 'data': 'hello'}

Now you can do whatever you want with these:

>>> 'u{c}{data}{c}'.format_map(m.groupdict())
'u"hello"'

You may be using Python 2.x:

>>> 'u{c}{data}{c}'.format(**m.groupdict())
'u"hello"'

Or, if you prefer the old % formatting:

>>> "u%(c)s%(data)s%(c)s" % m.groupdict()
'u"hello"'
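To connect this back to the question about re.sub: the same named groups can be referenced directly in a replacement template with \g<name>, so no callback is needed. The following is only a minimal sketch; the names template and add_u are made up for illustration, and it does nothing about comments or escaped quotes.

```python
import re

# Same named groups as the pattern above.
pattern = r'''(?P<unicode>u?)(?P<c>'|")(?P<data>.*?)(?P=c)'''

# \g<name> inserts the text captured by the named group.
template = r'u\g<c>\g<data>\g<c>'

def add_u(line):
    # Hypothetical helper: prefix every quoted string on the line with u.
    return re.sub(pattern, template, line)

print(add_u('print("hello")'))  # prints: print(u"hello")
```

Because the optional u is part of the match, a string that already carries the prefix is rewritten to itself rather than getting a second u.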

EDITED:

The regex solution can't handle some situations correctly.

So I used a 2to3 hack (it is actually 3to2, and it still can't solve everything):

cd /usr/lib/python3.3/lib2to3/fixes/
cp fix_unicode.py fix_unicode33.py

Edit fix_unicode33.py:

-_literal_re = re.compile(r"[uU][rR]?[\'\"]")
+_literal_re = re.compile(r"[rR]?[\'\"]")

-class FixUnicode(fixer_base.BaseFix):
+class FixUnicode33(fixer_base.BaseFix):

-                new.value = new.value[1:]
+                new.value = 'u' + new.value

Now 2to3 --list | grep unicode33 should output unicode33.

Then you can run 2to3 -f unicode33 py3files.py.

Remember to remove fix_unicode33.py afterwards.

NOTE: In Python 3, ur"string" raises a SyntaxError. The logic here is simple; modify it to suit your goal.
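As an alternative that avoids patching lib2to3, the standard tokenize module can tell string literals apart from comments. This is a different technique from the fixer above, sketched under a few assumptions: the source parses under the running Python 3 interpreter, add_u_prefix is a hypothetical name, and strings that already have a prefix (u, b, r, f) are left alone.

```python
import io
import tokenize

def add_u_prefix(source):
    """Prefix unprefixed string literals with u; comments are separate tokens and stay untouched."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only STRING tokens that start directly with a quote have no prefix yet.
        if tok.type == tokenize.STRING and tok.string[0] in '"\'':
            tok = tok._replace(string='u' + tok.string)
        out.append(tok)
    # untokenize places tokens using the original positions, so spacing is kept.
    return tokenize.untokenize(out)

print(add_u_prefix('x = "a#b"  # say \'hi\'\n'))
```

Skipping r"..." strings is deliberate here, since ur"..." is a SyntaxError in Python 3.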

score 0 · Accepted Answer

The long code I ended up with:

import re

x1 = re.compile('''(?P<unicode>u?)(?P<c>'|")(?P<data>.*?)(?P=c)''')

def in_string(text, index):
    # True if text[index] lies inside a quoted string (escaped quotes are not handled)
    curr, in_str = '', False

    for c in text[:index]:
        if c == '"' or c == "'":
            if in_str and curr == c:
                in_str = False      # closing quote
                curr = ''
            elif not in_str:
                in_str = True       # opening quote
                curr = c
    return in_str

def repl(m):
    return "u%(c)s%(data)s%(c)s" % m.groupdict()

def handle_hashes(i):
    # substitute only in the code part of the line; keep the comment as it is
    n = get_hash_out_of_string(i)
    return re.sub(x1, repl, i[:n]) + i[n:]

def get_hash_out_of_string(i):
    # index of the first '#' outside a string, or len(i) if the line has no comment
    n = i.find('#')
    while n != -1 and in_string(i, n):
        n = i.find('#', n + 1)
    return n if n != -1 else len(i)
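A more compact way to get the same effect, offered only as a sketch: let a single regex match either a whole comment or a string, and have the callback return comments unchanged. Since the scan runs left to right, a # inside a string is consumed as part of that string, and quotes inside a comment are consumed as part of the comment. The names token_re and add_u_prefix are made up for illustration, and escaped quotes and triple-quoted strings are still not handled.

```python
import re

# Either a whole comment, or an optionally u-prefixed quoted string.
token_re = re.compile(
    r'''(?P<comment>#.*)|(?P<unicode>u?)(?P<c>'|")(?P<data>.*?)(?P=c)''')

def repl(match):
    if match.group('comment') is not None:
        return match.group(0)  # leave comments untouched
    return 'u%(c)s%(data)s%(c)s' % match.groupdict()

def add_u_prefix(source):
    # Hypothetical helper: process the source line by line, as in the question.
    return '\n'.join(token_re.sub(repl, line) for line in source.splitlines())

print(add_u_prefix('x = "a#b"  # say \'hi\''))
```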
