python - Replace multiple spaces with a single space unless they appear between quotes?

Question

I have a use case where I want to replace multiple spaces with a single space, unless they appear inside quotes. For example:

Original

this is the first    a   b   c
this is the second    "a      b      c"

After

this is the first a b c
this is the second "a      b      c"

I believe a regular expression should be able to do this, but I don't have much experience with them. Here is some of the code I already have:

import re

str = 'this is the second    "a      b      c"'
# Replace all multiple spaces with single space
print(re.sub(r'\s\s+', ' ', str))

# Doesn't work, but something like this
# print(re.sub('[\"]^.*\s\s+.*[\"]^', ' ', str))

I understand why the second attempt above doesn't work, so I'm looking for alternative approaches. If possible, please explain the parts of your regex solution. Thanks.

score 1 · Accepted Answer

Assuming there is no " inside the "substring":

import re
str = 'a    b    c  "d   e   f"'  
str = re.sub(r'("[^"]*")|[ \t]+', lambda m: m.group(1) if m.group(1) else ' ', str)

print(str)
#'a b c "d   e   f"'

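The regex ("[^"]*")|[ \t]+ matches either a quoted substring or a run of one or more spaces or tabs. Because the quoted-substring alternative is tried first, whitespace inside the quotes is consumed as part of that match and can never be matched by the second alternative, [ \t]+, so it is left untouched.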

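The pattern that matches the quoted substring is wrapped in () so that the callback can check whether it matched. If it did, m.group(1) is truthy and its value is returned unchanged. If it did not, the match must have been whitespace, so a single space is returned as the replacement.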
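To see what the callback receives, it can help to list the matches directly. The following is only an illustrative sketch; the names pattern, text and matches are made up for this example. For whitespace matches group 1 did not participate, so it is None; for quoted matches it holds the quoted text.

```python
import re

pattern = re.compile(r'("[^"]*")|[ \t]+')
text = 'a    b  "d   e"'

# Collect (whole match, group 1) for every match, scanning left to right.
matches = [(m.group(0), m.group(1)) for m in pattern.finditer(text)]

for whole, quoted in matches:
    # quoted is None when the [ \t]+ alternative matched
    print(repr(whole), repr(quoted))
```

The two whitespace runs come back with None in group 1, which is why the callback replaces them with ' ', while the quoted part is returned as-is.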

Without a lambda:

def repl(match):
    quoted = match.group(1)
    return quoted if quoted else ' '

str = re.sub(r'("[^"]*")|[ \t]+', repl, str)
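Note that "[^"]*" ends at the very next quote character, so a backslash-escaped quote inside a quoted substring would close it early. If that case matters, one possible variation (a sketch, assuming backslash is the escape character; the names QUOTED_OR_BLANKS and collapse_spaces are invented for this example) is to let the quoted alternative skip over escape pairs:

```python
import re

# Quoted substring that may contain backslash escapes such as \" or \\,
# or (second alternative) a run of spaces/tabs.
QUOTED_OR_BLANKS = re.compile(r'("(?:\\.|[^"\\])*")|[ \t]+')

def collapse_spaces(text):
    """Collapse runs of spaces/tabs that are outside double-quoted substrings."""
    # group(1) is the quoted text when the first alternative matched, else None.
    return QUOTED_OR_BLANKS.sub(lambda m: m.group(1) or ' ', text)

print(collapse_spaces(r'x   "a \"<   >\"  b"   y'))
```

An unterminated quote is simply not matched by the first alternative, so the spaces after it are collapsed like any others.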

score 0 · Accepted Answer

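If you want a solution that works reliably every time, regardless of the input or of other caveats such as not allowing embedded quotes, you should write a simple parser rather than use a regex or split on quotes.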

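def parse(s):
    last = ''    # previously emitted character
    result = ''
    toggle = 0   # 1 while inside a quoted section, 0 otherwise
    for c in s:
        # An unescaped quote opens or closes a quoted section
        if c == '"' and last != '\\':
            toggle ^= 1
        # Outside quotes, skip a space that follows another space
        if c == ' ' and toggle == 0 and last == ' ':
            continue
        result += c
        last = c
    return result

test = r'"  <  >"test   1   2   3 "a \"<   >\"  b  c"'
print(test)
print(parse(test))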
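One edge case in the parser above: it only looks at the single previous character, so a quote preceded by an escaped backslash (\\") is treated as escaped and does not close the quoted section. A possible variation, sketched here under the assumption that backslash escapes exactly one following character (the name collapse_outside_quotes is made up for this example), tracks the escape state explicitly:

```python
def collapse_outside_quotes(s):
    out = []
    in_quotes = False
    escaped = False  # True when the previous character was an active backslash
    for c in s:
        if escaped:
            # This character is escaped: emit it literally, whatever it is.
            escaped = False
            out.append(c)
            continue
        if c == '\\':
            escaped = True
        elif c == '"':
            in_quotes = not in_quotes
        elif c == ' ' and not in_quotes and out and out[-1] == ' ':
            continue  # drop a repeated space outside quotes
        out.append(c)
    return ''.join(out)

test = r'"  <  >"test   1   2   3 "a \"<   >\"  b  c"'
print(collapse_outside_quotes(test))
print(collapse_outside_quotes(r'"a\\"   b'))
```

Collecting characters in a list and joining once at the end also avoids repeated string concatenation, though for short inputs that difference is unlikely to matter.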
