python-2.7 - Python split string without splitting escaped character

Question

Is there a way to split a string without splitting on escaped characters? For example, say I have the following string and want to split on ':' but not on '\:':

http\://www.example.url:ftp\://www.example.url

The result should look like this:

['http\://www.example.url' , 'ftp\://www.example.url']
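For reference, here is a short sketch of what a plain str.split does with this input. The string is written as a raw string so the backslashes survive. It cuts at every colon, escaped or not, which is why something more than str.split is needed:

```python
s = r'http\://www.example.url:ftp\://www.example.url'

# str.split has no notion of escaping: every ':' is a split point
naive = s.split(':')
print(naive)
# ['http\\', '//www.example.url', 'ftp\\', '//www.example.url']
```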

score 10 · Accepted Answer

As Ignacio says, yes, but not trivially in one go. The issue is that you need lookback to determine whether or not you're at an escaped delimiter, and the basic string.split doesn't provide that functionality.
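As a quick illustration of that one-character lookback, here is a minimal sketch using re.split with a negative lookbehind, so ':' is only a split point when the preceding character is not a backslash. Because it looks back just one character, it cannot tell that in r'a\\:b' the backslash is itself escaped:

```python
import re

s = r'http\://www.example.url:ftp\://www.example.url'

# (?<!\\) is a negative lookbehind: match ':' only if not preceded by '\'
parts = re.split(r'(?<!\\):', s)
print(parts)
# ['http\\://www.example.url', 'ftp\\://www.example.url']

# Limitation: a double escape followed by ':' is still treated as escaped
print(re.split(r'(?<!\\):', r'a\\:b'))
# ['a\\\\:b']
```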

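If this isn't inside a tight loop, so performance isn't a significant issue, you can do it by first splitting on the escaped delimiters, then performing the real split on each piece, and then merging the results back together. Ugly demo code follows: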

# Bear in mind this is not rigorously tested!
def escaped_split(s, delim):
    # split by escaped, then by not-escaped
    escaped_delim = '\\' + delim
    sections = [p.split(delim) for p in s.split(escaped_delim)]
    ret = []
    prev = None
    for parts in sections:  # for each list of "real" splits
        if prev is not None:
            # Re-join the previous section's last item to this section's
            # first item, restoring the escaped delimiter between them
            parts[0] = escaped_delim.join([prev, parts[0]])
        # Every item except the last one in this section is complete
        ret.extend(parts[:-1])
        # The last item may still continue into the next section
        prev = parts[-1]
    # Don't forget the final item (e.g. 'b' in 'a:b')
    ret.append(prev)
    return ret

s = r'http\://www.example.url:ftp\://www.example.url'
print(escaped_split(s, ':'))
# >>> ['http\\://www.example.url', 'ftp\\://www.example.url']

Alternatively, it may be easier to follow the logic if you just split the string by hand:

def escaped_split(s, delim):
    ret = []
    current = []
    itr = iter(s)
    for ch in itr:
        if ch == '\\':
            try:
                # skip the next character; it has been escaped!
                current.append('\\')
                current.append(next(itr))
            except StopIteration:
                pass
        elif ch == delim:
            # split! (add current to the list and reset it)
            ret.append(''.join(current))
            current = []
        else:
            current.append(ch)
    ret.append(''.join(current))
    return ret

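Note that this second version behaves slightly differently when it encounters a double escape followed by a delimiter: escaped_split(r'a\\:b', ':') returns ['a\\\\', 'b'], because the first \ escapes the second one, leaving the : to be interpreted as a real delimiter. So that's something to watch out for.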
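The underlying rule is that a character counts as escaped only if it is preceded by an odd number of backslashes; the first version looks only for the two-character sequence '\:', so it never sees the pair. A small helper makes the parity rule explicit. is_escaped is a hypothetical name used only for illustration:

```python
def is_escaped(s, i, escape='\\'):
    """Return True if s[i] is preceded by an odd number of escape characters."""
    count = 0
    j = i - 1
    # Count the run of escape characters immediately before position i
    while j >= 0 and s[j] == escape:
        count += 1
        j -= 1
    return count % 2 == 1

print(is_escaped(r'a\:b', 2))   # True: one backslash escapes the ':'
print(is_escaped(r'a\\:b', 3))  # False: the two backslashes escape each other
```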

score 5 · Accepted Answer

An edited version of Henry's answer with Python 3 compatibility, tests (as doctests) and fixes for a few issues:

def split_unescape(s, delim, escape='\\', unescape=True):
    """
    >>> split_unescape('foo,bar', ',')
    ['foo', 'bar']
    >>> split_unescape('foo$,bar', ',', '$')
    ['foo,bar']
    >>> split_unescape('foo$$,bar', ',', '$', unescape=True)
    ['foo$', 'bar']
    >>> split_unescape('foo$$,bar', ',', '$', unescape=False)
    ['foo$$', 'bar']
    >>> split_unescape('foo$', ',', '$', unescape=True)
    ['foo$']
    """
    ret = []
    current = []
    itr = iter(s)
    for ch in itr:
        if ch == escape:
            try:
                # skip the next character; it has been escaped!
                if not unescape:
                    current.append(escape)
                current.append(next(itr))
            except StopIteration:
                if unescape:
                    current.append(escape)
        elif ch == delim:
            # split! (add current to the list and reset it)
            ret.append(''.join(current))
            current = []
        else:
            current.append(ch)
    ret.append(''.join(current))
    return ret

score 4 · Accepted Answer

Based on @user629923's suggestion, but much simpler than the other answers:

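import re

DBL_ESC = "!double escape!"  # placeholder; must never occur in the input

s = r"Hello:World\:Goodbye\\:Cruel\\\:World"

# Hide escaped backslashes, split on ':' not preceded by '\', then restore them.
# A list comprehension is used so the result is a list on Python 3 as well
# (map() returns an iterator there).
result = [x.replace(DBL_ESC, r'\\')
          for x in re.split(r'(?<!\\):', s.replace(r'\\', DBL_ESC))]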
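One caveat with the placeholder trick is that it breaks if DBL_ESC ever occurs in the input. Here is a sketch that avoids the placeholder by matching each field as a run of escape pairs or ordinary characters. It assumes a single-character delimiter and escape; split_escaped is a hypothetical name, and like the hand-written escaped_split in the top answer it keeps escape sequences as-is rather than unescaping them:

```python
import re

def split_escaped(s, delim=':', escape='\\'):
    """Split s on delim, except where delim is escaped.

    Assumes delim and escape are single characters. Escape
    sequences are kept in the output, not unescaped.
    """
    d, e = re.escape(delim), re.escape(escape)
    # A field is a run of: an escape plus any character, a trailing lone
    # escape, or any character that is neither the escape nor the delimiter
    field = re.compile(r'(?:%s.|%s\Z|[^%s%s])*' % (e, e, e, d), re.DOTALL)
    result = []
    pos = 0
    while True:
        m = field.match(s, pos)  # always matches (possibly an empty field)
        result.append(m.group())
        pos = m.end()
        if pos >= len(s):
            return result
        pos += 1  # s[pos] can only be an unescaped delimiter: skip it

print(split_escaped(r"Hello:World\:Goodbye\\:Cruel\\\:World"))
# ['Hello', 'World\\:Goodbye\\\\', 'Cruel\\\\\\:World']
```

Since the regex consumes an escape together with whatever follows it, a double escape is handled by construction, and empty fields (as in 'a::b') are preserved just like with str.split.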

score 1 · Accepted Answer

There's no built-in function for this. Here's an efficient, general and tested function, which even supports delimiters of any length:

def escape_split(s, delim):
    i, res, buf = 0, [], ''
    while True:
        j, e = s.find(delim, i), 0
        if j < 0:  # end reached
            return res + [buf + s[i:]]  # add remainder
        while j - e and s[j - e - 1] == '\\':
            e += 1  # number of escapes
        d = e // 2  # number of double escapes
        if e != d * 2:  # odd number of escapes
            buf += s[i:j - d - 1] + s[j]  # add the escaped char
            i = j + 1  # and skip it
            continue  # add more to buf
        res.append(buf + s[i:j - d])
        i, buf = j + len(delim), ''  # start after delim

score 0 · Accepted Answer

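I created this method, inspired by Henry Keiter's answer, which has the following advantages:

- Variable escape character and delimiter
- Does not remove the escape character if it isn't actually escaping anything

Here is the code: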

from typing import List

def _split_string(self, string: str, delimiter: str, escape: str) -> List[str]:
    result = []
    current_element = []
    iterator = iter(string)
    for character in iterator:
        if character == escape:
            try:
                next_character = next(iterator)
                if next_character != delimiter and next_character != escape:
                    # Do not copy the escape character if it is intended to escape either the delimiter or the
                    # escape character itself. Copy the escape character if it is not in use to escape one of these
                    # characters.
                    current_element.append(escape)
                current_element.append(next_character)
            except StopIteration:
                # A trailing escape character escapes nothing, so keep it
                current_element.append(escape)
        elif character == delimiter:
            # split! (add current to the list and reset it)
            result.append(''.join(current_element))
            current_element = []
        else:
            current_element.append(character)
    result.append(''.join(current_element))
    return result

Here is the test code that demonstrates the behavior:

def test_split_string(self):
    # Verify normal behavior
    self.assertListEqual(['A', 'B'], list(self.sut._split_string('A+B', '+', '?')))

    # Verify that escape character escapes the delimiter
    self.assertListEqual(['A+B'], list(self.sut._split_string('A?+B', '+', '?')))

    # Verify that the escape character escapes the escape character
    self.assertListEqual(['A?', 'B'], list(self.sut._split_string('A??+B', '+', '?')))

    # Verify that the escape character is just copied if it doesn't escape the delimiter or escape character
    self.assertListEqual(['A?+B'], list(self.sut._split_string('A?+B', '\'', '?')))

score -4 · Accepted Answer

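Note that : doesn't seem to be a character that needs escaping.

The simplest way I can think of to accomplish this is to split on the character, and then add it back when it was escaped.

Sample code (could use some tidying up):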

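def splitNoEscapes(string, char):
    result = []
    pending = None  # a piece ending in a backslash, i.e. its delimiter was escaped
    for piece in string.split(char):
        if pending is not None:
            # put the escaped delimiter back and glue the pieces together
            piece = pending + char + piece
        if piece.endswith("\\"):
            # the delimiter after this piece was escaped: keep accumulating
            pending = piece
        else:
            result.append(piece)
            pending = None
    if pending is not None:
        # the string ended in a backslash; keep the last piece unchanged
        result.append(pending)
    return result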
