python - Split a dot-separated string into words, but with a special case

Question

I'm wondering whether there is a simple way to split the following string:

'school.department.classes[cost=15.00].name'

into this:

['school', 'department', 'classes[cost=15.00]', 'name']

Note: I want to keep 'classes[cost=15.00]' intact as a single item.

score 6 · Accepted Answer

>>> import re
>>> text = 'school.department.classes[cost=15.00].name'
>>> re.split(r'\.(?!\d)', text)
['school', 'department', 'classes[cost=15.00]', 'name']

A more specific version:

>>> re.findall(r'([^.\[]+(?:\[[^\]]+\])?)(?:\.|$)', text)
['school', 'department', 'classes[cost=15.00]', 'name']

The same pattern in detail, written with re.VERBOSE:

>>> re.findall(r'''(                      # main group
                    [^.\[]+               # 1 or more of anything except . or [
                    (?:                   # (non-capture) optional [x=y,...]
                       \[                 # start [
                       [^\]]+             # 1 or more of any non ]
                       \]                 # end ]
                    )?                    # this group [x=y,...] is optional
                   )                      # end main group
                   (?:\.|$)               # find a dot or the end of string
                ''', text, flags=re.VERBOSE)
['school', 'department', 'classes[cost=15.00]', 'name']
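The two patterns above are not interchangeable. The first only refuses to split at a dot followed by a digit, so it assumes every dot inside the brackets is a decimal point; the second consumes a whole [...] group before looking for a separator. A small comparison sketch, using a hypothetical input modelled on the filter syntax in the pyparsing answer below:

```python
import re

# The two patterns from this answer, unchanged
SIMPLE = r'\.(?!\d)'                            # split at any dot not followed by a digit
SPECIFIC = r'([^.\[]+(?:\[[^\]]+\])?)(?:\.|$)'  # a name plus an optional [...] group

# Hypothetical input: the filter contains a dot followed by a letter
text = 'school.department[school.type="TECHNICAL"].name'

print(re.split(SIMPLE, text))
# ['school', 'department[school', 'type="TECHNICAL"]', 'name']
print(re.findall(SPECIFIC, text))
# ['school', 'department[school.type="TECHNICAL"]', 'name']
```

For the string in the question both give the same result; the difference only shows up once a bracketed filter contains a non-numeric dot.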

score 2 · Accepted Answer

Skip the dots that are inside brackets:

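import re

s = 'school.department.classes[cost=15.00].name'
# Split on '.' unless the text after it reaches a ']' without crossing any bracket,
# i.e. unless the dot sits inside [...]
print(re.split(r'[.](?![^][]*\])', s))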

Output:

['school', 'department', 'classes[cost=15.00]', 'name']
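The same "skip dots inside brackets" idea can also be written without a regex by tracking bracket depth while scanning. This is an alternative sketch, not part of the answer above; the function name split_outside_brackets is hypothetical:

```python
def split_outside_brackets(text, sep='.'):
    """Split text on sep, ignoring separators inside [...] (nesting allowed)."""
    parts = []
    current = []
    depth = 0  # how many '[' are currently open
    for ch in text:
        if ch == '[':
            depth += 1
        elif ch == ']' and depth > 0:
            depth -= 1
        if ch == sep and depth == 0:
            # Separator at top level: close the current part
            parts.append(''.join(current))
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current))
    return parts


print(split_outside_brackets('school.department.classes[cost=15.00].name'))
# ['school', 'department', 'classes[cost=15.00]', 'name']
print(split_outside_brackets('x[b.c[d]].y'))
# ['x[b.c[d]]', 'y']
```

The depth counter handles nested brackets such as x[b.c[d]], where the single lookahead in the regex above would split at the inner dot because a '[' appears before the closing ']'.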

score 1 · Accepted Answer

This can get messy quickly. Rather than simply splitting this string, you may need to actually parse it.

from pyparsing import (Forward,Suppress,Word,alphas,quotedString,
                        alphanums,Regex,oneOf,Group,delimitedList)


# define some basic punctuation, numerics, operators
LBRACK,RBRACK = map(Suppress, '[]')
ident = Word(alphas+'_',alphanums+'_')
real = Regex(r'[+-]?\d+\.\d*').setParseAction(lambda t:float(t[0]))
integer = Regex(r'[+-]?\d+').setParseAction(lambda t:int(t[0]))
compOper = oneOf('= != < > <= >=')

# a full reference may be composed of full references, i.e., a recursive
# grammar - forward declare a full reference
fullRef = Forward()

# a value in a filtering expression could be a full ref or numeric literal
value = fullRef | real | integer | quotedString
filterExpr = Group(value + compOper + value)

# a single dotted ref could be one with a bracketed filter expression
# (which we would want to keep together in a group) or just a plain identifier
ref = Group(ident + LBRACK + filterExpr + RBRACK) | ident

# now insert the definition of a fullRef, using '<<' instead of '='
fullRef << delimitedList(ref, '.')

# try it out
s = 'school.department.classes[cost=15.00].name'
print(fullRef.parseString(s))
s = 'school[size > 10000].department[school.type="TECHNICAL"].classes[cost=15.00].name'
print(fullRef.parseString(s))

Prints:

['school', 'department', ['classes', ['cost', '=', 15.0]], 'name']
[['school', ['size', '>', 10000]], ['department', ['school', 'type', '=', '"TECHNICAL"']], ['classes', ['cost', '=', 15.0]], 'name']

(If you need to, reassembling 'classes[cost=15.00]' from the parsed tokens is not difficult.)
