python - Tokenizing a sentence with optional key/value pairs in Python

Question

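I'm trying to parse a sentence (or line of text) that consists of a sentence optionally followed by some key/value pairs on the same line. Not only are the key/value pairs optional, they are also dynamic. I'm looking for a result like this: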

Input:

"There was a cow at home. home=mary cowname=betsy date=10-jan-2013"

Output:

Values = {'theSentence' : "There was a cow at home.",
          'home' : "mary",
          'cowname' : "betsy",
          'date' : "10-jan-2013"
         }

Input:

"Mike ordered a large hamburger. lastname=Smith store=burgerville"

Output:

Values = {'theSentence' : "Mike ordered a large hamburger.",
          'lastname' : "Smith",
          'store' : "burgerville"
         }

Input:

"Sam is nice."

Output:

Values = {'theSentence' : "Sam is nice."}

Thanks for any input/direction. I know this sounds like it could be a homework problem, but I'm just a Python beginner. I know the solution is probably a regex, but I'm not the best when it comes to regex.

score 4 · Accepted Answer

I'd use re.sub:

import re

s = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"

d = {}

def add(m):
    # store the pair, then replace it with an empty string in the result
    d[m.group(1)] = m.group(2)
    return ''

s = re.sub(r'(\w+)=(\S+)', add, s)
d['theSentence'] = s.strip()

print(d)
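To check the re.sub approach against all three inputs from the question, it can be wrapped in a function. This is a minimal sketch: the name `parse_line` is hypothetical, and the logic is the answer's own, except that the dictionary is created per call instead of at module level.

```python
import re


def parse_line(line):
    """Apply the re.sub callback approach to one line (sketch)."""
    values = {}

    def add(m):
        # record the pair and drop it from the remaining text
        values[m.group(1)] = m.group(2)
        return ''

    # whatever survives the substitution is the sentence
    values['theSentence'] = re.sub(r'(\w+)=(\S+)', add, line).strip()
    return values


for line in ("There was a cow at home. home=mary cowname=betsy date=10-jan-2013",
             "Mike ordered a large hamburger. lastname=Smith store=burgerville",
             "Sam is nice."):
    print(parse_line(line))
```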

Here's a more compact version if you prefer:

d = {}
d['theSentence'] = re.sub(r'(\w+)=(\S+)',
    lambda m: d.setdefault(m.group(1), m.group(2)) and '',
    s).strip()
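The compact version relies on dict.setdefault returning the stored value, which is a non-empty string here because \S+ matches at least one character; `value and ''` then evaluates to '', so every pair is replaced by an empty string. A quick illustration of that trick on its own:

```python
d = {}
# setdefault stores 'mary' under 'home' and returns the stored value
stored = d.setdefault('home', 'mary')
# a non-empty string is truthy, so `and` yields its right operand: ''
replacement = stored and ''
print(repr(stored), repr(replacement), d)
```

One difference from the first version: if a key appears twice, setdefault keeps the first value, while `d[key] = value` keeps the last.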

Or, maybe, findall is a better option:

rx = r'(\w+)=(\S+)|(\S.+?)(?=\w+=|$)'
d = {
    a or 'theSentence': (b or c).strip()
    for a, b, c in re.findall(rx, s)
}
print(d)
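It may help to see what re.findall returns for the first input: one 3-tuple per match, with '' for the groups that did not take part. A key/value match fills the first two slots and the sentence match fills the third, which is why the comprehension uses `a or 'theSentence'` and `b or c`, and why the sentence needs .strip() (its lazy match runs up to the space before the first key).

```python
import re

rx = r'(\w+)=(\S+)|(\S.+?)(?=\w+=|$)'
s = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"

for a, b, c in re.findall(rx, s):
    # in each tuple, either (a, b) or c is filled, never both
    print((a, b, c))
```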

score 1 · Accepted Answer

The first step would be:

inputStr = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"
theSentence, others = inputStr.split('.', 1)

You'll then want to break up others. Play around with split() (the argument you pass tells Python what to split the string on) and see what you can do. :)
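One way to finish that hint, as a sketch: split others on whitespace to get the key=value tokens, then split each token on '='. Using maxsplit=1 in both splits and re-adding the period that the first split removes are choices made here, not part of the answer.

```python
inputStr = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"
theSentence, others = inputStr.split('.', 1)

# the split removed the period, so put it back
values = {'theSentence': theSentence + '.'}
# split() with no argument splits on runs of whitespace
for token in others.split():
    # split only on the first '=' so a value may itself contain '='
    key, value = token.split('=', 1)
    values[key] = value

print(values)
```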

score 1 · Accepted Answer

If the sentence is guaranteed to end with '.', you can use the following approach:

>>> Values = {}
>>> testList = inputString.split('.')
>>> Values['theSentence'] = testList[0] + '.'

For the remaining values, just do:

>>> for elem in testList[1].split():
...     key, val = elem.split('=')
...     Values[key] = val
...

Values then looks like this (Values2 and Values3 are built the same way from the other two example inputs):

>>> Values
{'date': '10-jan-2013', 'home': 'mary', 'cowname': 'betsy', 'theSentence': 'There was a cow at home.'}
>>> Values2
{'lastname': 'Smith', 'theSentence': 'Mike ordered a large hamburger.', 'store': 'burgerville'}
>>> Values3
{'theSentence': 'Sam is nice.'}
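Because split('.') cuts at every period, a value that contains one breaks this approach. The sketch below uses a made-up input with version=1.2 to show the problem, and shows that passing maxsplit=1 keeps everything after the first period together; the rest of the loop is the answer's.

```python
inputString = "Mike ordered a large hamburger. version=1.2 store=burgerville"

# split('.') also cuts inside "1.2", so testList[1] would be just " version=1"
naive = inputString.split('.')

# maxsplit=1 cuts only at the first period, the one ending the sentence
testList = inputString.split('.', 1)
Values = {'theSentence': testList[0] + '.'}
for elem in testList[1].split():
    key, val = elem.split('=')
    Values[key] = val

print(naive)
print(Values)
```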

score 1 · Accepted Answer

Assuming there is only one dot separating the sentence from the assignment pairs:

line = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"
sentence, assignments = line.split(". ", 1)

result = {'theSentence': sentence + "."}
for item in assignments.split():
    key, value = item.split("=")
    result[key] = value

print(result)

Prints:

{'date': '10-jan-2013', 
 'home': 'mary', 
 'cowname': 'betsy', 
 'theSentence': 'There was a cow at home.'}
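For an input without pairs, such as "Sam is nice.", split(". ") returns a single element and the two-name unpacking raises ValueError. A sketch of one workaround, assuming the same ". " separator: str.partition always returns three parts, with an empty separator when ". " is absent. The function name parse_assignments is hypothetical.

```python
def parse_assignments(line):
    """Sketch: the answer's approach, tolerant of lines without pairs."""
    # partition always returns (head, sep, tail); sep is '' if ". " is absent
    sentence, sep, assignments = line.partition(". ")
    # the separator swallowed the period, so restore it only when it was found
    result = {'theSentence': (sentence + ".") if sep else sentence}
    for item in assignments.split():
        key, value = item.split("=", 1)
        result[key] = value
    return result


print(parse_assignments("Sam is nice."))
```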

score 0 · Accepted Answer

Supposing that the first period separates the sentence from the values, you can use something like this:

#! /usr/bin/python3

a = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"

values = (lambda s, tail: (lambda d, kv: (d, d.update(kv)))({'theSentence': s + '.'}, {k: v for k, v in (x.split('=') for x in tail.split())}))(*a.split('.', 1))[0]

print (values)
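The one-liner is easier to follow unrolled into ordinary statements. This is a sketch of the same steps; the function name unrolled is hypothetical, it re-adds the period that the split removes so the output matches the question, and it uses split() without an argument so that a line with no pairs yields no tokens. Like the original, it needs at least one '.' in the line.

```python
def unrolled(a):
    # everything before the first period is the sentence, the rest is the tail
    s, tail = a.split('.', 1)
    values = {'theSentence': s + '.'}
    # each whitespace-separated token in the tail is a key=value pair;
    # dict.update accepts an iterable of 2-element sequences
    values.update(x.split('=', 1) for x in tail.split())
    return values


print(unrolled("There was a cow at home. home=mary cowname=betsy date=10-jan-2013"))
print(unrolled("Sam is nice."))
```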

score 0 · Accepted Answer

As usual, there are lots of ways to do this. Here's a regex-based approach that looks for key=value pairs:

import re

sentence = "..."

values = {}
for match in re.finditer(r"(\w+)=(\S+)", sentence):
    if not values:
        # everything left of the first key/value pair is the sentence
        values["theSentence"] = sentence[:match.start()].strip()
    # record every pair, including the first one
    key, value = match.groups()
    values[key] = value
if not values:
    # no key/value pairs, keep the entire sentence
    values["theSentence"] = sentence

This assumes that keys are Python-style identifiers and that values consist of one or more non-whitespace characters.
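Two consequences of that pattern are easy to check directly. \w+ also accepts a key that starts with a digit, which a Python identifier cannot, and \S+ stops at the first whitespace, so a value containing a space is cut short. Both inputs below are made-up examples.

```python
import re

pattern = r"(\w+)=(\S+)"
# \w includes digits, so a key may start with one
print(re.findall(pattern, "2nd=yes"))
# \S+ stops at whitespace, so only "burger" is captured as the value
print(re.findall(pattern, "store=burger ville"))
```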

score 0 · Accepted Answer

Assuming = does not appear in the sentence itself (so the sentence does not have to end with a '.'):

s = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"

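eq_loc = s.find('=')
if eq_loc > -1:
    # the metadata starts after the last space before the first '='
    meta_loc = s[:eq_loc].rfind(' ')
    # take the key/value part before truncating s
    metastr = s[meta_loc + 1:]
    s = s[:meta_loc]

    metadict = dict(m.split('=', 1) for m in metastr.split())
else:
    metadict = {}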

metadict["theSentence"] = s
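Wrapped as a function, the same steps can be run on the other inputs from the question. This is a sketch; the name split_meta is hypothetical, and it takes the key/value substring before truncating s and splits each pair on the first '=' only. It still relies on '=' never appearing in the sentence.

```python
def split_meta(s):
    """Split a line into sentence and key/value metadata (sketch)."""
    eq_loc = s.find('=')
    if eq_loc > -1:
        # the metadata starts after the last space before the first '='
        meta_loc = s[:eq_loc].rfind(' ')
        metastr = s[meta_loc + 1:]
        s = s[:meta_loc]
        metadict = dict(m.split('=', 1) for m in metastr.split())
    else:
        # no '=' at all: the whole line is the sentence
        metadict = {}
    metadict["theSentence"] = s
    return metadict


print(split_meta("Mike ordered a large hamburger. lastname=Smith store=burgerville"))
print(split_meta("Sam is nice."))
```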
