python - Simple regex for the following string

Question

I have a string like the following:

rand_id%3A%3Ftmsid%3D1340496000_EP002960010145_11_0_10050_1_2_10036

What I want to do now is:

extract timestamp: 1340496000
        event: EP002960010145

The problem is the %3D that follows tmsid; I'm not sure how to handle it.

Is there a robust way to extract these two fields from the string above?

Thanks

score 3 · Accepted Answer

You are looking at URL-quoted (percent-encoded) data.

>>> from urllib2 import unquote
>>> unquote('rand_id%3A%3Ftmsid%3D1340496000_EP002960010145_11_0_10050_1_2_10036')
'rand_id:?tmsid=1340496000_EP002960010145_11_0_10050_1_2_10036'
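
The session above uses the Python 2 module name urllib2. As a minimal sketch, assuming Python 3 (where unquote lives in urllib.parse), the same decoding step could look like this:

```python
from urllib.parse import unquote

raw = 'rand_id%3A%3Ftmsid%3D1340496000_EP002960010145_11_0_10050_1_2_10036'

# %3A decodes to ":", %3F to "?", and %3D to "=".
decoded = unquote(raw)
print(decoded)
```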

You can probably split on = first, and then split on _:

>>> unquoted = unquote('rand_id%3A%3Ftmsid%3D1340496000_EP002960010145_11_0_10050_1_2_10036')
>>> unquoted.split('=', 1)[1].split('_')
['1340496000', 'EP002960010145', '11', '0', '10050', '1', '2', '10036']
>>> timestamp, event = unquoted.split('=', 1)[1].split('_')[:2]
>>> timestamp, event
('1340496000', 'EP002960010145')
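
Since the question asks about a regular expression, here is a sketch of that route as an alternative to splitting. It assumes Python 3, and the pattern itself is an assumption about the format (a run of digits for the timestamp, followed by an event ID that contains no "_" or "&"); adjust it if the real data differs.

```python
import re
from urllib.parse import unquote

raw = 'rand_id%3A%3Ftmsid%3D1340496000_EP002960010145_11_0_10050_1_2_10036'

# Assumed format: tmsid=<digits>_<event id without "_" or "&">_...
pattern = re.compile(r'tmsid=(\d+)_([^_&]+)')

# Decode first so the pattern can match a literal "=" instead of "%3D".
match = pattern.search(unquote(raw))
if match is None:
    raise ValueError('no tmsid field found')

timestamp, event = match.groups()
print(timestamp, event)
```

Decoding before matching keeps the pattern readable. Matching the raw string would also work, because % has no special meaning in a regex, but the pattern would then need %3D in place of =.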

Alternatively, if the data contains multiple fields separated by &, I'd recommend parsing everything after the question mark as a URL query string, using urlparse.parse_qs():

>>> from urlparse import parse_qs
>>> parse_qs(unquoted.split('?', 1)[1])
{'tmsid': ['1340496000_EP002960010145_11_0_10050_1_2_10036']}
>>> parsed = parse_qs(unquoted.split('?', 1)[1])
>>> timestamp, event = parsed['tmsid'][0].split('_', 2)[:2]
>>> timestamp, event
('1340496000', 'EP002960010145')
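
For Python 3, where parse_qs and unquote both live in urllib.parse, the query-string approach could be wrapped up roughly as follows. The helper name extract_tmsid_fields and the second sample string (with an extra foo field) are hypothetical, made up for illustration only.

```python
from urllib.parse import parse_qs, unquote


def extract_tmsid_fields(raw):
    """Hypothetical helper: return (timestamp, event) from a quoted string."""
    decoded = unquote(raw)
    # Everything after the first "?" is treated as the query string.
    _, sep, query = decoded.partition('?')
    if not sep:
        raise ValueError('no "?" found in decoded string')
    params = parse_qs(query)
    if 'tmsid' not in params:
        raise ValueError('no tmsid field found')
    # Only the first two "_"-separated parts are needed.
    timestamp, event = params['tmsid'][0].split('_', 2)[:2]
    return timestamp, event


raw = 'rand_id%3A%3Ftmsid%3D1340496000_EP002960010145_11_0_10050_1_2_10036'
print(extract_tmsid_fields(raw))

# Made-up input with an additional field before tmsid.
multi = 'rand_id%3A%3Ffoo%3Dbar%26tmsid%3D1340496000_EP002960010145_11'
print(extract_tmsid_fields(multi))
```

Because parse_qs keys the result by field name, the lookup does not depend on where tmsid appears among the fields.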
