python - How do I split the following string correctly? - Python

Question

I have the following file to parse:

Total Virtual Clients       :             10      (1 Machines)
Current Connections         :             10
Total Elapsed Time          :             50 Secs (0 Hrs,0 Mins,50 Secs)

Total Requests              :         337827      (    6687/Sec)
Total Responses             :         337830      (    6687/Sec)
Total Bytes                 :      990388848      (   20571 KB/Sec)
Total Success Connections   :           3346      (      66/Sec)
Total Connect Errors        :              0      (       0/Sec)
Total Socket Errors         :              0      (       0/Sec)
Total I/O Errors            :              0      (       0/Sec)
Total 200 OK                :          33864      (     718/Sec)
Total 30X Redirect          :              0      (       0/Sec)
Total 304 Not Modified      :              0      (       0/Sec)
Total 404 Not Found         :         303966      (    5969/Sec)
Total 500 Server Error      :              0      (       0/Sec)
Total Bad Status            :         303966      (    5969/Sec)

I have a parsing algorithm that searches the file for these values, but when I do this:

for data in temp:
    line = data.strip().split()
    print(line)

where temp is a temporary buffer containing these values, I get:

['Total', 'I/O', 'Errors', ':', '0', '(', '0/Sec)']
['Total', '200', 'OK', ':', '69807', '(', '864/Sec)']
['Total', '30X', 'Redirect', ':', '0', '(', '0/Sec)']
['Total', '304', 'Not', 'Modified', ':', '0', '(', '0/Sec)']
['Total', '404', 'Not', 'Found', ':', '420953', '(', '5289/Sec)']
['Total', '500', 'Server', 'Error', ':', '0', '(', '0/Sec)']

What I want is:

['Total I/O Errors', '0', '0']
['Total 200 OK', '69807', '864']
['Total 30X Redirect', '0', '0']

and so on. How can I achieve that?

score 4 · Accepted Answer

You can use a regular expression like this:

import re
rex = re.compile(r'([^:]+\S)\s*:\s*(\d+)\s*\(\s*(\d+)/Sec\)')
for line in temp:
    match = rex.match(line)
    if match:
        print(list(match.groups()))

That will give you:

['Total Requests', '337827', '6687']
['Total Responses', '337830', '6687']
['Total Success Connections', '3346', '66']
['Total Connect Errors', '0', '0']
['Total Socket Errors', '0', '0']
['Total I/O Errors', '0', '0']
['Total 200 OK', '33864', '718']
['Total 30X Redirect', '0', '0']
['Total 304 Not Modified', '0', '0']
['Total 404 Not Found', '303966', '5969']
['Total 500 Server Error', '0', '0']
['Total Bad Status', '303966', '5969']

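Note that this only matches lines of the form "TITLE : NUMBER (NUMBER/Sec)", so lines such as Total Bytes (KB/Sec) or Current Connections are skipped. You can adjust the expression to match other lines as well.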
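As a minimal sketch of such an adjustment (the pattern below is an assumption, not part of the answer), the rate part can be made optional and allowed to carry a unit such as KB before /Sec. It uses named groups and Python 3; LINE_RE and parse_stats are hypothetical names.

```python
import re

# Assumed extension of the answer's pattern: the "( RATE .../Sec)" part is
# optional, and the rate may carry a unit such as "KB" before "/Sec".
LINE_RE = re.compile(
    r"(?P<title>[^:]+\S)\s*:\s*(?P<count>\d+)"
    r"(?:\s*\(\s*(?P<rate>\d+)\s*[A-Za-z]*/Sec\))?"
)


def parse_stats(lines):
    """Return [title, count, rate] for every matching line; rate may be None."""
    rows = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # blank lines and lines without "TITLE : COUNT" are skipped
            rows.append([match.group("title"), match.group("count"), match.group("rate")])
    return rows


sample = [
    "Current Connections         :             10",
    "Total Bytes                 :      990388848      (   20571 KB/Sec)",
    "Total 200 OK                :          33864      (     718/Sec)",
    "",
]
for row in parse_stats(sample):
    print(row)
```

With this pattern, Current Connections comes back with a rate of None instead of being skipped. Total Virtual Clients and Total Elapsed Time also match with a rate of None, because their parenthesized text is not a /Sec rate.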

score 1 · Accepted Answer

Regular expressions are overkill for parsing this data, but they are a convenient way to describe fixed-width fields. For example:

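import re
for data in temp:
    match = re.match(r"(.{28}):(.{21})(.*)", data)  # 28-char title, ':', 21-char count, rest
    if match:  # skip blank lines that do not fit the fixed-width layout
        first, second, third = match.groups()
        ...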

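This says that the first field is 28 characters wide, then the ":" is skipped, the next 21 characters are the second field, and the rest of the line is the third field. The fields keep their padding, so you will usually want to call .strip() on each one.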
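The same fixed-width idea can also be expressed with plain string slicing instead of a regular expression. This is a sketch that assumes every data line uses exactly the 28- and 21-character widths from the answer; split_fixed, TITLE_WIDTH and COUNT_WIDTH are hypothetical names.

```python
# Assumed layout, taken from the widths in the answer: a 28-character title,
# one ':' separator, a 21-character count field, then the rest of the line.
TITLE_WIDTH = 28
COUNT_WIDTH = 21


def split_fixed(line):
    """Slice one report line into [title, count, rest], or None if it does not fit."""
    if len(line) <= TITLE_WIDTH or line[TITLE_WIDTH] != ":":
        return None  # blank line, or a line that does not follow the layout
    count_start = TITLE_WIDTH + 1
    count_end = count_start + COUNT_WIDTH
    title = line[:TITLE_WIDTH]
    count = line[count_start:count_end]
    rest = line[count_end:]
    # Slicing keeps the padding, so strip each field
    return [title.strip(), count.strip(), rest.strip()]


# Build a sample line with exactly the assumed widths
sample = "Total 200 OK".ljust(TITLE_WIDTH) + ":" + "33864".rjust(15) + " " * 6 + "(     718/Sec)"
print(split_fixed(sample))
```

Slicing avoids the regex engine entirely, but it breaks silently if the report ever changes its column widths, whereas the delimiter-based approaches adapt to any padding.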

score 0 · Accepted Answer

Instead of splitting on whitespace, split on the other delimiters in the format (":", "(" and ")"), like this:

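for data in temp:
    # Skip lines (blank ones, "Current Connections") that lack any delimiter
    if not all(sep in data for sep in ':()'):
        continue
    first, rest = data.split(':', 1)
    second, rest = rest.split('(', 1)
    third, rest = rest.split(')', 1)
    print([x.strip() for x in (first, second, third)])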
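To get exactly the output asked for in the question (for example ['Total I/O Errors', '0', '0']), the delimiter approach can also drop the /Sec suffix. This is a minimal sketch using str.partition, which returns an empty separator instead of raising when a delimiter is missing; split_line is a hypothetical helper name, and lines lacking any of ":", "(" or ")" are assumed to be uninteresting and return None.

```python
def split_line(data):
    """Split 'TITLE : COUNT ( RATE/Sec)' into [title, count, rate].

    Returns None for lines missing any of the ':', '(' or ')' delimiters.
    """
    title, sep1, rest = data.partition(":")
    count, sep2, rest = rest.partition("(")
    rate, sep3, _ = rest.partition(")")
    if not (sep1 and sep2 and sep3):
        return None
    rate = rate.strip()
    if rate.endswith("/Sec"):
        rate = rate[: -len("/Sec")]  # drop the unit suffix, e.g. "864/Sec" -> "864"
    return [title.strip(), count.strip(), rate.strip()]


temp = [
    "Total I/O Errors            :              0      (       0/Sec)",
    "Total 200 OK                :          69807      (     864/Sec)",
    "Current Connections         :             10",
]
for data in temp:
    fields = split_line(data)
    if fields is not None:
        print(fields)
```

Only the /Sec suffix is removed, so a rate with a unit, such as the Total Bytes line's 20571 KB/Sec, comes back as '20571 KB'.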
