python - String parsing in Python with various unique cases

Question

My goal is to convert a string into a dictionary. The string looks like this:

[exploit] => 1
[hits] => 1
[completed] => 1
[is_malware] => 1
[summary] => 26.0@13965: suspicious.warning: object contains JavaScript
76.0@14467: suspicious.obfuscation using eval
76.0@14467: suspicious.obfuscation using String.fromCharCode

[severity] => 4
[engine] => 60

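I have tried a few ways of doing this. My first attempt was to split on \n, but that breaks for [summary], because its content spans several lines and gets split apart. My second attempt was to split on =>, but then there is no way to tell where one value ends, because the => belonging to the next key is also a split point.
Basically, in the end I want something like {exploit: 1, hits: 1, completed: 1, ...} and so on.

Any help is much appreciated.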

score 7 · Accepted Answer

You can use re.findall to parse the text.

>>> import re
>>> re.findall(r'\[([^]]+)\] => (.*?)(?=\n\[|$)', s, re.S)
[('exploit', '1'), ('hits', '1'), ('completed', '1'), ('is_malware', '1'), ('summary', '26.0@13965: suspicious.warning: object contains JavaScript\n76.0@14467: suspicious.obfuscation using eval\n76.0@14467: suspicious.obfuscation using String.fromCharCode\n'), ('severity', '4'), ('engine', '60')]

You can put these values into a dictionary by calling dict.

>>> dict(re.findall(r'\[([^]]+)\] => (.*?)(?=\n\[|$)', s, re.S))
{'engine': '60', 'hits': '1', 'severity': '4', 'is_malware': '1', 'summary': '26.0@13965: suspicious.warning: object contains JavaScript\n76.0@14467: suspicious.obfuscation using eval\n76.0@14467: suspicious.obfuscation using String.fromCharCode\n', 'exploit': '1', 'completed': '1'}
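The question asks for something like {exploit: 1, hits: 1, ...}, while the dictionary above holds strings, and the summary keeps its trailing newline. A minimal sketch of one possible post-processing step follows, reusing the same pattern. The helper name parse_report and the choice to convert all-digit values to int are assumptions for illustration, not part of the original answer.

```python
import re

# Sample input copied from the question.
s = """[exploit] => 1
[hits] => 1
[completed] => 1
[is_malware] => 1
[summary] => 26.0@13965: suspicious.warning: object contains JavaScript
76.0@14467: suspicious.obfuscation using eval
76.0@14467: suspicious.obfuscation using String.fromCharCode

[severity] => 4
[engine] => 60
"""

# Same pattern as in the answer, written as a raw string and compiled once.
PATTERN = re.compile(r'\[([^]]+)\] => (.*?)(?=\n\[|$)', re.S)


def parse_report(text):
    """Hypothetical helper: build a dict, strip surrounding whitespace
    from each value, and convert all-digit values to int."""
    result = {}
    for key, value in PATTERN.findall(text):
        value = value.strip()
        result[key] = int(value) if value.isdecimal() else value
    return result


report = parse_report(s)
print(report['exploit'], report['engine'])
print(report['summary'])
```

Multi-line values such as summary stay as a single string with embedded newlines; only the leading and trailing whitespace is removed.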

score 0 · Answer

total_string = """\
[exploit] => 1
[hits] => 1
[completed] => 1
[is_malware] => 1
[summary] => 26.0@13965: suspicious.warning: object contains JavaScript
76.0@14467: suspicious.obfuscation using eval
76.0@14467: suspicious.obfuscation using String.fromCharCode

[severity] => 4
[engine] => 60
"""

import re

pattern_RE = r'\[([^]]+)\] => (.*?)(?=\n\[|$)'
report_dict = dict(re.findall(pattern_RE, total_string, re.S))

for k, v in report_dict.items():
    print('[{}]: {}'.format(k, v))

print(report_dict)

This is what you are showing us now, but there may be hidden newlines and carriage returns in your real data. The regex seems fine for what we can see.

{   'engine': '60', 
    'hits': '1', 
    'severity': '4', 
    'is_malware': '1', 
    'summary': '(all three captured)',
    'exploit': '1', 
    'completed': '1'
}

So if the regex is not catching this, the repr() of your total_string must differ slightly from what you pasted (trailing newlines, for example).
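One way to check this is to print repr() of the real input and, if carriage returns show up, normalize the line endings before matching. The following is only a sketch: the raw string is a made-up, shortened input with Windows line endings, not the asker's actual data.

```python
import re

# Hypothetical input: same layout as the question, but with \r\n line endings.
raw = "[exploit] => 1\r\n[summary] => line one\r\nline two\r\n\r\n[severity] => 4\r\n"

# repr() makes otherwise invisible characters such as \r and \n visible.
print(repr(raw))

pattern_RE = r'\[([^]]+)\] => (.*?)(?=\n\[|$)'

# Without normalization the pattern still matches, but a stray \r
# ends up at the end of single-line values.
unnormalized = dict(re.findall(pattern_RE, raw, re.S))

# Convert \r\n and lone \r to \n, then parse as before.
normalized = raw.replace('\r\n', '\n').replace('\r', '\n')
report_dict = dict(re.findall(pattern_RE, normalized, re.S))

print(unnormalized['exploit'] == '1\r')
print(report_dict)
```

Comparing the repr() of the real string with the pasted sample should show whether line endings, or something else, explain the difference.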
