Can anyone demonstrate how to base64-decode particular sections of a string using a regex search? I would like the final result to be the entire string, but with the base64 sections decoded.

The text between the category tags and the subcategory tags should be decoded, and then the entire string should be returned.

<attack_headline><site_id>1</site_id><category>U1FMIEluamVjdGlvbg==</category><subcategory>Q2xhc3NpYyBTUUwgQ29tbWVudCAmcXVvdDstLSZxdW90Ow==</subcategory><client_ip>192.168.1.102</client_ip><date>1363807248</date><gmt_diff>0</gmt_diff><reference_id>E711-3EFB-5F43-5FAC</reference_id></attack_headline>

2 Answers

Based on my comment, here's an example using lxml.etree, which assumes your input is XML (if HTML, use lxml.html instead):

>>> import base64
>>> import lxml.etree
>>> text = "<attack_headline><site_id>1</site_id><category>U1FMIEluamVjdGlvbg==</category><subcategory>Q2xhc3NpYyBTUUwgQ29tbWVudCAmcXVvdDstLSZxdW90Ow==</subcategory><client_ip>192.168.1.102</client_ip><date>1363807248</date><gmt_diff>0</gmt_diff><reference_id>E711-3EFB-5F43-5FAC</reference_id></attack_headline>"
>>> xml = lxml.etree.fromstring(text)
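>>> for tag_with_base64 in ('category', 'subcategory'):
...     node = xml.find(tag_with_base64)
...     # A childless lxml element is falsy, so test against None explicitly.
...     if node is not None and node.text:
...         node.text = base64.b64decode(node.text).decode('utf-8')
...
>>> lxml.etree.tostring(xml, encoding='unicode')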
'<attack_headline><site_id>1</site_id><category>SQL Injection</category><subcategory>Classic SQL Comment &amp;quot;--&amp;quot;</subcategory><client_ip>192.168.1.102</client_ip><date>1363807248</date><gmt_diff>0</gmt_diff><reference_id>E711-3EFB-5F43-5FAC</reference_id></attack_headline>'
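If you do want the regex route from the question, here is a minimal sketch using only the standard library. The names `BASE64_TAGS` and `decode_base64_sections` are made up for this example, and it assumes the tags carry no attributes and are not nested. It passes a callback to `re.sub` so that each matched payload is decoded in place:

```python
import base64
import re

# Tags whose text is base64-encoded (taken from the question).
BASE64_TAGS = ("category", "subcategory")

# Match <tag>payload</tag> for any listed tag; the backreference (?P=tag)
# makes sure the closing tag matches the opening one.
_PATTERN = re.compile(
    r"<(?P<tag>%s)>(?P<payload>[A-Za-z0-9+/=\s]*)</(?P=tag)>" % "|".join(BASE64_TAGS)
)


def decode_base64_sections(text):
    """Return text with the base64 payloads of BASE64_TAGS decoded."""
    def _decode(match):
        decoded = base64.b64decode(match.group("payload")).decode("utf-8")
        return "<{0}>{1}</{0}>".format(match.group("tag"), decoded)

    return _PATTERN.sub(_decode, text)
```

Unlike the lxml version, this pastes the decoded text back in without XML-escaping it, so a decoded `&` or `<` can leave the result malformed or change its meaning. That is the main reason to prefer a real parser when the input is XML.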
answered 2013-03-20T19:53:20.220
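import base64
import re
import lxml.etree

# `client`, `epoch_time_last` and `epoch_time_now` are defined earlier in the script (not shown).
events = client.service.get_recent_attacks("", epoch_time_last, epoch_time_now, 1, "", 15)
# Strip newlines and the whitespace around them, except right after </attack_headline>.
text = re.sub(r'(?<!</attack_headline>)\s*\n\s*', '', events)
xml = lxml.etree.fromstring(text)
for tag_with_base64 in ('category', 'subcategory'):
    node = xml.find(tag_with_base64)
    if node is not None and node.text:
        node.text = base64.b64decode(node.text).decode('utf-8')
result = lxml.etree.tostring(xml, encoding='unicode')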
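One caveat: an XML parser expects a single root element, so parsing the whole response at once only works if it holds one `<attack_headline>`. Below is a rough sketch for the multi-record case. It assumes (a guess, based on the regex keeping the newline after each closing tag) that the cleaned-up text holds one record per line. `decode_headlines` is a hypothetical helper name, and it uses the standard library's `xml.etree.ElementTree` rather than lxml so that it runs without extra packages:

```python
import base64
import xml.etree.ElementTree as ET


def decode_headlines(events, tags=("category", "subcategory")):
    """Decode the base64 tags of each non-empty line of `events` separately."""
    decoded = []
    for line in events.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: each line is one complete <attack_headline> record.
        root = ET.fromstring(line)
        for tag in tags:
            node = root.find(tag)
            if node is not None and node.text:
                node.text = base64.b64decode(node.text).decode("utf-8")
        decoded.append(ET.tostring(root, encoding="unicode"))
    return decoded
```

Calling `decode_headlines(text)` returns a list with one decoded XML string per record, which can be joined with newlines if a single string is needed.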
answered 2013-03-20T21:12:59.740