Consider this code.
#!/usr/bin/env python
# -*- coding: utf8 -*-
from bs4 import BeautifulSoup
html_doc = """<pre class="code file d"><span class="kw2">import std.stdio
import core.bitop;
// parallel port address
const uint port = 0x0c000;
void main()
{
/*
permission related stuff under linux
*/
/* data */
ubyte data = 0b_11111111;
outp(port, data);
}
</span></pre>
"""
invalid_tags = ['span']
soup = BeautifulSoup(html_doc)
for tag in invalid_tags:
    for invalid in soup.findAll(tag):
        invalid.replaceWithChildren()
pre_tags = soup.find_all('pre')
for i in range(len(pre_tags)):
    pre_tags[i]['class'] = 'prettyprint'
output = soup.prettify(formatter=None)
output_text = output.encode('utf8', 'replace')
output_file = open('test.html', "w")
output_file.write(output_text)
output_file.close()
I have a simple HTML document. I'd like to remove some unwanted tags (<span> in this case) and change the class name of the <pre> tag.
But if you look at the output file, there are unwanted whitespace characters at the start of the second line.
<pre class="prettyprint">
import std.stdio
import core.bitop;
// parallel port address
const uint port = 0x0c000;
void main()
{
/*
permission related stuff under linux
*/
/* data */
ubyte data = 0b_11111111;
outp(port, data);
}
</pre>
I want to remove the unwanted leading space characters on that line so that it is left-aligned with the rest of the code.
How can I do that? Any ideas? Thanks.
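One direction that may be worth trying, sketched under the assumption that the extra whitespace comes from the indentation that `prettify()` adds when it pretty-prints the tree: serialize the document with `str(soup)` instead. The snippet below uses a shortened copy of the document above and names the `html.parser` parser explicitly (an assumption, since the original code does not specify a parser). `unwrap()` is the newer name for `replaceWithChildren()`.

```python
from bs4 import BeautifulSoup

# Shortened copy of the document from the question.
html_doc = """<pre class="code file d"><span class="kw2">import std.stdio
import core.bitop;
</span></pre>
"""

# Assumption: use the parser bundled with Python, which leaves the
# fragment as-is instead of wrapping it in <html>/<body>.
soup = BeautifulSoup(html_doc, "html.parser")

# Replace each <span> with its own children.
for span in soup.find_all("span"):
    span.unwrap()

# Give every <pre> a single class name.
for pre in soup.find_all("pre"):
    pre["class"] = "prettyprint"

# Serialize without pretty-printing, so no indentation is inserted.
# Note: str() uses the default output formatter, not formatter=None.
output = str(soup)
print(output)
```

If this behaves as expected, the text inside `<pre>` keeps exactly the line breaks it had in the input, and `output` can then be written to `test.html` as before.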