
Consider this code.

#!/usr/bin/env python
# -*- coding: utf8 -*-

from bs4 import BeautifulSoup
html_doc = """<pre class="code file d"><span class="kw2">import std.stdio
import core.bitop;

// parallel port address 
const uint port = 0x0c000;

void main()
{
    /*
        permission related stuff under linux
    */

    /* data */
    ubyte data = 0b_11111111;
    outp(port, data);
}
</span></pre>
"""

invalid_tags = ['span']

soup = BeautifulSoup(html_doc)

for tag in invalid_tags:
    for invalid in soup.findAll(tag):
        invalid.replaceWithChildren()

# Give every <pre> tag the class expected by the syntax highlighter.
for pre in soup.find_all('pre'):
    pre['class'] = 'prettyprint'

output = soup.prettify(formatter=None)

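output_text = output.encode('utf8', 'replace')

# output_text is a byte string, so open the file in binary mode;
# the with-block closes the file automatically.
with open('test.html', 'wb') as output_file:
    output_file.write(output_text)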

I have a simple HTML document. I'd like to remove some unwanted tags (<span> in this case) and change the class name of the <pre> tag.

But if you look at the output file, there are unwanted whitespace characters at the start of the second line.

  <pre class="prettyprint">
   import std.stdio
import core.bitop;

// parallel port address 
const uint port = 0x0c000;

void main()
{
    /*
        permission related stuff under linux
    */

    /* data */
    ubyte data = 0b_11111111;
    outp(port, data);
}
  </pre>

I want to remove the unwanted leading spaces on that line so that the code inside <pre> stays left-aligned.
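
For comparison, here is a reduced sketch of the same steps. It assumes Python 3, the built-in 'html.parser' backend, and a shortened, made-up input (none of these are in my original script). Instead of prettify(), it serializes the tree with str(soup), which as far as I can tell writes the markup back without re-indenting it, so the text inside <pre> comes out unchanged.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for the real document.
html_doc = '<pre class="code"><span>line one\nline two\n</span></pre>'

# 'html.parser' is an assumption; it does not wrap the fragment in <html>/<body>.
soup = BeautifulSoup(html_doc, 'html.parser')

# Drop each <span> but keep its contents in place.
for span in soup.find_all('span'):
    span.unwrap()

# Rename the class on every <pre>.
for pre in soup.find_all('pre'):
    pre['class'] = 'prettyprint'

# Serialize without pretty-printing, so no indentation is added.
plain = str(soup)
print(plain)
```

Here the lines inside <pre> start in the first column, which is the result I am after for the full document.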

How can I do that? Any ideas? Thanks.


1 Answer