I'm working on a tool that converts Python source code into a nicely formatted HTML file. Basically, it reads a Python file line by line, looks at each line to determine what's in it, and then adds the right <span> tags with colors, line breaks and whatnot.

I have the general structure of the program; now I'm writing the functions that actually read a string and return an HTML-enriched string.

I'm stuck on parsing strings that have quotes in them, e.g.:

x = 'hello there'  
if x == 'example "quotes" inside quotes' and y == 'another example':    

My work so far has been enumerating a string to get the indices of single quotes, returning them as a list, and then using two while loops to put the right HTML tags in the right places. It seemed to work fine when there was a single quote in the string, but all hell broke loose when I introduced two quotes on a line, quotes inside quotes, or, finally, a string made up of '\''.

It seems this route is a dead end. I'm now thinking of turning to .split(), shlex, or re, breaking the string down into a list, and trying to work with that.
I would really appreciate tips, pointers, and any advice.

Edit: To make it clearer, I need to put HTML tags in the right places in a string. Working with string indices didn't give good results with more complex strings.

3 Answers

"Colorize Python source using the built-in tokenizer" is an example of this kind of code (it uses cgi.escape). See if it fits your needs!

answered 2012-09-17T22:26:47.863

You could use tokenize.generate_tokens:

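import io
import token
import tokenize

text = '''
x = 'hello there'
if x == 'example "quotes" inside quotes' and y == 'another example': pass
'''

# On Python 3, generate_tokens() needs a readline callable that returns str,
# so wrap the source in io.StringIO (io.BytesIO(text) fails for a str there).
tokens = tokenize.generate_tokens(io.StringIO(text).readline)
for toknum, tokval, (srow, scol), (erow, ecol), line in tokens:
    tokname = token.tok_name[toknum]
    # Print the pair as a tuple so the output has the form shown below.
    print((tokname, tokval))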

yields

('NL', '\n')
('NAME', 'x')
('OP', '=')
('STRING', "'hello there'")
('NEWLINE', '\n')
('NAME', 'if')
('NAME', 'x')
('OP', '==')
('STRING', '\'example "quotes" inside quotes\'')
('NAME', 'and')
('NAME', 'y')
('OP', '==')
('STRING', "'another example'")
('OP', ':')
('NAME', 'pass')
('NEWLINE', '\n')
('ENDMARKER', '')

From here you can output the appropriate HTML based on the type (tokname) of each token.
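As a rough illustration of that last step, here is a minimal sketch for Python 3 that maps token types to <span> tags. The highlight() function, the CSS_CLASSES table and the class names ("str", "num", "op", "com", "kw") are made up for this example; they are not part of tokenize. It copies the text between tokens unchanged so the original spacing survives, and it has not been checked against f-strings, which newer Python versions tokenize differently.

```python
import html
import io
import keyword
import tokenize

# Hypothetical CSS class names; use whatever your stylesheet defines.
CSS_CLASSES = {
    tokenize.STRING: "str",
    tokenize.NUMBER: "num",
    tokenize.OP: "op",
    tokenize.COMMENT: "com",
}


def highlight(source):
    """Return source as HTML, with <span> tags around recognised tokens."""
    # Offset of the start of every line, so a (row, col) token position
    # can be turned into an index into the whole source string.
    starts = [0]
    for line in io.StringIO(source).readlines():
        starts.append(starts[-1] + len(line))

    out = []
    pos = 0  # index of the first character not yet copied to the output
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        begin = starts[tok.start[0] - 1] + tok.start[1]
        end = starts[tok.end[0] - 1] + tok.end[1]
        # Whitespace and anything else between tokens is copied as-is.
        out.append(html.escape(source[pos:begin], quote=False))
        text = html.escape(source[begin:end], quote=False)
        css = CSS_CLASSES.get(tok.type)
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            css = "kw"
        if css and text:
            out.append('<span class="%s">%s</span>' % (css, text))
        else:
            out.append(text)
        pos = max(pos, end)
    return "".join(out)


print(highlight("if x == 'example \"quotes\" inside quotes': pass\n"))
```

Because the tokenizer decides where each string literal starts and ends, nested quotes and escaped quotes such as '\'' need no special handling here.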

answered 2012-09-17T22:41:43.777

Something like cgi.escape is probably what you want. There are also tools, such as BeautifulSoup and Pygments, that do pretty much exactly what you are building. I'd recommend taking advantage of them.
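A note for anyone trying this on a current interpreter: cgi.escape was removed in Python 3.8, and html.escape is the standard-library replacement. A minimal sketch of escaping one line of source before wrapping it in tags (the sample line is invented for illustration):

```python
import html

# An invented line of Python source containing HTML-significant characters.
line = 'if a < b and s == "<b>":'

# html.escape() replaces &, < and >, plus quotes unless quote=False is passed.
escaped = html.escape(line)
print(escaped)
```

The escaping has to happen on the raw source text before any <span> tags are added, otherwise the tags themselves get escaped too.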

answered 2012-09-17T22:30:37.323