
I'm looking to fill a Python dict with TAG:definition pairs, and I'm using RegExr (http://gskinner.com/RegExr/) to write the regex.

My first step is to parse a line from http://www.id3.org/id3v2.3.0 (or http://pastebin.com/VJEBGauL) and pull out the ID3 tag and the associated definition. For example, the first line:

4.20    AENC    [#sec4.20 Audio encryption]

would look like this: myDict = {'AENC': 'Audio encryption'}

To grab the tag name, I've got it looking for at least 3 spaces, then 4 characters, then 4 spaces: _{3}[a-zA-Z0-9]{4}_{4} That part is easy enough.

The second part, the definition, is not working out for me. So far, I've got (?<=(\[#.+?))_A, which should find, but not include, the [# as well as an undetermined run of characters until it finds _A, but it's failing. If I remove .+? and replace _A with s it works out alright. What is going wrong? *The underscores represent spaces, which don't show up on SO.

How do I grab the definition, i.e. (Audio encryption), of the ID3v2 tag from the line using regex?

edit: Thanks to the answers I got from mVChr, I wrote this for everyone else trying to do the same thing: http://pastebin.com/0nT74dpB


2 Answers


You need to use capture groups () to pull out only the parts you want.

import re
line = '4.20    AENC    [#sec4.20 Audio encryption]'
full_match = re.search(r'^\S+\s+(\S+)\s+\[#\S+ (.*?)\]', line)
dict_key = full_match.group(1)   # 'AENC'
id3v2_tag = full_match.group(2)  # 'Audio encryption'
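
As a rough illustration of why capture groups are the easier route here: Python's `re` module only accepts fixed-width lookbehind, so a lookbehind containing `.+?` (like the one in the question) is rejected when the pattern is compiled. The sketch below assumes the standard-library `re` module; whether RegExr reports the same problem is not something it checks. The names `lookbehind_ok` and `definition` are made up for the example.

```python
import re

line = '4.20    AENC    [#sec4.20 Audio encryption]'

# The lookbehind from the question has variable width (.+?),
# which Python's re module refuses to compile.
try:
    re.compile(r'(?<=\[#.+?) A')
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# A capture group sidesteps the problem: match the "[#secX.YY " prefix
# normally and capture only the text that follows it.
match = re.search(r'\[#\S+ (.*?)\]', line)
definition = match.group(1)
```

The prefix is still consumed by the match, but only the group is read back, so nothing needs to be "looked behind".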
answered 2012-07-03T21:06:01.213

Something like this? Note how I used line.split() for the first part.

import re
line = '4.20    AENC    [#sec4.20 Audio encryption]'

myDict = {}
# Split on runs of whitespace into at most three fields.
_, tag, arguments = line.split(None, 2)
m = re.match(r"\[\S+ (.*)\]", arguments)
myDict[tag] = m.group(1)
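
One thing to watch with the snippet above is that `re.match` returns `None` when the third field does not look like `[#... text]`, and unpacking fails when a line has fewer than three fields. A minimal sketch of a guarded version follows; `parse_line` and `DEFINITION_RE` are hypothetical names introduced only for this example.

```python
import re

# Same pattern as above, compiled once.
DEFINITION_RE = re.compile(r"\[\S+ (.*)\]")

def parse_line(line):
    """Return (tag, definition), or None if the line does not fit the layout."""
    parts = line.split(None, 2)
    if len(parts) != 3:
        return None  # empty line or too few fields
    _, tag, arguments = parts
    m = DEFINITION_RE.match(arguments)
    if m is None:
        return None  # third field is not of the form "[#... text]"
    return tag, m.group(1)

result = parse_line('4.20    AENC    [#sec4.20 Audio encryption]')
```

Returning `None` rather than raising is just one possible choice; raising an exception may suit stricter parsing better.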

Of course, the ID3 grammar looks simple enough that you can do the whole thing without regular expressions.

_, tag, arguments = line.split(None, 2)
myDict[tag] = arguments.strip(" []").split(None, 1)[1]

Of course, if line is empty, Python will raise a ValueError when it tries to unpack the values into _, tag, arguments. One way to solve this is to test each line before running the code above.

for line in file_object:
    if line.strip():
        #the above code here.
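
Put together, the loop might look like the sketch below. It uses `io.StringIO` as a stand-in for `file_object`, and the second sample line is assumed to follow the same layout as the one in the question. Lines read from a file end in a newline, so each line is stripped before the split-only parsing is applied; otherwise the closing `]` would not be removed by `strip(" []")`.

```python
import io

# Stand-in for a real file; the sample lines are assumptions for illustration.
file_object = io.StringIO(
    "4.20    AENC    [#sec4.20 Audio encryption]\n"
    "\n"
    "4.15    APIC    [#sec4.15 Attached picture]\n"
)

myDict = {}
for line in file_object:
    line = line.strip()  # drop the trailing newline and surrounding spaces
    if not line:
        continue  # skip empty lines so the unpacking below cannot fail on them
    _, tag, arguments = line.split(None, 2)
    myDict[tag] = arguments.strip(" []").split(None, 1)[1]
```

A non-empty line with fewer than three fields would still raise a ValueError here, so real input may need an extra check.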
answered 2012-07-03T21:25:31.893