
I have a previously matched string such as:

<a href="somelink here something">

Now I want to extract only the value of one or more specific attributes in the tag (such as href), but the attribute could be anything and could occur anywhere in the tag.

regex_pattern=re.compile('href=\"(.*?)\"') 

I can use the pattern above to match both the attribute and its value, but I need to extract only the (.*?) part (the value).

I could of course strip href=" and the closing " afterwards, but I'm sure a regex can be used properly to extract only the required part.

Put simply, I want to match

abcdef=\"______________________\"

in the string, but get back only the

____________________

part.

How do I do this?


2 Answers


Just call re.search('href=\"(.*?)\"', yourtext).group(1) on the matched string yourtext; it returns only the text captured by the first group, i.e. the value.
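One caveat: re.search returns None when nothing matches, so chaining .group(1) directly raises AttributeError on a tag without that attribute. Below is a minimal sketch that checks for that and, since the question says the attribute could be anything, takes the attribute name as a parameter. The helper name attribute_value is hypothetical, and it assumes double-quoted values.

```python
import re
from typing import Optional


def attribute_value(tag: str, name: str) -> Optional[str]:
    """Return the double-quoted value of attribute `name` in `tag`, or None.

    Hypothetical helper for illustration; assumes values use double quotes.
    """
    # re.escape neutralises any regex metacharacters in the attribute name.
    # The lookbehind (?<![\w-]) stops "href" from matching inside "data-href".
    pattern = r'(?<![\w-])' + re.escape(name) + r'\s*=\s*"([^"]*)"'
    match = re.search(pattern, tag)
    # re.search returns None when nothing matches, so check before .group(1)
    return match.group(1) if match else None


tag = '<a href="somelink here something" title="x">'
print(attribute_value(tag, "href"))   # somelink here something
print(attribute_value(tag, "title"))  # x
print(attribute_value(tag, "class"))  # None
```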

Answered 2012-07-27T08:57:10.633

Take a look at the .group() method of the MatchObject results returned by regular expression operations.

Your regular expression has an explicit capturing group (the part in () parentheses), and the .group() method gives you direct access to the string that was matched within that group. MatchObject instances are returned (or, for .finditer(), yielded) by several re functions and methods, including .search(), .match() and .finditer().
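A short sketch of the difference between .group(0) (the whole match) and .group(1) (only the capturing group), plus .finditer() and .findall() for a string containing several tags. The sample string html is made up for illustration.

```python
import re

regex_pattern = re.compile(r'href="(.*?)"')
html = '<a href="first">one</a> <a href="second">two</a>'

match = regex_pattern.search(html)
print(match.group(0))  # href="first"   (the whole match)
print(match.group(1))  # first          (only the capturing group)

# finditer yields one match object per occurrence
for m in regex_pattern.finditer(html):
    print(m.group(1))  # first, then second

# findall with exactly one group returns a list of that group's text
print(regex_pattern.findall(html))  # ['first', 'second']
```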

Demonstration:

>>> import re
>>> example = '<a href="somelink here something">'
>>> regex_pattern=re.compile('href=\"(.*?)\"') 
>>> regex_pattern.search(example)
<_sre.SRE_Match object at 0x1098a2b70>
>>> regex_pattern.search(example).group(1)
'somelink here something'
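If you'd rather have the whole match be the value (the question mentions wanting to avoid stripping href=" afterwards), a lookbehind assertion is one alternative to a capturing group. This is a swapped-in technique, not what either answer uses; the name value_only is just illustrative.

```python
import re

example = '<a href="somelink here something">'

# (?<=href=") is a lookbehind: the text must precede the match but is not
# part of it, so the whole match (group 0) is just the attribute value.
value_only = re.compile(r'(?<=href=")[^"]*')
print(value_only.search(example).group())  # somelink here something
```

Python's re only allows fixed-width lookbehind, so this form can't tolerate optional whitespace around =; the capturing-group version is the more flexible choice.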

From the Regular Expression syntax documentation on the (...) parenthesis syntax:

Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below. To match the literals '(' or ')', use \( or \), or enclose them inside a character class: [(] [)].
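The \number backreference mentioned in that quote can be put to use here: a sketch that accepts either single- or double-quoted values by requiring the value to close with whatever quote opened it. The name quoted_attr and the sample tags are illustrative assumptions.

```python
import re

# Group 1 captures the opening quote; \1 requires the same character to close it.
# Group 2 captures the value itself.
quoted_attr = re.compile(r'href=(["\'])(.*?)\1')

for tag in ['<a href="double">', "<a href='single'>", '<a href="it\'s">']:
    print(quoted_attr.search(tag).group(2))
# double
# single
# it's
```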

Answered 2012-07-27T09:01:43.003