0

I have a string that I'm trying to extract some values from. I've been using this regex tester to work it out, to no avail: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx

This is the string I'm trying to parse:

9   2.27.8.18:2304        63   9dd0e5e7344adac5cf49b7882329df25(OK) Any number of characters follow here

The basic format goes:

INT IP:PORT INT MD5-HASH(OK) STRING

This is as far as I've got so far:

(?<line_id>[0-9]{1,3})(?<ip>.+):(?<port>[0-9]{1,5})(?<guid>.+)\(OK\)(?<name>.+)

And these are the values I've been able to strip so far:

9 (line_id)
2.27.8.18 (ip)
2304 (port)
63   9dd0e5e7344adac5cf49b7882329df25(guid)
Any number of characters follow here (name)

If you try the sample text and pattern I posted above, you can see that I get everything except the integer between the port number and the md5 hash (guid). I'm probably making some amateur mistake as I'm not too experienced with regex patterns so any input would be greatly appreciated.
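To make the symptom reproducible outside the tester, here is a minimal sketch in Python. It is an assumption that Python is an acceptable stand-in for the .NET tester: Python's `re` module spells named groups `(?P<name>...)` rather than `(?<name>...)`, but greedy matching behaves the same way for this pattern. The variable names are illustrative only.

```python
import re

# Sample line from the question, spacing reproduced as posted.
line = (
    "9   2.27.8.18:2304        63   "
    "9dd0e5e7344adac5cf49b7882329df25(OK) Any number of characters follow here"
)

# The question's pattern, with (?<name>...) rewritten as (?P<name>...).
original = re.compile(
    r"(?P<line_id>[0-9]{1,3})(?P<ip>.+):(?P<port>[0-9]{1,5})"
    r"(?P<guid>.+)\(OK\)(?P<name>.+)"
)

match = original.search(line)
assert match is not None

# Nothing between <port> and <guid> consumes the whitespace or the integer,
# so the greedy .+ in <guid> absorbs both of them.
guid_parts = match.group("guid").split()
print(guid_parts)
```

Splitting the captured `guid` on whitespace shows that it holds two values, the integer and the hash, which is exactly the missing field.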

4

5 Answers

2

.+ is generally a bad idea here, because it greedily matches any characters in your string, including the whitespace and digits you want to capture separately.

(?<line_id>[0-9]{1,3})[\s]+(?<ip>[0-9\.]+):(?<port>[0-9]{1,5})[\s]+(?<int>[0-9]{1,5})[\s]+(?<guid>[a-z0-9]+)\(OK\)(?<name>.+)

This yields:

9 (line_id)
2.27.8.18 (ip)
2304 (port)
63 (int)
9dd0e5e7344adac5cf49b7882329df25 (guid)
 Any number of characters follow here (name)
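As a rough check of the pattern above, the following sketch runs it in Python. The only change is the named-group syntax (`(?P<name>...)` instead of `(?<name>...)`); the `fields` dictionary is an illustrative name, not part of the answer.

```python
import re

# Sample line from the question, spacing reproduced as posted.
line = (
    "9   2.27.8.18:2304        63   "
    "9dd0e5e7344adac5cf49b7882329df25(OK) Any number of characters follow here"
)

# Same pattern as above, using Python's (?P<name>...) spelling.
pattern = re.compile(
    r"(?P<line_id>[0-9]{1,3})[\s]+(?P<ip>[0-9\.]+):(?P<port>[0-9]{1,5})[\s]+"
    r"(?P<int>[0-9]{1,5})[\s]+(?P<guid>[a-z0-9]+)\(OK\)(?P<name>.+)"
)

match = pattern.search(line)
assert match is not None

# groupdict() maps each group name to the text it captured.
fields = match.groupdict()
for key, value in fields.items():
    print(f"{key}: {value!r}")
```

The `name` group keeps the leading space after `(OK)`, as in the listing above; calling `.strip()` on it, or adding `\s*` before the group, would drop it.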
answered 2013-01-06T17:09:20.940
1

The capture for the integer is missing.
I have added a new named capturing group called int.

Try this:

(?<line_id>[0-9]{1,3})(?<ip>.+):(?<port>[0-9]{1,5})\s+(?<int>[0-9]+)\s+(?<guid>.+)\(OK\)(?<name>.+)

You now have the following 6 capturing groups:

line_id group 1: (?<line_id>[0-9]{1,3})
ip group 2: (?<ip>.+)
port group 3: (?<port>[0-9]{1,5})
int group 4: (?<int>[0-9]+)
guid group 5: (?<guid>.+)
name group 6: (?<name>.+)

IMHO, the last two groups are too greedy. Instead of .+, I suggest narrowing each group to the specific range of characters you need to capture.
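One way to see why the greedy groups matter is to feed both variants a line whose trailing text itself contains "(OK)". The sketch below is in Python (`(?P<name>...)` syntax), and several things in it are assumptions: the second test line is hypothetical and does not come from the question, and the `{32}` length relies on the guid being an MD5 hex digest, which is 32 characters long.

```python
import re

HASH = "9dd0e5e7344adac5cf49b7882329df25"

# Hypothetical input: the free-text name happens to contain "(OK)" as well.
tricky = "9   2.27.8.18:2304        63   " + HASH + "(OK) name with (OK) inside"

# Pattern from this answer, with .+ for ip and guid.
loose = re.compile(
    r"(?P<line_id>[0-9]{1,3})(?P<ip>.+):(?P<port>[0-9]{1,5})\s+"
    r"(?P<int>[0-9]+)\s+(?P<guid>.+)\(OK\)(?P<name>.+)"
)

# Tighter variant: explicit separators and character classes.
tight = re.compile(
    r"(?P<line_id>[0-9]{1,3})\s+(?P<ip>[0-9.]+):(?P<port>[0-9]{1,5})\s+"
    r"(?P<int>[0-9]+)\s+(?P<guid>[0-9a-f]{32})\(OK\)(?P<name>.+)"
)

loose_match = loose.search(tricky)
tight_match = tight.search(tricky)
assert loose_match is not None and tight_match is not None

# Greedy .+ runs on to the LAST "(OK)", so guid swallows part of the name,
# and ip keeps the spaces that follow line_id.
print(repr(loose_match.group("ip")), repr(loose_match.group("guid")))

# The character classes stop at the first character that cannot belong.
print(repr(tight_match.group("ip")), repr(tight_match.group("guid")))
```

With the tighter classes the name can contain anything, including another "(OK)", without disturbing the earlier fields.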

answered 2013-01-06T17:10:06.717
1

Try this one

(?<line_id>[0-9]{1,3})\s+(?<ip>.+):(?<port>[0-9]{1,5})\s+(?<number>[0-9]+)\s+(?<guid>.+)\(OK\)(?<name>.+)

I got this result, with 6 groups, in the test page you provided:

9 (line_id)
2.27.8.18 (ip)
2304 (port)
63 (number)
9dd0e5e7344adac5cf49b7882329df25 (guid)
Any number of characters follow here (name)

*Note that whitespace (\s+) is used to delimit the 63.

answered 2013-01-06T17:10:29.703
0

You didn't set up a capturing group for that number (63 in your case), which was captured along with the guid. I've edited your pattern a little:

(?<line_id>\d{1,3})\s*(?<ip>.+):(?<port>\d{1,5})\s*(?<number>\d+?)\s*(?<guid>[\da-f]+)\(OK\)(?<name>.+)

Note that I've changed the [0-9] sets to \d and the guid's set to [\da-f] (in case it only uses lowercase hexadecimal characters).
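One caveat about swapping [0-9] for \d: in Unicode-aware engines the two are not always equivalent. The sketch below illustrates this in Python, where \d on a str pattern matches any Unicode decimal digit unless the `re.ASCII` flag is given. .NET behaves similarly unless `RegexOptions.ECMAScript` is set, but the code here only demonstrates Python. The variable names are illustrative.

```python
import re

# The digits six and three written in Arabic-Indic script (U+0666, U+0663).
arabic_indic = "\u0666\u0663"

# \d accepts any Unicode decimal digit by default.
unicode_digit_match = re.fullmatch(r"\d+", arabic_indic) is not None

# [0-9] accepts only the ten ASCII digits.
bracket_match = re.fullmatch(r"[0-9]+", arabic_indic) is not None

# re.ASCII narrows \d back to [0-9].
ascii_flag_match = re.fullmatch(r"\d+", arabic_indic, flags=re.ASCII) is not None

print(unicode_digit_match, bracket_match, ascii_flag_match)
```

For input that is known to be plain ASCII, as in the sample line, either spelling captures the same text.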

answered 2013-01-06T17:10:25.047
0

Maybe it is easier to check the separator:

(?<line_id>[0-9]{1,3})(?<ip>.+):(?<port>[0-9]{1,5})\s+(?<nr>.*)\s+(?<guid>.+)\(OK\)(?<name>.+)

Here is an example: http://rubular.com/r/qhS7TdTFmn
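A side effect worth checking with this approach: because `.*` is greedy, the nr group gives back only as much as the following `\s+` strictly needs, so it can keep trailing spaces. The sketch below shows this in Python (`(?P<name>...)` syntax) and compares it with a digit-only group; the `strict` variant is an illustrative alternative, not part of the answer.

```python
import re

# Sample line from the question: three spaces separate 63 from the hash.
line = (
    "9   2.27.8.18:2304        63   "
    "9dd0e5e7344adac5cf49b7882329df25(OK) Any number of characters follow here"
)

# Pattern from this answer, with .* for the number.
loose = re.compile(
    r"(?P<line_id>[0-9]{1,3})(?P<ip>.+):(?P<port>[0-9]{1,5})\s+"
    r"(?P<nr>.*)\s+(?P<guid>.+)\(OK\)(?P<name>.+)"
)

# Illustrative alternative: restrict the number to digits.
strict = re.compile(
    r"(?P<line_id>[0-9]{1,3})(?P<ip>.+):(?P<port>[0-9]{1,5})\s+"
    r"(?P<nr>[0-9]+)\s+(?P<guid>.+)\(OK\)(?P<name>.+)"
)

loose_match = loose.search(line)
strict_match = strict.search(line)
assert loose_match is not None and strict_match is not None

# .* leaves just one space for \s+ and keeps the other two.
nr_loose = loose_match.group("nr")
# [0-9]+ cannot match whitespace, so it stops right after the digits.
nr_strict = strict_match.group("nr")

print(repr(nr_loose), repr(nr_strict))
```

Trimming the captured value, or using a class such as `[0-9]+` or `\S+`, avoids the stray whitespace.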

answered 2013-01-06T17:12:09.423