python - Split a string around any characters not specified

Question

I want to split a string into a list around anything that is not a numeral or a dot. The str.split method only splits on a separator I specify (a positive match), so is a regex the best route to take in this situation?

For example, given the string "10.23, 10.13.21; 10.1 10.5 and 10.23.32", this should return the list ['10.23', '10.13.21', '10.1', '10.5', '10.23.32'].

As such, I believe the best regex to use in this situation would be [\d\.]+

Is this the best way to handle such a case?
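For comparison, the regex from the question is a positive match (it describes the pieces to keep), so it pairs naturally with re.findall rather than re.split. The minimal sketch below shows that, plus a stricter pattern, \d+(?:\.\d+)*, which is an editorial assumption and not from the question: it only accepts dots that sit between digits, so a sentence-ending dot is not glued onto the last number.

```python
import re

s = "10.23, 10.13.21; 10.1 10.5 and 10.23.32"

# Positive match: collect every run of digits and dots
# (a dot needs no backslash inside a character class)
print(re.findall(r'[\d.]+', s))
# ['10.23', '10.13.21', '10.1', '10.5', '10.23.32']

# Stricter variant (assumption): digits, optionally followed by ".digits" groups
t = "Versions 10.1 and 10.2."
print(re.findall(r'[\d.]+', t))         # ['10.1', '10.2.']
print(re.findall(r'\d+(?:\.\d+)*', t))  # ['10.1', '10.2']
```

Because the group is non-capturing, re.findall returns the whole match rather than just the last group.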

score 9 · Accepted Answer

Your regex works as-is with re.findall; alternatively, you can use re.split with an inverted version of it:

In [1]: import re

In [2]: s = "10.23, 10.13.21; 10.1 10.5 and 10.23.32"

In [3]: re.split(r'[^\d\.]+', s)
Out[3]: ['10.23', '10.13.21', '10.1', '10.5', '10.23.32']
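One caveat with the re.split approach: the sample string happens to start and end with a digit. If it starts or ends with a separator, re.split returns an empty string at that end. A minimal sketch that filters those out; split_on_non_numeric is a hypothetical helper name, not a library function.

```python
import re

def split_on_non_numeric(text):
    # Split on every run of characters that is neither a digit nor a dot
    parts = re.split(r'[^\d.]+', text)
    # A separator at the very start or end leaves an empty string there; drop it
    return [p for p in parts if p]

print(re.split(r'[^\d.]+', "(10.1, 10.2)"))  # ['', '10.1', '10.2', '']
print(split_on_non_numeric("(10.1, 10.2)"))  # ['10.1', '10.2']
```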

score 2 · Accepted Answer

If you want a solution other than a regex, you could use str.translate to turn everything other than '.0123456789' into whitespace and then call split():

In [69]: mystr
Out[69]: '10.23, 10.13.21; 10.1 10.5 and 10.23.32'

In [70]: mystr.translate(' '*46 + '. ' + '0123456789' + ' '*198).split()
Out[70]: ['10.23', '10.13.21', '10.1', '10.5', '10.23.32']

Hope this helps
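The 256-character table above depends on character positions (46 characters before '.', 198 after '9'), which is hard to read and only covers code points below 256. In Python 3, str.translate looks each character up by code point in any mapping, so a sketch of the same idea can use a dict subclass whose __missing__ maps every unlisted character, including non-Latin-1 ones, to a space. KeepOnly and split_translate are hypothetical names chosen for illustration.

```python
class KeepOnly(dict):
    """Translation table for str.translate: listed characters map to
    themselves, every other code point maps to a space."""

    def __init__(self, keep):
        super().__init__({ord(c): c for c in keep})

    def __missing__(self, code_point):
        # dict lookup calls this for any code point not in the table
        return ' '

def split_translate(text, keep='.0123456789'):
    return text.translate(KeepOnly(keep)).split()

print(split_translate('10.23, 10.13.21; 10.1 10.5 and 10.23.32'))
# ['10.23', '10.13.21', '10.1', '10.5', '10.23.32']
```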

score 2 · Accepted Answer

An arguably more readable form of what @inspectorG4dget proposed:

>>> import string
>>> keep = set(string.digits + '.')  # build the set once, not once per character
>>> s = '10.23, 10.13.21; 10.1 10.5 and 10.23.32'
>>> ''.join(c if c in keep else ' ' for c in s).split()
['10.23', '10.13.21', '10.1', '10.5', '10.23.32']

This avoids regular expressions, which is often a good idea when the problem is easy to solve without them.
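Another regex-free variant, offered as a sketch (split_kept is a hypothetical name): itertools.groupby walks the string once and yields consecutive runs of kept or unkept characters, so the kept runs can be joined directly without building an intermediate string of spaces and calling split() on it.

```python
from itertools import groupby

def split_kept(text, keep=frozenset('0123456789.')):
    """Return the maximal runs of characters from `keep`, in order."""
    # groupby yields (key, run) pairs for consecutive characters sharing the
    # same key, here "is this character kept?"; only kept runs are joined
    return [''.join(run)
            for is_kept, run in groupby(text, key=keep.__contains__)
            if is_kept]

print(split_kept('10.23, 10.13.21; 10.1 10.5 and 10.23.32'))
# ['10.23', '10.13.21', '10.1', '10.5', '10.23.32']
```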
