python - Separating the number from the unit in a string in Python

Question

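I have strings containing numbers with units, such as 2GB or 17ft. I want to separate the number from the unit and create two different strings. Sometimes there is whitespace between them (e.g. 2 GB), and that case is easy to handle with split(' ').

When they are joined together (e.g. 2GB), I test every character until I find a letter instead of a digit.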

s='17GB'
number=''
unit=''
for c in s:
    if c.isdigit():
        number+=c
    else:
        unit+=c

Is there a better way to do this?

Thanks

score 11 · Accepted Answer

You can break out of the loop as soon as you find the first non-digit character:

for i,c in enumerate(s):
    if not c.isdigit():
        break
else:
    i = len(s)  # no non-digit found (or s is empty): everything is the number
number = s[:i]
unit = s[i:].lstrip()

If you have negative numbers and decimals:

numeric = '0123456789-.'
for i,c in enumerate(s):
    if c not in numeric:
        break
else:
    i = len(s)  # no unit character found: the whole string is the number
number = s[:i]
unit = s[i:].lstrip()
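
Note that the character-set test accepts any mix of digits, '-' and '.', so an input such as '1-2GB' would slip through. A minimal sketch of one way to tighten this, assuming the number part should be a valid float literal; the helper name split_leading_number is hypothetical and not part of the answer:

```python
NUMERIC_CHARS = '0123456789-.'

def split_leading_number(s):
    """Split s into (number, unit) strings. Hypothetical helper for illustration."""
    for i, c in enumerate(s):
        if c not in NUMERIC_CHARS:
            break
    else:
        # Loop finished without break: there is no unit part at all.
        i = len(s)
    number, unit = s[:i], s[i:].lstrip()
    # Assumption: the number must parse as a float; this raises ValueError
    # for an empty number or a malformed one such as '1-2'.
    float(number)
    return number, unit

print(split_leading_number('-2.5GB'))  # ('-2.5', 'GB')
print(split_leading_number('17 ft'))   # ('17', 'ft')
print(split_leading_number('9001'))    # ('9001', '')
```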

score 7 · Accepted Answer

You can use a regular expression to split the string into groups:

>>> import re
>>> p = re.compile(r'(\d+)\s*(\w+)')
>>> p.match('2GB').groups()
('2', 'GB')
>>> p.match('17 ft').groups()
('17', 'ft')
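
The pattern shown here covers unsigned integers followed by a unit. A possible variant, sketched under the assumption that signed decimals and a missing unit should also be accepted; the names PATTERN and split_number_unit are hypothetical:

```python
import re

# Optional sign, digits, optional fractional part, then an optional unit.
PATTERN = re.compile(r'\s*([-+]?\d+(?:\.\d+)?)\s*(\w*)')

def split_number_unit(text):
    """Return (number, unit) as strings, or None if text does not start with a number."""
    m = PATTERN.match(text)
    if m is None:
        return None
    return m.groups()

print(split_number_unit('2GB'))      # ('2', 'GB')
print(split_number_unit('-3.5 kg'))  # ('-3.5', 'kg')
print(split_number_unit('9001'))     # ('9001', '')
print(split_number_unit('GB'))       # None
```

Checking the result for None before calling groups() avoids an AttributeError on input that does not match.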

score 3 · Accepted Answer

tokenize can help:

>>> import StringIO
>>> import tokenize
>>> s = StringIO.StringIO('27GB')
>>> for token in tokenize.generate_tokens(s.readline):
...   print token
... 
(2, '27', (1, 0), (1, 2), '27GB')
(1, 'GB', (1, 2), (1, 4), '27GB')
(0, '', (2, 0), (2, 0), '')

score 2 · Accepted Answer

You should use a regular expression, grouping together the parts you want to extract:

import re
s = "17GB"
match = re.match(r"^([1-9][0-9]*)\s*(GB|MB|KB|B)$", s)
if match:
  print("Number: %d, unit: %s" % (int(match.group(1)), match.group(2)))

Change the regular expression according to what you want to parse. If you are not familiar with regular expressions, have a look at a good tutorial site.
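
As one illustration of adapting the expression, the alternation of units can be built from a list. This is only a sketch: the helper build_pattern and the unit lists are assumptions made for the example, not part of the answer.

```python
import re

def build_pattern(units):
    """Compile a pattern that accepts only the given unit names (hypothetical helper)."""
    # re.escape guards against unit names that contain regex metacharacters.
    alternation = "|".join(re.escape(u) for u in units)
    return re.compile(r"^([1-9][0-9]*)\s*(%s)$" % alternation)

storage = build_pattern(["GB", "MB", "KB", "B"])
length = build_pattern(["ft", "in", "m"])

print(storage.match("17GB").groups())  # ('17', 'GB')
print(length.match("17 ft").groups())  # ('17', 'ft')
print(storage.match("17 ft"))          # None
```

Because the number part is [1-9][0-9]*, a value with a leading zero such as "0GB" is rejected by design.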

score 2 · Accepted Answer

This uses an approach that is a bit more forgiving than regular expressions. Note: it is not as performant as the other solutions posted.

def split_units(value):
    """
    >>> split_units("2GB")
    (2.0, 'GB')
    >>> split_units("17 ft")
    (17.0, 'ft')
    >>> split_units("   3.4e-27 frobnitzem ")
    (3.4e-27, 'frobnitzem')
    >>> split_units("9001")
    (9001.0, '')
    >>> split_units("spam sandwhiches")
    (0, 'spam sandwhiches')
    >>> split_units("")
    (0, '')
    """
    units = ""
    number = 0
    while value:
        try:
            number = float(value)
            break
        except ValueError:
            units = value[-1:] + units
            value = value[:-1]
    return number, units.strip()

score 2 · Accepted Answer

s='17GB'
for i,c in enumerate(s):
    if not c.isdigit():
        break
else:
    i=len(s)  # all digits: without this the last digit would be dropped
number=int(s[:i])
unit=s[i:]

answered 2010-02-10T21:10:43.297

score 2 · Accepted Answer

>>> s="17GB"
>>> ind=list(map(str.isalpha,s)).index(True)
>>> num,suffix=s[:ind],s[ind:]
>>> print(num+":"+suffix)
17:GB

score 0 · Accepted Answer

How about using a regular expression?

http://python.org/doc/1.6/lib/module-regsub.html

answered 2010-02-10T21:03:48.083

score 0 · Accepted Answer

For this task, I would definitely use a regular expression:

import re
there = re.compile(r'\s*(\d+)\s*(\S+)')
thematch = there.match(s)
if thematch:
  number, unit = thematch.groups()
else:
  raise ValueError('String %r not in the expected format' % s)

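In the RE pattern, \s means "whitespace", \d means "digit", and \S means "non-whitespace"; * means "0 or more of the preceding" and + means "1 or more of the preceding"; the parentheses enclose "capturing groups", which are returned by the groups() call on the match object. (thematch is None if the given string does not correspond to the pattern: optional whitespace, then one or more digits, then optional whitespace, then one or more non-whitespace characters.)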
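
To make the description concrete, here is a small sketch that applies the same pattern to a few sample strings. The wrapper split_or_none and the sample inputs are assumptions added for illustration.

```python
import re

there = re.compile(r'\s*(\d+)\s*(\S+)')

def split_or_none(s):
    """Return (number, unit) strings, or None when s does not match (hypothetical helper)."""
    thematch = there.match(s)
    return thematch.groups() if thematch else None

print(split_or_none('17GB'))    # ('17', 'GB')
print(split_or_none('  2 GB'))  # ('2', 'GB')
print(split_or_none('GB'))      # None
print(split_or_none('9001'))    # ('900', '1')
```

The last case is worth noting: a bare number still matches, because \S+ needs at least one character and takes the final digit through backtracking. If a unit-less number should be rejected or kept whole, the pattern has to be adjusted.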

score 0 · Accepted Answer

Regular expressions.

import re

m = re.match(r'\s*(?P<n>[-+]?[.0-9]+)\s*(?P<u>.*)', s)
if m is None:
  raise ValueError("not a number with units")
number = m.group("n")
unit = m.group("u")

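This gives you a number (integer or fixed point; it is very hard to distinguish the "e" of scientific notation from a unit prefix) with an optional sign, followed by the unit, with optional whitespace in between.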

Use re.compile() if you have a lot of matches to perform.
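
A minimal sketch of that advice: compile the pattern once and reuse it across many strings. It assumes the number group is meant to match one or more digit or dot characters; the names NUMBER_WITH_UNIT and parse_all are hypothetical.

```python
import re

# Compiled once at import time and reused for every input string.
NUMBER_WITH_UNIT = re.compile(r'\s*(?P<n>[-+]?[.0-9]+)\s*(?P<u>.*)')

def parse_all(strings):
    """Return a list of (number, unit) string pairs, one per input string."""
    results = []
    for text in strings:
        m = NUMBER_WITH_UNIT.match(text)
        if m is None:
            raise ValueError("not a number with units: %r" % text)
        results.append((m.group("n"), m.group("u")))
    return results

print(parse_all(['2GB', '-3.5 kg', ' +17 ft']))
# [('2', 'GB'), ('-3.5', 'kg'), ('+17', 'ft')]
```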
