python - How do I check if a string is a number (float)?

Question

What is the best way to check whether a string can be represented as a number in Python?

The function I currently have is:

def is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

This is not only ugly and slow, it also seems clunky. However, I haven't found a better method, because calling float() in the main function is even worse.

score 1708 · Accepted Answer

If you are parsing (positive, unsigned) integers rather than floats, you can use the isdigit() method of string objects.

>>> a = "03523"
>>> a.isdigit()
True
>>> b = "963spam"
>>> b.isdigit()
False

String methods - isdigit(): Python 2, Python 3

There is also something for Unicode strings, which I am not too familiar with: Unicode - Is decimal/decimal
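To make the Unicode remark a bit more concrete, here is a small sketch comparing the three related string methods. The sample characters and the helper names classify and int_accepts are my own choices for illustration, not something from the answer. The practical point is that isdigit() returns True for some characters that int() still refuses to parse.

```python
# Sketch: how str.isdecimal(), str.isdigit() and str.isnumeric() differ.
# The sample strings and helper names are illustrative assumptions.
samples = ["123", "²", "½", "-1", "1.5"]

def classify(s):
    """Return (isdecimal, isdigit, isnumeric) for one string."""
    return (s.isdecimal(), s.isdigit(), s.isnumeric())

def int_accepts(s):
    """Return True if int() can parse the string."""
    try:
        int(s)
        return True
    except ValueError:
        return False

for s in samples:
    print(repr(s), classify(s), int_accepts(s))
```

If the goal is "will int() accept this", isdecimal() is the closer match for unsigned input, but none of the three methods handles a sign or a decimal point.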

score 764 · Accepted Answer

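"This is not only ugly and slow"

I'd dispute both.

A regex or other string parsing method would be uglier and slower.

I'm not sure that anything much could be faster than the above. It calls the function and returns. Try/except doesn't introduce much overhead, because the most common exception is caught without an extensive search of stack frames.

The issue is that any numeric conversion function has two kinds of results:

A number, if the number is valid
A status code (e.g., via errno) or an exception to show that no valid number could be parsed.

C (as an example) hacks around this in a number of ways. Python lays it out clearly and explicitly.

I think your code for doing this is perfect.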

score 256 · Accepted Answer

TL;DR The best solution is s.replace('.','',1).isdigit()

I did some benchmarks comparing the different approaches:

def is_number_tryexcept(s):
    """ Returns True if string is a number. """
    try:
        float(s)
        return True
    except ValueError:
        return False

import re
def is_number_regex(s):
    """ Returns True if string is a number. """
    # Raw string so that \d and \. reach the regex engine unchanged.
    if re.match(r"^\d+?\.\d+?$", s) is None:
        return s.isdigit()
    return True


def is_number_repl_isdigit(s):
    """ Returns True if string is a number. """
    return s.replace('.','',1).isdigit()

If the string is not a number, the except block is quite slow. More importantly, the try-except method is the only approach here that handles scientific notation correctly.

funcs = [
          is_number_tryexcept, 
          is_number_regex,
          is_number_repl_isdigit
          ]

a_float = '.1234'

print('Float notation ".1234" is not supported by:')
for f in funcs:
    if not f(a_float):
        print('\t -', f.__name__)

Float notation ".1234" is not supported by:
- is_number_regex

scientific1 = '1.000000e+50'
scientific2 = '1e50'


print('Scientific notation "1.000000e+50" is not supported by:')
for f in funcs:
    if not f(scientific1):
        print('\t -', f.__name__)




print('Scientific notation "1e50" is not supported by:')
for f in funcs:
    if not f(scientific2):
        print('\t -', f.__name__)

Scientific notation "1.000000e+50" is not supported by:
- is_number_regex
- is_number_repl_isdigit
Scientific notation "1e50" is not supported by:
- is_number_regex
- is_number_repl_isdigit

EDIT: The benchmark results

import timeit

test_cases = ['1.12345', '1.12.345', 'abc12345', '12345']
times_n = {f.__name__:[] for f in funcs}

for t in test_cases:
    for f in funcs:
        f = f.__name__
        times_n[f].append(min(timeit.Timer('%s(t)' %f, 
                      'from __main__ import %s, t' %f)
                              .repeat(repeat=3, number=1000000)))

where the following functions were tested:

from re import match as re_match
from re import compile as re_compile

def is_number_tryexcept(s):
    """ Returns True if string is a number. """
    try:
        float(s)
        return True
    except ValueError:
        return False

def is_number_regex(s):
    """ Returns True if string is a number. """
    if re_match(r"^\d+?\.\d+?$", s) is None:
        return s.isdigit()
    return True


# Compile the pattern once so the timed function only pays for matching.
comp = re_compile(r"^\d+?\.\d+?$")

def compiled_regex(s):
    """ Returns True if string is a number. """
    if comp.match(s) is None:
        return s.isdigit()
    return True


def is_number_repl_isdigit(s):
    """ Returns True if string is a number. """
    return s.replace('.','',1).isdigit()

score 78 · Accepted Answer

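There is one exception that you may want to take into account: the string 'NaN'

If you want is_number to return False for 'NaN', this code will not work, because Python converts it to its representation of a number that is not a number (talk about identity issues):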

>>> float('NaN')
nan

Otherwise, I should actually thank you for the piece of code I now use extensively. :)

G.
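A minimal sketch of how the NaN case could be excluded, assuming you want everything else that float() accepts to still count as a number. The function name is_number_no_nan is hypothetical; math.isnan is from the standard library.

```python
import math

def is_number_no_nan(s):
    """Like is_number, but returns False for strings that parse to NaN."""
    try:
        value = float(s)
    except ValueError:
        return False
    # float('NaN') succeeds, so the NaN case has to be rejected explicitly.
    return not math.isnan(value)

for text in ["NaN", "nan", "1.5", "abc", "inf"]:
    print(repr(text), is_number_no_nan(text))
```

Note that 'inf' still passes this check; whether that is desirable depends on the use case.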

score 65 · Accepted Answer

How about this:

'3.14'.replace('.','',1).isdigit()

which returns True only if there is one or no '.' in the string of digits.

'3.14.5'.replace('.','',1).isdigit()

returns False.

Edit: just saw another comment... Adding a .replace(badstuff,'',maxnum_badstuff) for other cases can be done. If you are passing salt and not arbitrary condiments (ref: xkcd#974), this will do fine :P

score 47 · Accepted Answer

Updated after Alfe pointed out that you don't need to check for float separately, because complex handles both:

def is_number(s):
    try:
        complex(s) # for int, long, float and complex
    except ValueError:
        return False

    return True

Previously said: in some rare cases you might also need to check for complex numbers (e.g. 1+2i), which cannot be represented by a float:

def is_number(s):
    try:
        float(s) # for int, long and float
    except ValueError:
        try:
            complex(s) # for complex
        except ValueError:
            return False

    return True
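One detail worth checking with a quick sketch: Python writes the imaginary unit as j, not i, and complex() does not accept spaces around the sign inside the string. The sample inputs below are my own; the function is the complex()-based version from this answer.

```python
def is_number(s):
    """complex() also accepts int and float literals, so one check covers all."""
    try:
        complex(s)
    except ValueError:
        return False
    return True

# Illustrative inputs: "1+2i" and "1 + 2j" are both rejected by complex().
for text in ["42", "3.14", "1e3", "1+2j", "1+2i", "1 + 2j", "spam"]:
    print(repr(text), is_number(text))
```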

score 43 · Accepted Answer

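"This is not only ugly and slow, it also seems clunky."

It may take some getting used to, but this is the Pythonic way of doing it. As has already been pointed out, the alternatives are worse. But there is one other advantage of doing things this way: polymorphism.

The central idea behind duck typing is that "if it walks and talks like a duck, then it's a duck." What if you decide that you need to subclass string so that you can change how you determine whether something can be converted into a float? Or what if you decide to test some other object entirely? You can do these things without having to change the above code.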
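As a rough illustration of that point, the same is_number function already works for non-string objects that know how to convert themselves to a float. Decimal and Fraction are standard library types; the Celsius class is a hypothetical example of my own.

```python
from decimal import Decimal
from fractions import Fraction

def is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

class Celsius:
    """Hypothetical class; it only needs __float__ to work with float()."""
    def __init__(self, degrees):
        self.degrees = degrees

    def __float__(self):
        return float(self.degrees)

print(is_number("2.5"))
print(is_number(Decimal("2.5")))
print(is_number(Fraction(5, 2)))
print(is_number(Celsius(21)))
```

One caveat: for an object with no float conversion at all (None, for example), float() raises TypeError rather than ValueError, so the function as written would let that exception propagate.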

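Other languages solve these problems by using interfaces. I'll save the analysis of which solution is better for another thread. The point, though, is that Python is decidedly on the duck typing side of the equation, and you're probably going to have to get used to syntax like this if you plan on doing much programming in Python (but that doesn't mean you have to like it, of course).

One other thing you might want to take into consideration: Python is pretty fast at throwing and catching exceptions compared to a lot of other languages (30x faster than .Net, for instance). Heck, the language itself even throws exceptions to communicate non-exceptional, normal program conditions (every time you use a for loop). Thus, I wouldn't worry too much about the performance aspects of this code until you notice a significant problem.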
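The for loop remark refers to the iterator protocol, where the end of iteration is signalled by a StopIteration exception. A conceptual sketch of what a for loop does, written out by hand (the function name manual_for is hypothetical, and the interpreter does this internally rather than with Python-level code):

```python
def manual_for(iterable):
    """Collect items the way a for loop would, using StopIteration as the end signal."""
    items = []
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            # Normal end of iteration, not an error condition.
            break
        items.append(item)
    return items

print(manual_for("abc"))
```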

score 31 · Accepted Answer

For int use this:

>>> "1221323".isdigit()
True

But for float we need some tricks ;-). Every float number has one point...

>>> "12.34".isdigit()
False
>>> "12.34".replace('.','',1).isdigit()
True
>>> "12.3.4".replace('.','',1).isdigit()
False

Also, for negative numbers just add lstrip():

>>> '-12'.lstrip('-')
'12'

And now we have a universal way:

>>> '-12.34'.lstrip('-').replace('.','',1).isdigit()
True
>>> '.-234'.lstrip('-').replace('.','',1).isdigit()
False
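Before relying on this chain, it may help to compare it against float() on a few edge cases. This is only a sketch with inputs I picked for illustration; the helper names trick and via_float are hypothetical.

```python
def trick(s):
    """The lstrip/replace/isdigit chain shown above."""
    return s.lstrip('-').replace('.', '', 1).isdigit()

def via_float(s):
    """Reference check: does float() accept the string?"""
    try:
        float(s)
        return True
    except ValueError:
        return False

# Illustrative inputs where the two checks can disagree.
for text in ["-12.34", "--12", "1e5", "+7", " 3 ", "²"]:
    print(repr(text), trick(text), via_float(text))
```

lstrip('-') removes every leading minus sign, so "--12" passes the chain even though float() rejects it, while "1e5", "+7" and " 3 " are valid for float() but fail the chain.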

score 17 · Accepted Answer

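For strings of non-numbers, try: except: is actually slower than regular expressions. For strings of valid numbers, regex is slower. So the appropriate method depends on your input.

If you find that you are in a performance bind, you can use a new third-party module called fastnumbers that provides a function called isfloat. Full disclosure, I am the author. I have included its results in the timings below.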

from __future__ import print_function
import timeit

prep_base = '''\
x = 'invalid'
y = '5402'
z = '4.754e3'
'''

prep_try_method = '''\
def is_number_try(val):
    try:
        float(val)
        return True
    except ValueError:
        return False

'''

prep_re_method = '''\
import re
float_match = re.compile(r'[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?$').match
def is_number_re(val):
    return bool(float_match(val))

'''

fn_method = '''\
from fastnumbers import isfloat

'''

print('Try with non-number strings', timeit.timeit('is_number_try(x)',
    prep_base + prep_try_method), 'seconds')
print('Try with integer strings', timeit.timeit('is_number_try(y)',
    prep_base + prep_try_method), 'seconds')
print('Try with float strings', timeit.timeit('is_number_try(z)',
    prep_base + prep_try_method), 'seconds')
print()
print('Regex with non-number strings', timeit.timeit('is_number_re(x)',
    prep_base + prep_re_method), 'seconds')
print('Regex with integer strings', timeit.timeit('is_number_re(y)',
    prep_base + prep_re_method), 'seconds')
print('Regex with float strings', timeit.timeit('is_number_re(z)',
    prep_base + prep_re_method), 'seconds')
print()
print('fastnumbers with non-number strings', timeit.timeit('isfloat(x)',
    prep_base + 'from fastnumbers import isfloat'), 'seconds')
print('fastnumbers with integer strings', timeit.timeit('isfloat(y)',
    prep_base + 'from fastnumbers import isfloat'), 'seconds')
print('fastnumbers with float strings', timeit.timeit('isfloat(z)',
    prep_base + 'from fastnumbers import isfloat'), 'seconds')
print()

Try with non-number strings 2.39108395576 seconds
Try with integer strings 0.375686168671 seconds
Try with float strings 0.369210958481 seconds

Regex with non-number strings 0.748660802841 seconds
Regex with integer strings 1.02021503448 seconds
Regex with float strings 1.08564686775 seconds

fastnumbers with non-number strings 0.174362897873 seconds
fastnumbers with integer strings 0.179651021957 seconds
fastnumbers with float strings 0.20222902298 seconds

As you can see:

try: except: was fast for numeric input but very slow for invalid input
regex is very efficient when the input is invalid
fastnumbers wins in both cases

score 14 · Accepted Answer

Just Mimic C#

In C# there are two different functions that handle parsing of scalar values:

Float.Parse()
Float.TryParse()

float.parse():

def parse(string):
    try:
        return float(string)
    except Exception:
        raise TypeError

Note: If you're wondering why I changed the exception to a TypeError, here's the documentation.

float.try_parse():

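def try_parse(string, fail=None):
    try:
        return float(string)
    except Exception:
        return fail

Note: You don't want to return the boolean False, because that is still a value type. None is better because it indicates failure. Of course, if you want something different, you can change the fail parameter to whatever you want.

To extend float to include 'parse()' and 'try_parse()', you'll need to monkeypatch the 'float' class to add these methods.

If you want to respect pre-existing functions, the code should be something like: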

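def monkey_patch():
    # Note: CPython raises TypeError when attributes are assigned on the
    # built-in float type, so treat this as pseudocode for the idea.
    if not hasattr(float, 'parse'):
        float.parse = parse
    if not hasattr(float, 'try_parse'):
        float.try_parse = try_parse

Side note: I personally prefer to call it Monkey Punching, because it feels like I'm abusing the language when I do this, but YMMV.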

Usage:

float.parse('giggity') # raises TypeError
float.parse('54.3') # returns the scalar value 54.3
float.try_parse('twank') # returns None
float.try_parse('32.2') # returns the scalar value 32.2
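CPython does not allow new attributes to be set on the built-in float type, so the monkey patch is best read as pseudocode. One possible alternative, sketched here and not part of the original answer, is a small subclass that carries the two helpers. The class name ParsedFloat is hypothetical.

```python
class ParsedFloat(float):
    """Hypothetical float subclass carrying the two parsing helpers."""

    @classmethod
    def parse(cls, string):
        """Return a float, or raise TypeError if the input cannot be parsed."""
        try:
            return cls(string)
        except (ValueError, TypeError):
            raise TypeError("not a float: %r" % (string,))

    @classmethod
    def try_parse(cls, string, fail=None):
        """Return a float, or the fail value if the input cannot be parsed."""
        try:
            return cls(string)
        except (ValueError, TypeError):
            return fail

print(ParsedFloat.parse('54.3'))
print(ParsedFloat.try_parse('twank'))
```

Plain module-level functions parse() and try_parse() would work just as well; the subclass only mirrors the float.parse() spelling from the answer.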

And the great Sage Pythonas said to the Holy See Sharpisus, "Anything you can do I can do better; I can do anything better than you."

score 14 · Accepted Answer

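I know this is particularly old, but I would like to add an answer that I believe covers information missing from the highest voted answer and that could be very valuable to anyone who finds this:

For each of the following methods, combine them with a count if you need any input to be accepted. (Assuming we are using vocal definitions of integers rather than 0-255, etc.)

x.isdigit() works well for checking if x is an integer.

x.replace('-','').isdigit() works well for checking if x is a negative. (Check for - in the first position)

x.replace('.','').isdigit() works well for checking if x is a decimal.

x.replace(':','').isdigit() works well for checking if x is a ratio.

x.replace('/','',1).isdigit() works well for checking if x is a fraction.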
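These replace-based checks only look at which characters are present, not where they are, so inputs such as '1-2' or '/34' pass. For the fraction case specifically, a stricter sketch could lean on the standard library's fractions.Fraction parser instead. The function name is_fraction is hypothetical.

```python
from fractions import Fraction

def is_fraction(s):
    """Return True if s parses as a rational such as '3/4', '-7' or '1.5'."""
    try:
        Fraction(s)
        return True
    except (ValueError, ZeroDivisionError):
        # ValueError: malformed text; ZeroDivisionError: a denominator of 0.
        return False

for text in ["3/4", "-3/4", "/34", "3/0", "abc"]:
    print(repr(text), is_fraction(text))
```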

score 12 · Accepted Answer

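Casting to float and catching ValueError is probably the fastest way, since float() is specifically meant for just that. Anything else that requires string parsing (regex, etc.) will likely be slower, because it is not tuned for this operation. My $0.02.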

score 9 · Accepted Answer

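I wanted to see which method is fastest. Overall, the best and most consistent results were given by the check_replace function. The fastest results were given by the check_exception function, but only if no exception was raised, meaning its code is the most efficient but the overhead of throwing an exception is quite large.

Please note that checking for a successful cast is the only method that is accurate. For example, this works with check_exception, but the other two test functions will return False for a valid float: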

huge_number = float('1e+100')

Here is the benchmark code:

import time, re, random, string

ITERATIONS = 10000000

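# time.clock() was removed in Python 3.8; fall back to it only on old interpreters.
now = getattr(time, 'perf_counter', None) or time.clock

class Timer:
    def __enter__(self):
        self.start = now()
        return self
    def __exit__(self, *args):
        self.end = now()
        self.interval = self.end - self.start

def check_regexp(x):
    return re.compile(r"^\d*\.?\d*$").match(x) is not None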

def check_replace(x):
    return x.replace('.','',1).isdigit()

def check_exception(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

to_check = [check_regexp, check_replace, check_exception]

print('preparing data...')
good_numbers = [
    str(random.random() / random.random()) 
    for x in range(ITERATIONS)]

bad_numbers = ['.' + x for x in good_numbers]

strings = [
    ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(random.randint(1,10)))
    for x in range(ITERATIONS)]

print('running test...')
for func in to_check:
    with Timer() as t:
        for x in good_numbers:
            res = func(x)
    print('%s with good floats: %s' % (func.__name__, t.interval))
    with Timer() as t:
        for x in bad_numbers:
            res = func(x)
    print('%s with bad floats: %s' % (func.__name__, t.interval))
    with Timer() as t:
        for x in strings:
            res = func(x)
    print('%s with strings: %s' % (func.__name__, t.interval))

Here are the results with Python 2.7.10 on a 2017 MacBook Pro 13:

check_regexp with good floats: 12.688639
check_regexp with bad floats: 11.624862
check_regexp with strings: 11.349414
check_replace with good floats: 4.419841
check_replace with bad floats: 4.294909
check_replace with strings: 4.086358
check_exception with good floats: 3.276668
check_exception with bad floats: 13.843092
check_exception with strings: 15.786169

Here are the results with Python 3.6.5 on a 2017 MacBook Pro 13:

check_regexp with good floats: 13.472906000000009
check_regexp with bad floats: 12.977665000000016
check_regexp with strings: 12.417542999999995
check_replace with good floats: 6.011045999999993
check_replace with bad floats: 4.849356
check_replace with strings: 4.282754000000011
check_exception with good floats: 6.039081999999979
check_exception with bad floats: 9.322753000000006
check_exception with strings: 9.952595000000002

Here are the results with PyPy 2.7.13 on a 2017 MacBook Pro 13:

check_regexp with good floats: 2.693217
check_regexp with bad floats: 2.744819
check_regexp with strings: 2.532414
check_replace with good floats: 0.604367
check_replace with bad floats: 0.538169
check_replace with strings: 0.598664
check_exception with good floats: 1.944103
check_exception with bad floats: 2.449182
check_exception with strings: 2.200056

score 9 · Accepted Answer

So to put it all together, checking for NaN, infinity and complex numbers (it would seem they are specified with j, not i, i.e. 1+2j) results in:

def is_number(s):
    try:
        n=str(float(s))
        if n == "nan" or n=="inf" or n=="-inf" : return False
    except ValueError:
        try:
            complex(s) # for complex
        except ValueError:
            return False
    return True

score 5 · Accepted Answer

I needed to determine whether a string can be cast to one of the basic types (float, int, str, bool). After not finding anything on the internet, I created this:

def str_to_type (s):
    """ Get possible cast type for a string

    Parameters
    ----------
    s : string

    Returns
    -------
    float,int,str,bool : type
        Depending on what it can be cast to

    """    
    try:                
        f = float(s)        
        if "." not in s:
            return int
        return float
    except ValueError:
        value = s.upper()
        if value == "TRUE" or value == "FALSE":
            return bool
        return type(s)

Example

str_to_type("true") # bool
str_to_type("6.0") # float
str_to_type("6") # int
str_to_type("6abc") # str
str_to_type(u"6abc") # unicode

You can capture the type and use it:

s = "6.0"
type_ = str_to_type(s) # float
f = type_(s)

score 4 · Accepted Answer

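I did some speed tests. If the string is likely to be a number, the try/except strategy is the fastest possible. If the string is not likely to be a number and you are interested in an integer check, it is worth doing some tests first (isdigit() plus a check for a leading '-'). If you want to check for float numbers, you have to use the try/except code without any shortcut.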
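One way to read the integer pre-check mentioned above (digits with an optional leading '-') is sketched below. The function name is_int_string is hypothetical, and restricting the digits to ASCII is my own assumption, added so that characters like '²' are not accepted.

```python
def is_int_string(s):
    """Return True for an optional leading '-' followed by ASCII digits only."""
    digits = s[1:] if s.startswith('-') else s
    # str.isascii() needs Python 3.7+; it excludes characters such as '²'
    # that isdigit() alone would accept.
    return digits.isascii() and digits.isdigit()

for text in ["12", "-12", "--12", "-", "", "1.5", "²"]:
    print(repr(text), is_int_string(text))
```

Unlike lstrip('-'), slicing off a single leading character rejects inputs such as "--12".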

score 2 · Accepted Answer

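I was working on a problem that led me to this thread, namely how to convert a collection of data to strings and numbers in the most intuitive way. After reading the original code, I realized that what I needed was different in two ways:

1 - I wanted an integer result if the string represented an integer

2 - I wanted a number or a string result to stick into a data structure

So I adapted the original code to produce this derivative: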

def string_or_number(s):
    try:
        z = int(s)
        return z
    except ValueError:
        try:
            z = float(s)
            return z
        except ValueError:
            return s

score 2 · Accepted Answer

import re
def is_number(num):
    pattern = re.compile(r'^[-+]?[-0-9]\d*\.\d*|[-+]?\.?[0-9]\d*$')
    result = pattern.match(num)
    if result:
        return True
    else:
        return False


>>>: is_number('1')
True

>>>: is_number('111')
True

>>>: is_number('11.1')
True

>>>: is_number('-11.1')
True

>>>: is_number('inf')
False

>>>: is_number('-inf')
False

score 1 · Accepted Answer

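I also used the function you mentioned, but soon noticed that strings such as "Nan", "Inf" and their variations are considered numbers. So I propose an improved version of your function that returns False for those types of input and does not fail on "1e3" variants: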

def is_float(text):
    try:
        float(text)
        # check for nan/infinity etc.
        if text.isalpha():
            return False
        return True
    except ValueError:
        return False
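One thing to watch with the isalpha() test: it is False for signed spellings such as '-inf' or '+nan', so those still come back as True. A sketch of an alternative that checks the parsed value instead of the text, using math.isfinite from the standard library (the name is_finite_float is hypothetical):

```python
import math

def is_finite_float(text):
    """Return True only if text parses to a finite float (no nan, inf or -inf)."""
    try:
        return math.isfinite(float(text))
    except ValueError:
        return False

for text in ["1e3", "nan", "-inf", "+Infinity", "1e999", "abc"]:
    print(repr(text), is_finite_float(text))
```

This also rejects literals like "1e999" that overflow to infinity when parsed, which may or may not be what you want.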

score 1 · Accepted Answer

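Here's my simple way of doing it. Let's say that I'm looping through some strings and I want to add them to an array if they turn out to be numbers.

try:
    myvar.append( float(string_to_check) )
except ValueError:
    continue

Replace the myvar.append with whatever operation you want to do with the string if it turns out to be a number. The idea is to try a float() operation and use the raised error to determine whether or not the string is a number.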
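For completeness, here is the same idea as a runnable loop. The input list is made up for illustration; the variable names follow the snippet above.

```python
strings_to_check = ["1", "2.5", "spam", "1e3", ""]  # illustrative input
myvar = []

for string_to_check in strings_to_check:
    try:
        myvar.append(float(string_to_check))
    except ValueError:
        # Not a number: skip it and move on to the next string.
        continue

print(myvar)  # [1.0, 2.5, 1000.0]
```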

score 0 · Accepted Answer

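You can generalize the exception technique in a useful way by returning more useful values than True and False. For example, this function puts quotes around strings but leaves numbers alone, which is just what I needed for a quick and dirty filter to make some variable definitions for R.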

import sys

def fix_quotes(s):
    try:
        float(s)
        return s
    except ValueError:
        return '"{0}"'.format(s)

for line in sys.stdin:
    fields = line.split()
    print(fields[0], '<- c(', ','.join(fix_quotes(c) for c in fields[1:]), ')')

score 0 · Accepted Answer

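Try this.

def is_number(var):
    try:
        # True only when var equals its integer conversion (e.g. 5 or 5.0);
        # a string such as '5' never compares equal to an int.
        return var == int(var)
    except Exception:
        return False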
