python - Wilsonスコア間隔のPython実装？

Question

「平均評価で並べ替えない方法」を読んだ後、ベルヌーイパラメーターのウィルソンスコア信頼区間の下限のPython実装を持っている人がいるかどうか知りたいと思いました。

score 35 · Accepted Answer

Reddit は、コメントのランキングにウィルソンスコア間隔を使用します。説明と Python の実装については、こちらを参照してください。

#Rewritten code from /r2/r2/lib/db/_sorts.pyx

from math import sqrt

def confidence(ups, downs):
    n = ups + downs

    if n == 0:
        return 0

    z = 1.0 #1.44 = 85%, 1.96 = 95%
    phat = float(ups) / n
    return ((phat + z*z/(2*n) - z * sqrt((phat*(1-phat)+z*z/(4*n))/n))/(1+z*z/n))

score 19 · Accepted Answer

これはウィルソン呼び出しが間違っていると思います。なぜなら、1 up 0 down の場合、負の値に対してa を実行できないためNaNになるからです。sqrt

記事How not to sort by average pageの ruby の例を見ると、正しいものを見つけることができます。

return ((phat + z*z/(2*n) - z * sqrt((phat*(1-phat)+z*z/(4*n))/n))/(1+z*z/n))

score 4 · Accepted Answer

信頼限界から直接 z を実際に計算し、numpy/scipy のインストールを避けたい場合は、次のコードスニペットを使用できます。

import math

def binconf(p, n, c=0.95):
  '''
  Calculate binomial confidence interval based on the number of positive and
  negative events observed.  Uses Wilson score and approximations to inverse
  of normal cumulative density function.

  Parameters
  ----------
  p: int
      number of positive events observed
  n: int
      number of negative events observed
  c : optional, [0,1]
      confidence percentage. e.g. 0.95 means 95% confident the probability of
      success lies between the 2 returned values

  Returns
  -------
  theta_low  : float
      lower bound on confidence interval
  theta_high : float
      upper bound on confidence interval
  '''
  p, n = float(p), float(n)
  N    = p + n

  if N == 0.0: return (0.0, 1.0)

  p = p / N
  z = normcdfi(1 - 0.5 * (1-c))

  a1 = 1.0 / (1.0 + z * z / N)
  a2 = p + z * z / (2 * N)
  a3 = z * math.sqrt(p * (1-p) / N + z * z / (4 * N * N))

  return (a1 * (a2 - a3), a1 * (a2 + a3))


def erfi(x):
  """Approximation to inverse error function"""
  a  = 0.147  # MAGIC!!!
  a1 = math.log(1 - x * x)
  a2 = (
    2.0 / (math.pi * a)
    + a1 / 2.0
  )

  return (
    sign(x) *
    math.sqrt( math.sqrt(a2 * a2 - a1 / a) - a2 )
  )


def sign(x):
  if x  < 0: return -1
  if x == 0: return  0
  if x  > 0: return  1


def normcdfi(p, mu=0.0, sigma2=1.0):
  """Inverse CDF of normal distribution"""
  if mu == 0.0 and sigma2 == 1.0:
    return math.sqrt(2) * erfi(2 * p - 1)
  else:
    return mu + math.sqrt(sigma2) * normcdfi(p)

python - Wilsonスコア間隔のPython実装？

5 に答える 5

Related

Reference