python - Benford's Law program

Question

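I need to write a program that demonstrates Benford's Law for two lists of data. I think I have most of the code down, but I believe there are small errors I'm overlooking. I'm sorry if this isn't how the site is meant to be used, but I really need help. Here is my code.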

def getData(fileName):

    data = []
    f = open(fileName,'r')
    for line in f:
        data.append(line)
    f.close()

    return data

def getLeadDigitCounts(data):

    counts = [0,0,0,0,0,0,0,0,0]

    for i in data:
        pop = i[1]
        digits = pop[0]
        int(digits)
        counts[digits-1] += 1

    return counts

def showResults(counts):

    percentage = 0
    Sum = 0
    num = 0
    Total = 0

    for i in counts:
        Total += i

    print"number of data points:",Sum
    print
    print"digit number percentage"
    for i in counts:
        Sum += i
        percentage = counts[i]/float(Sum)
        num = counts[i]
        print"5%d 6%d %f"%(i,num,percentage)


def showLeadingDigits(digit,data):

    print"Showing data with a leading",digit
    for i in data:
        if digit == i[i][1]:
            print i

def processFile(name):

    data = getData(name)
    counts = getLeadDigitCounts(data)
    showResults(counts)

    digit = input('Enter leading digit: ')
    showLeadingDigits(digit, data)

def main():

    processFile('TexasCountyPop2010.txt')
    processFile('MilesofTexasRoad.txt')

main()

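Sorry again if this isn't how I'm supposed to use this site. Also, I can only use the programming techniques my professor has taught us, so I'd appreciate any advice on cleaning up the code within that constraint.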

Also, here are a few lines from my data.

Anderson County     58458
Andrews County  14786
Angelina County     86771
Aransas County  23158
Archer County   9054
Armstrong County    1901

score 1 · Accepted Answer

Your error comes from this line:

int(digits)

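This doesn't actually do anything to digits: int() returns a new value, which is then discarded. If you want to convert digits to an integer, you have to reassign the variable.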

digits = int(digits)
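
A minimal sketch of why the bare call has no effect: int() returns a new integer and leaves its argument untouched, so the result has to be bound to a name. The sample string below is made up for illustration.

```python
digits = "5"          # leading character of a hypothetical population string

int(digits)           # the result is computed and then discarded
unchanged = digits    # still the string "5"

digits = int(digits)  # rebind the name to the converted value
counts = [0] * 9
counts[digits - 1] += 1  # now a valid list index (4, for the digit 5)
```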

Also, to parse your data properly, do something like this:

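for line in data:
    # Split off the last whitespace-separated field (the population).
    place, digits = line.rsplit(None, 1)
    # Index by the leading digit only, not by the whole number.
    lead = int(digits[0])
    counts[lead - 1] += 1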
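
A minimal sketch of the counting step, run against the sample lines from the question. It targets Python 3 and assumes every non-blank line ends with a whitespace-separated population figure that does not start with 0. The name `get_lead_digit_counts` is a restyled stand-in for the question's `getLeadDigitCounts`, and skipping blank lines is an added assumption. Only the first character of the population is converted and used as the index.

```python
def get_lead_digit_counts(data):
    """Count how often each digit 1-9 leads the last field of each line."""
    counts = [0] * 9
    for line in data:
        if not line.strip():
            continue  # assumption: blank lines carry no data
        # rsplit(None, 1) splits once, from the right, on any whitespace,
        # so county names containing spaces stay in one piece.
        place, digits = line.rsplit(None, 1)
        lead = int(digits[0])
        counts[lead - 1] += 1
    return counts


sample = [
    "Anderson County     58458\n",
    "Andrews County  14786\n",
    "Angelina County     86771\n",
    "Aransas County  23158\n",
    "Archer County   9054\n",
    "Armstrong County    1901\n",
]
counts = get_lead_digit_counts(sample)
print(counts)  # [2, 1, 0, 0, 1, 0, 0, 1, 1]
```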

score 0 · Accepted Answer

Just sharing a different (and perhaps more step-by-step) piece of code here. It's in Ruby.

The thing is, Benford's Law generally doesn't show up when the random data is drawn uniformly from one fixed range. The upper bound of the set you draw from should itself vary, ideally across several orders of magnitude.

In other words, suppose you used a random number generator with a fixed range to draw from, e.g. 1-100. You would certainly end up with a random dataset, but 1 would appear as the first digit about as often as 9 or any other digit.

**The interesting** part happens when you let the computer (or nature) also decide randomly, for each draw, how large the number is allowed to be. Then you get a two-stage random dataset whose leading digits lean toward Benford's Law, with 1 the most common first digit and 9 the least. I wrote this Ruby code to illustrate that; a simulation like this demonstrates the tendency rather than proving the law, and the match is approximate.

Take a look at this bit of code I've put together for you!
It's a bit WET, but I'm sure it'll explain.

<-- Ruby code below -->

dataset = []

999.times do
  random = rand(999)
  dataset << rand(random)
end

startwith1 = []
startwith2 = []
startwith3 = []
startwith4 = []
startwith5 = []
startwith6 = []
startwith7 = []
startwith8 = []
startwith9 = []

dataset.each do |element|
  case element.to_s.split('')[0].to_i
  when 1 then startwith1 << element
  when 2 then startwith2 << element
  when 3 then startwith3 << element
  when 4 then startwith4 << element
  when 5 then startwith5 << element
  when 6 then startwith6 << element
  when 7 then startwith7 << element
  when 8 then startwith8 << element
  when 9 then startwith9 << element
  end
end

a = startwith1.length
b = startwith2.length
c = startwith3.length
d = startwith4.length
e = startwith5.length
f = startwith6.length
g = startwith7.length
h = startwith8.length
i = startwith9.length

sum = a + b + c + d + e + f + g + h + i

p "#{a} times first digit = 1; equating #{(a * 100) / sum}%"
p "#{b} times first digit = 2; equating #{(b * 100) / sum}%"
p "#{c} times first digit = 3; equating #{(c * 100) / sum}%"
p "#{d} times first digit = 4; equating #{(d * 100) / sum}%"
p "#{e} times first digit = 5; equating #{(e * 100) / sum}%"
p "#{f} times first digit = 6; equating #{(f * 100) / sum}%"
p "#{g} times first digit = 7; equating #{(g * 100) / sum}%"
p "#{h} times first digit = 8; equating #{(h * 100) / sum}%"
p "#{i} times first digit = 9; equating #{(i * 100) / sum}%"
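
For comparison, a rough Python 3 sketch of the same experiment, with the Benford expectation log10(1 + 1/d) printed alongside. The helper names, the fixed seed, and the sample size are assumptions made for illustration; the two-stage draw mirrors `rand(rand(999))` from the Ruby code. The fixed-range column should sit near 1/9 for every digit, while the two-stage column should skew toward low digits and only approximate the Benford values.

```python
import math
import random


def lead_digit(n):
    """Return the first decimal digit of a positive integer."""
    return int(str(n)[0])


def digit_shares(values):
    """Return the share of each leading digit 1-9 among positive values."""
    counts = [0] * 9
    for v in values:
        if v > 0:  # zero has no leading digit in 1-9
            counts[lead_digit(v) - 1] += 1
    total = sum(counts)
    return [c / total for c in counts]


rng = random.Random(0)  # fixed seed so reruns give the same numbers
n = 100_000

# Fixed range: every value drawn uniformly from 1..999.
fixed = digit_shares([rng.randint(1, 999) for _ in range(n)])

# Two-stage: first draw an upper bound, then a value below it.
two_stage = digit_shares(
    [rng.randrange(rng.randint(1, 999)) for _ in range(n)]
)

# Benford's expected share for digit d is log10(1 + 1/d).
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]

print("digit  fixed  two-stage  Benford")
for d in range(1, 10):
    print(f"{d:5d}  {fixed[d-1]:.3f}  {two_stage[d-1]:9.3f}  {benford[d-1]:.3f}")
```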

score 0 · Accepted Answer

Let's walk through one cycle of your code; I think you'll see what the problem is. We'll use this file as the data:

An, 10, 22
In, 33, 44
Out, 3, 99

Return value of getData:

["An, 10, 22",
"In, 33, 44",
"Out, 3, 99"]

Now let's look at the first pass through the loop.

for i in data:
    # i = "An, 10, 22"
    pop = i[1]
    # pop = 'n', the second character of i
    digits = pop[0]
    # digits = 'n', the first character of pop
    int(digits)
    # Error here: int('n') raises ValueError. You probably also wanted digits = int(digits)
    counts[digits-1] += 1

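Depending on how your data is structured, you'll need to work out the logic for extracting the digits you expect from the file. That logic could well live in the getData function, but it mostly depends on the details of your data.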
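
As one possible way to do that, here is a sketch of a `getData` variant that parses each line while reading. It assumes the county-population layout shown in the question (a name, then whitespace, then an integer) and targets Python 3. The helper `parse_line`, the snake_case name `get_data`, and the use of `io.StringIO` in place of a real file are illustrative assumptions.

```python
import io


def parse_line(line):
    """Split 'Name of Place   12345' into ('Name of Place', '12345')."""
    place, digits = line.rsplit(None, 1)
    return place, digits


def get_data(f):
    """Read an open text file and return a list of (place, digits) pairs."""
    data = []
    for line in f:
        if line.strip():  # skip blank lines
            data.append(parse_line(line))
    return data


# Stand-in for open('TexasCountyPop2010.txt'), using lines from the question.
sample_file = io.StringIO(
    "Anderson County     58458\n"
    "Andrews County  14786\n"
    "Archer County   9054\n"
)
data = get_data(sample_file)

# With parsed pairs, the leading digit is the first character
# of the second field.
lead_digits = []
for place, digits in data:
    lead_digits.append(int(digits[0]))

print(data)
print(lead_digits)  # [5, 1, 9]
```

Keeping the population as a string until the last moment makes the first character easy to pick out; converting the whole field with int() first would lose that.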
