python - Improving a pure Python prime sieve with a recurrence formula

Question

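I am trying to further optimize the champion solution from the prime number thread by factoring out the complex formula for the sub-list length. Calling len() on the same subsequence is too slow, because len is expensive and generating the subsequence is expensive too. The change seems to speed up the function slightly, but I have not yet managed to get rid of the division, even though I only divide inside the conditional statement. Of course, I could also simplify the length calculation by dropping the optimization of starting the marking at n*n rather than at n...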
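
To make the idea concrete, here is a minimal sketch that compares three ways of getting the slice length: the closed-form expression, the incremental amount1/amount2 update, and len() of the actual slice. The helper name slice_lengths is hypothetical and only serves this check; it is not part of the original solution.

```python
def slice_lengths(n):
    """Yield (i, closed_form, recurrence, actual) for each odd i up to sqrt(n)."""
    sieve = [True] * (n // 2)
    amount1 = n - 10   # n - i*i - 1 for i = 3
    amount2 = 6        # 2*i for i = 3
    for i in range(3, int(n ** 0.5) + 1, 2):
        closed_form = (n - i * i - 1) // (2 * i) + 1
        recurrence = amount1 // amount2 + 1
        actual = len(sieve[i * i // 2::i])   # the slow way: build the slice
        yield i, closed_form, recurrence, actual
        amount1 -= 4 * i + 4   # (i + 2)**2 - i**2 == 4*i + 4
        amount2 += 4           # 2*(i + 2) - 2*i == 4


if __name__ == "__main__":
    for row in slice_lengths(1000):
        print(row)
```

The update has to run on every iteration, not only when sieve[i//2] is set, because the two amounts track i itself.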

I replaced the division / with integer division // so that the code is compatible with Python 3 or with

from __future__ import division

I would also be interested to know whether this recurrence formula could help speed up the numpy solution, but I do not have much experience with numpy.

With psyco enabled for the code, the story is completely different, however: the Atkin sieve code becomes faster than this special slicing technique.

import cProfile

def rwh_primes1(n):
    # http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
    """ Returns  a list of primes < n """
    sieve = [True] * (n//2)
    for i in xrange(3,int(n**0.5)+1,2):
        if sieve[i//2]:
            sieve[i*i//2::i] = [False] * ((n-i*i-1)//(2*i)+1)
    return [2] + [2*i+1 for i in xrange(1,n//2) if sieve[i]]

def primes(n):
    # http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
    # recurrence formula for length by amount1 and amount2 Tony Veijalainen 2010
    """ Returns  a list of primes < n """
    sieve = [True] * (n//2)
    amount1 = n-10
    amount2 = 6

    for i in xrange(3,int(n**0.5)+1,2):
        if sieve[i//2]:
             ## can you make recurrence formula for whole reciprocal?
            sieve[i*i//2::i] = [False] * (amount1//amount2+1)
        amount1-=4*i+4
        amount2+=4

    return [2] + [2*i+1 for i in xrange(1,n//2) if sieve[i]]

numprimes=1000000
print('Profiling')
cProfile.Profile.bias = 4e-6
for test in (rwh_primes1, primes):
    cProfile.run("test(numprimes)")

Profiling (not much difference between the versions)

         3 function calls in 0.191 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.006    0.006    0.191    0.191 <string>:1(<module>)
        1    0.185    0.185    0.185    0.185 myprimes.py:3(rwh_primes1)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


         3 function calls in 0.192 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.006    0.006    0.192    0.192 <string>:1(<module>)
        1    0.186    0.186    0.186    0.186 myprimes.py:12(primes)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Interestingly, after raising the limit to 10**8, removing the profiling, and putting a timing decorator on the functions:

rwh_primes1 took 23.670 s
primes took 22.792 s
primesieve took 10.850 s
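
The timing decorator itself is not shown in the post. A minimal sketch of one that prints lines in the same "name took x s" shape could look like this; the names timed and count_odds are hypothetical, and time.perf_counter assumes Python 3.

```python
import functools
import time


def timed(func):
    """Print how long each call to func takes (hypothetical helper)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print("%s took %.3f s" % (func.__name__, elapsed))
        return result
    return wrapper


@timed
def count_odds(n):
    # Stand-in workload; the sieve functions would be decorated the same way.
    return sum(1 for i in range(n) if i % 2)


if __name__ == "__main__":
    count_odds(10 ** 6)
```

functools.wraps keeps the original function name, so the printed label matches the decorated function.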

Interestingly, if you return the sieve itself instead of building the list of primes, the time is about half that of the list-producing version.
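
As a rough illustration of that variant, the sketch below returns only the boolean sieve and answers primality queries by lookup. The names odd_sieve and is_prime are hypothetical, and the code uses Python 3 range in place of xrange.

```python
def odd_sieve(n):
    """Return the boolean sieve only; index i stands for the odd number 2*i + 1."""
    sieve = [True] * (n // 2)
    for i in range(3, int(n ** 0.5) + 1, 2):
        if sieve[i // 2]:
            sieve[i * i // 2::i] = [False] * ((n - i * i - 1) // (2 * i) + 1)
    return sieve


def is_prime(sieve, k):
    """Look up k in a sieve built by odd_sieve(n); valid for 2 <= k < n."""
    if k == 2:
        return True
    # Index 0 stands for 1, which is never marked, hence the k > 2 guard.
    return k > 2 and k % 2 == 1 and sieve[k // 2]


if __name__ == "__main__":
    s = odd_sieve(50)
    print([k for k in range(2, 50) if is_prime(s, k)])
```

Skipping the final list comprehension avoids allocating one Python integer per prime, which is a plausible reason for the time difference reported above.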

score 1 · Accepted Answer

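You can apply a wheel optimization. Multiples of 2 and 3 are not prime, so do not store them at all. You can then start from 5 and skip the multiples of 2 and 3 by incrementing in steps of 2, 4, 2, 4, 2, 4, and so on.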

Below is C++ code for it. Hope this helps.

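#include <cmath>

// MAX (exclusive upper bound) and isComp (flag array indexed by j/3, with at
// least MAX/3 + 1 entries) are assumed to be defined elsewhere.
void sieve23()
{
    int lim=sqrt(MAX);
    // i visits 5, 7, 11, 13, ... by alternating steps of 2 and 4
    for(int i=5,bit1=0;i<=lim;i+=(bit1?4:2),bit1^=1)
    {
        if(!isComp[i/3])
        {
            // j visits 5*i, 7*i, 11*i, ... by alternating steps of 4*i and 2*i
            for(int j=i,bit2=1;;)
            {
                j+=(bit2?4*i:2*i);
                bit2=!bit2;
                if(j>=MAX)break;
                isComp[j/3]=1;
            }
        }
    }
}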
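
For readers following the Python code in the question, the same 2, 4, 2, 4 stepping can be sketched in Python. This is a loose transcription of the C++ above, not a tuned implementation; it takes the limit as a parameter instead of the global MAX and returns the primes as a list, both of which are assumptions made for illustration.

```python
def sieve23(limit):
    """Return the primes below limit, storing flags only for numbers coprime to 6."""
    is_comp = [False] * (limit // 3 + 1)   # number j (coprime to 6) lives at j // 3

    i, step = 5, 2
    while i * i < limit:
        if not is_comp[i // 3]:
            j, jstep = i, 4 * i
            while True:
                j += jstep
                jstep = 6 * i - jstep      # alternate between 4*i and 2*i
                if j >= limit:
                    break
                is_comp[j // 3] = True
        i += step
        step = 6 - step                    # alternate between 2 and 4

    primes = [p for p in (2, 3) if p < limit]
    i, step = 5, 2
    while i < limit:
        if not is_comp[i // 3]:
            primes.append(i)
        i += step
        step = 6 - step
    return primes


if __name__ == "__main__":
    print(sieve23(60))
```

Only one flag per three integers is stored, compared with one per two in the odd-only sieve from the question.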

score 0 · Accepted Answer

If you decide to move to C++ for more speed, I ported the Python sieve to C++. The full discussion is here: Porting optimized Sieve of Eratosthenes from Python to C++.

On an Intel Q6600 running Ubuntu 10.10, compiled with g++ -O3 and with N=100000000, this takes 415 ms.

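#include <cstdint>
#include <vector>
#include <boost/dynamic_bitset.hpp>

// http://vault.embedded.com/98/9802fe2.htm - integer square root
// The loop consumes the top two bits of a 32-bit operand per iteration, so a
// fixed-width 32-bit type is used here; unsigned long is 64 bits wide on LP64
// platforms, where a >> 30 would no longer isolate two bits.
unsigned short isqrt(std::uint32_t a) {
    std::uint32_t rem = 0;
    std::uint32_t root = 0;

    for (short i = 0; i < 16; i++) {
        root <<= 1;
        rem = ((rem << 2) + (a >> 30));
        a <<= 2;
        root++;

        if (root <= rem) {
            rem -= root;
            root++;
        } else root--;

    }

    return static_cast<unsigned short> (root >> 1);
}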

// https://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
// https://stackoverflow.com/questions/5293238/porting-optimized-sieve-of-eratosthenes-from-python-to-c/5293492
template <class T>
void primesbelow(T N, std::vector<T> &primes) {
    T i, j, k, sievemax, sievemaxroot;

    sievemax = N/3;
    if ((N % 6) == 2) sievemax++;

    sievemaxroot = isqrt(N)/3;

    boost::dynamic_bitset<> sieve(sievemax);
    sieve.set();
    sieve[0] = 0;

    for (i = 0; i <= sievemaxroot; i++) {
        if (sieve[i]) {
            k = (3*i + 1) | 1;
            for (j = k*k/3; j < sievemax; j += 2*k) sieve[j] = 0;
            for (j = (k*k+4*k-2*k*(i&1))/3; j < sievemax; j += 2*k) sieve[j] = 0;
        }
    }

    primes.push_back(2);
    primes.push_back(3);

    for (i = 0; i < sievemax; i++) {
        if (sieve[i]) primes.push_back((3*i+1)|1);
    }

}
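
The expression (3*i+1)|1 in the code above maps a sieve index to the number it represents. A small Python sketch of that mapping and its inverse may make the layout easier to follow; the function names are hypothetical.

```python
def index_to_number(i):
    """Map sieve index i to the number it stands for: 1, 5, 7, 11, 13, 17, ..."""
    return (3 * i + 1) | 1


def number_to_index(k):
    """Inverse mapping, valid for k coprime to 6."""
    return k // 3


if __name__ == "__main__":
    print([index_to_number(i) for i in range(8)])
```

Index 0 stands for 1, which is why the C++ code clears sieve[0] before sieving.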
