performance - Tclでリストに一意の値を入力するより速い方法は何ですか?

Question

一意の値のリストを作成したい。値はさまざまなソースから取得されます。最終的なリストに入力するには 2 つの方法があります。

すべての値を入力してから実行しますlrmdups。

set finalList [list ]
foreach selcetion  $selectionList {
    regexp {(\d+):(\d+)} $selection -> start end
    for {set i $start} {$i <= $end} {incr i} {
        lappend finalList $i
    }
}
set finalList [lrmdups $finalList]

または、値がリストに存在するかどうかを確認し、それが追加されていない場合にのみ:

set finalList [list ]
foreach selcetion  $selectionList {
    regexp {(\d+):(\d+)} $selection -> start end
    for {set i $start} {$i <= $end} {incr i} {
        if {[lsearch $finalList $i] == -1} {
            lappend finalList $i
        }
    }
}

2 つの方法のうち、どちらが高速ですか?

score 5 · Accepted Answer

timeコマンドを使用して、Tcl コードのパフォーマンスをテストします。コードをプロシージャ内に配置して、バイトコンパイルのメリットを得てから、time コマンドを使用してテストを何度も実行し、反復ごとの平均時間を取得してください。たとえば、次の例は、expr 式を常に中括弧で囲む必要がある理由を示しています。

% proc a {} {expr 1 + 2 + 3}
% proc b {} {expr {1 + 2 + 3}}
% time a 1000
4.491 microseconds per iteration
% time b 1000
0.563 microseconds per iteration

特定のタスクに対処するには、新しい値をそれぞれ配列に追加し、重複を食べさせてから、最後にリストに変換します。

proc getUniques {wantedSize} {
    array set uniques {}
    while {[array size uniques] != $wantedSize} {
         set uniques([getNewValue]) {}
    }
    return [array names uniques]
}

score 4 · Accepted Answer

コマンドを使用しtimeてベンチマークも行います。これが私のコードです.2つのメソッドを追加しました.1つは使用arrayし、もう1つはstruct::set重複を排除するために使用します.

#!/usr/bin/env tclsh
#http://stackoverflow.com/questions/18337534/what-way-is-faster-to-populate-a-list-with-unique-values-in-tcl

package require Tclx
package require struct::set

proc removeDupMethod {selectionList} {
    set finalList [list ]
    foreach selection $selectionList {
        regexp {(\d+):(\d+)} $selection -> start end
        for {set i $start} {$i <= $end} {incr i} {
            lappend finalList $i
        }
    }
    set finalList [lrmdups $finalList]
    return $finalList
}

proc searchBeforInsertMethod {selectionList} {
    set finalList [list ]
    foreach selection $selectionList {
        regexp {(\d+):(\d+)} $selection -> start end
        for {set i $start} {$i <= $end} {incr i} {
            if {[lsearch $finalList $i] == -1} {
                lappend finalList $i
            }
        }
    }
}

proc useArrayMethod {selectionList} {
    array set tally {}
    foreach selection $selectionList {
        regexp {(\d+):(\d+)} $selection -> start end
        for {set i $start} {$i <= $end} {incr i} {
            incr tally($i)
        }
    }
    set finalList [array names tally]
    return $finalList
}

proc useStructSetMethod {selectionList} {
    set finalList {}
    foreach selection $selectionList {
        regexp {(\d+):(\d+)} $selection -> start end
        for {set i $start} {$i <= $end} {incr i} {
            struct::set include finalList $i
        }
    }
    return $finalList
}

# Performs the benchmark on a method
proc bench {methodName} {
    set selectionList {1:10 5:20 10:30 4:30}
    set timeInfo [time {$methodName $selectionList} 1000]
    puts "$methodName - $timeInfo"
}

# main
bench removeDupMethod
bench searchBeforInsertMethod
bench useArrayMethod
bench useStructSetMethod

結果：

removeDupMethod - 281.961364 microseconds per iteration
searchBeforInsertMethod - 93.984991 microseconds per iteration
useArrayMethod - 122.354889 microseconds per iteration
useStructSetMethod - 576.293311 microseconds per iteration

討論

2 番目の方法searchBeforInsertMethodが最速です。
useArrayMethod一意性を確保するために配列を使用するが 2 番目になります。これは、TCL の組み込みリストコマンドが非常に最適化されていることを意味します。
驚いたことに、useStructSetMethodが最も遅いです。ライブラリコマンドを最適化する必要があると思っていましたが、間違っていました。

アップデート

私はSiybのヒントを取り、置き換えました：

regexp {(\d+):(\d+)} $selection -> start end

と：

set range [split $selection :]
set start [lindex $selection 0]
set end [lindex $selection 1]

そして、速度が劇的に向上することを確認してください。

removeDupMethod - 9.337442 microseconds per iteration
searchBeforInsertMethod - 5.528975999999999 microseconds per iteration
useArrayMethod - 6.8120519999999996 microseconds per iteration
useStructSetMethod - 5.774831 microseconds per iteration
useNative - 6.105141 microseconds per iteration

ノート

最速はそのままsearchBeforInsertMethodで、スピードアップは17倍近く。
useStructSetMethod現在2位に浮上

更新 2

potrzebie の要求に従って、先頭に 5000:6000 を追加しましたが、数値はあまり変わりません。

removeDupMethod - 10.826106 microseconds per iteration
searchBeforInsertMethod - 6.296769 microseconds per iteration
useArrayMethod - 7.752042 microseconds per iteration
useStructSetMethod - 6.910305999999999 microseconds per iteration
useNative - 7.274724 microseconds per iteration

score 2 · Accepted Answer

lrmdups の代わりに lsort -unique $list を使用してみました。私のボックスでは、これが最速の方法です。

proc useNative {selectionList} {
        foreach selection $selectionList {
            regexp {(\d+):(\d+)} $selection -> start end
            for {set i $start} {$i <= $end} {incr i} {
                lappend finalList $i
            }
        }
        set finalList [lsort -unique $finalList]
        return $finalList
}

removeDupMethod - 171.573 microseconds per iteration
searchBeforInsertMethod - 58.264 microseconds per iteration
useArrayMethod - 96.109 microseconds per iteration
useStructSetMethod - 386.889 microseconds per iteration
useNative - 41.556 microseconds per iteration

編集: 正規表現の代わりに split を使用すると、速度も向上します (正規表現が実際に問題の一部である場合):

useNative - 20.938 microseconds per iteration

EDIT2: -integer を lsort パラメータとして追加してみてください。整数のソートを計画している場合は、速度も少し向上するはずです。

performance - Tclでリストに一意の値を入力するより速い方法は何ですか?

3 に答える 3

討論

アップデート

更新 2

Related

Reference