vb.net - search within an array with a condition

Question

I have two arrays that I'm trying to compare at several levels. Both have the same structure, with 3 "columns".

The first column contains the polygon's ID, the second an area type, and the third the percentage of each area type for a polygon.

So, for many rows, the comparison is simple, for example ID: 1, Type: aaa, %: 100. But for some elements, I have several rows for the same ID. For example, in the first array I'll have ID 2, type aaa, 25% --- ID 2, type bbb, 25% --- ID 2, type ccc, 50%. And in the second array, I'll have ID 2, type aaa, 25% --- ID 2, type bbb, 10% --- ID 2, type eee, 38% --- ID 2, type fff, 27%.

Here's a visual example: [screenshot]

So, my function has to compare these two arrays and send me an email if there are differences. (I won't show you the real code because it is 811 lines long.) The first "if" condition is:

If array1.id = array2.id Then
    If array1.type = array2.type Then
        If array1.percent = array2.percent Then
            zone_verification = True
        Else
            zone_verification = False
        End If
    End If
End If

The problem is that there are more than 50 000 rows in each array. So when I run the function, for each "array1.id", it searches through all 50 000 rows of array2. That's 50 000 searches of 50 000 rows each, so it takes a very long time to run!

I'm looking for a way to make it run faster. How could I make my search more specific? For example: I have several rows with id "2" in array1. If there are also rows with id "2" in array2, find them, push all the rows where array2.id = 2 into a "sub-array" or something like that, and search only in these specific rows. That way I'd have just X rows in array1 to compare with X rows in array2, not with 50 000. And when every "id 2" row in array1 is done, do the same thing for "id 4", then for "id 5", and so on.

Hope it's clear. It's almost the first time I've used VB.NET, and I have this big function to get running.

Thanks

EDIT

Here's what I wanna do.

I have two different layers in a geospatial database. Both layers have the same structure. They are a "spatial join" of the land parcels (55 000), and the land use layer. The first layer is the current one, and the second layer is the next one we'll use after 2015.

So I have, for each "land parcel", the percentage of each land use. For example, a "land parcel" (ID 7580-80-2532) can have 50% of farming use (TYPE FAR-23) and 50% of residential use (RES-112). In the first array, I'll have 2 rows with the same ID (7580-80-2532), but each one will have a different type (FAR-23, RES-112) and a different %. In the second layer, the municipal zoning (land use) has changed. So the same "land parcel" will now be 40% residential use (RES-112), 20% commercial (COM-54) and 40% a new farming use (FAR-33). So, I want to know if there are any differences. Some land parcels will be exactly the same. Some parcels will keep the same land uses, but not the same percentage of each. And some land parcels will have more or fewer land use types, with different percentages of each.

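I want this script to compare these two layers and send an email if there are differences between them for the same land parcel ID.

The script already works, but it takes too long.

I think the problem is that the script goes through all of array2 for each row of array1.

What I'd like is, when array1 has several rows with the same ID, to fetch only the rows with that ID in both arrays.

If I sort by ID, maybe I could write a condition like "once you've found what you're looking for, stop searching as soon as you find another value"?

I've only been using VB since last week, so it's hard to explain clearly. And English is not my native language! ;)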

score 0 · Accepted Answer

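I'm answering this question based on the first part you wrote (that is, without the EDIT section). A proper answer would describe a good algorithm, but I'd recommend using database features, since a DB has many query optimizations for exactly this kind of task.

Put all the records into two tables in the DB. This takes O(n) time, and if the records are static you don't need to repeat this step every time.

Table1: id, type, percent

Table2: id, type, percent

Then use a DB query, something like

select count(*) from table1 t1, table2 t2 where t1.id != t2.id and t1.type != t2.type

(you could use a better query; my point is to hand control of this operation over to the DB). Get the result in your code and do whatever you need with it.

EDIT

1) You can sort both arrays on id + type + percent in O(n log n) time and then do a binary search.

2) Store the first set of records in a hash map with an appropriate key, either id alone or id + type. Building it takes O(n) time, and if the key is well chosen, each lookup takes constant time.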
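Option 2 could look roughly like the following in VB.NET. This is only a sketch under assumptions that are not in the question: the Row structure, the KeyOf and ChangedIds names, string IDs and types, integer percentages, and each (id, type) pair appearing at most once per array are all made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic

Public Module HashLookupSketch

    ' Hypothetical row type; the field names follow the question's pseudocode.
    Public Structure Row
        Public Id As String
        Public Type As String
        Public Percent As Integer
    End Structure

    ' Builds a composite "id|type" key. Assumes "|" never appears in an ID or a type.
    Function KeyOf(r As Row) As String
        Return r.Id & "|" & r.Type
    End Function

    ' Returns the IDs whose rows differ between the two arrays.
    Function ChangedIds(array1 As Row(), array2 As Row()) As HashSet(Of String)
        Dim changed As New HashSet(Of String)()

        ' One O(n) pass: index array1 by (id, type).
        ' Assumes each (id, type) pair occurs at most once per array.
        Dim index As New Dictionary(Of String, Integer)()
        For Each r As Row In array1
            index(KeyOf(r)) = r.Percent
        Next

        ' One O(n) pass over array2; each lookup is constant time on average.
        For Each r As Row In array2
            Dim percent As Integer
            If index.TryGetValue(KeyOf(r), percent) AndAlso percent = r.Percent Then
                index.Remove(KeyOf(r)) ' identical row on both sides
            Else
                changed.Add(r.Id)      ' new type, or same type with another percentage
            End If
        Next

        ' Rows still in the index had no identical counterpart in array2.
        For Each leftover As String In index.Keys
            changed.Add(leftover.Substring(0, leftover.IndexOf("|"c)))
        Next

        Return changed
    End Function

End Module
```

With 50 000 rows per array this does roughly 100 000 dictionary operations instead of 50 000 × 50 000 comparisons.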

score 0 · Accepted Answer

You should define a structure to hold this data. All the data for a parcel is stored in a LandParcel class, which has a HashSet<ParcelData>.

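public class ParcelData
{
    public ParcelType Type { get; set; } // This can be an enum, string, etc.
    public int Percent { get; set; }

    public ParcelData(ParcelType type, int percent)
    {
        Type = type;
        Percent = percent;
    }

    // Value equality, so that HashSet<ParcelData>.SetEquals compares contents
    public override bool Equals(object obj) =>
        obj is ParcelData other && Equals(Type, other.Type) && Percent == other.Percent;

    public override int GetHashCode() => (Type, Percent).GetHashCode();
}

public class LandParcel
{
    public ID Id { get; set; } // Whatever the type of the ID is...
    public HashSet<ParcelData> Data { get; set; } = new HashSet<ParcelData>();

    public LandParcel(ID id) { Id = id; }
}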

Then you need to build a data structure like this:

Dictionary<ID, LandParcel> data1 = new ....

foreach (var item in array1)
{
    LandParcel p;

    if (!data1.TryGetValue(item.id, out p))
        data1[item.id] = p = new LandParcel(item.id);

    // Can this data be repeated?
    p.Data.Add(new ParcelData(item.type, item.percent)); 
}

Do the same for the second array, with a dictionary data2. Then iterate over all the items in data2 and compare each one with the item that has the same ID in data1.

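foreach (var parcel2 in data2.Values)
{
    LandParcel parcel1;

    // TryGetValue avoids a KeyNotFoundException when the ID only exists in data2;
    // an ID missing from data1 also counts as a difference
    if (!data1.TryGetValue(parcel2.Id, out parcel1) || !parcel1.Data.SetEquals(parcel2.Data))
    {
        // You have different parcels
    }
}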

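(Now that I look at it, we are really doing a small database query here. It's a bit of a code smell...)

I'm not very familiar with VB, so sorry for the C# code, but it should be fairly easy to translate.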
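For reference, here is a rough VB.NET rendering of the same idea. It is a sketch rather than a line-by-line translation: it assumes each row is available as an (id, type, percent) tuple with string IDs and integer percentages, and it stores the (type, percent) pairs as System.Tuple values because tuples already compare by value. GroupById and DifferentIds are names made up for this example.

```vbnet
Option Infer On

Imports System
Imports System.Collections.Generic

Public Module ParcelDictionarySketch

    ' Groups rows by parcel ID. Each row is a hypothetical (id, type, percent) tuple.
    ' System.Tuple has value equality, so no Equals/GetHashCode override is needed.
    Function GroupById(rows As IEnumerable(Of Tuple(Of String, String, Integer))) _
        As Dictionary(Of String, HashSet(Of Tuple(Of String, Integer)))

        Dim result As New Dictionary(Of String, HashSet(Of Tuple(Of String, Integer)))()
        For Each row As Tuple(Of String, String, Integer) In rows
            Dim data As HashSet(Of Tuple(Of String, Integer)) = Nothing
            If Not result.TryGetValue(row.Item1, data) Then
                data = New HashSet(Of Tuple(Of String, Integer))()
                result(row.Item1) = data
            End If
            data.Add(Tuple.Create(row.Item2, row.Item3))
        Next
        Return result
    End Function

    ' Returns the IDs whose (type, percent) sets differ,
    ' including IDs that are present on one side only.
    Function DifferentIds(array1 As IEnumerable(Of Tuple(Of String, String, Integer)),
                          array2 As IEnumerable(Of Tuple(Of String, String, Integer))) _
        As List(Of String)

        Dim data1 = GroupById(array1)
        Dim data2 = GroupById(array2)
        Dim different As New List(Of String)()

        For Each pair In data2
            Dim set1 As HashSet(Of Tuple(Of String, Integer)) = Nothing
            If Not data1.TryGetValue(pair.Key, set1) OrElse Not set1.SetEquals(pair.Value) Then
                different.Add(pair.Key)
            End If
        Next

        ' IDs that disappeared from the second array are differences too.
        For Each id As String In data1.Keys
            If Not data2.ContainsKey(id) Then different.Add(id)
        Next

        Return different
    End Function

End Module
```

Each array is read once to build its dictionary, so the comparison for one ID only touches the few rows that share that ID.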

score 0 · Accepted Answer

If you only want to know whether there are differences between the first and second arrays, you can do this:

Dim diff = New HashSet(of Polygon)(array1)
diff.SymmetricExceptWith(array2)

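diff will contain the Polygons that are unique to either array1 or array2. If you want to do some other kind of comparison, you need to explain exactly what you're trying to do.

UPDATE:

You can use grouping and a lookup, like this: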

'Create lookup with first array, for fast access by ID
Dim lookupByID = array1.ToLookup(Function(p) p.id)

'Loop through each group of items with same ID in array2
For Each secondArrayValues in array2.GroupBy(Function(p) p.id)

    Dim currentID As Integer = secondArrayValues.Key 'Current ID is the grouping key

    'Retrieve values with same ID in array1
    'Use a hashset to easily compare for equality
    Dim firstArrayValues As New HashSet(of Polygon)(lookupByID(currentID))

    'Check for differences between the two sets of data, for this ID
    If Not firstArrayValues.SetEquals(secondArrayValues) Then

        'Data has changed, do something
        Console.WriteLine("Differences for ID " & currentID)
    End If
Next
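Both snippets rely on HashSet(Of Polygon) treating two polygons with the same ID, type and percentage as equal. If Polygon is a class, the default is reference equality, so it would need to override Equals and GetHashCode. A minimal sketch, assuming Polygon is a class whose members are named id, type and percent (the member types are guesses; id is an Integer only to match currentID above, and would be a String for IDs such as 7580-80-2532):

```vbnet
Imports System

' Hypothetical Polygon type; member names and types are assumptions.
Public Class Polygon
    Implements IEquatable(Of Polygon)

    Public Property id As Integer
    Public Property type As String
    Public Property percent As Integer

    ' Two polygons are equal when all three columns match.
    Public Overloads Function Equals(other As Polygon) As Boolean _
        Implements IEquatable(Of Polygon).Equals
        If other Is Nothing Then Return False
        Return id = other.id AndAlso
               String.Equals(type, other.type) AndAlso
               percent = other.percent
    End Function

    Public Overloads Overrides Function Equals(obj As Object) As Boolean
        Return Equals(TryCast(obj, Polygon))
    End Function

    ' Must be consistent with Equals: equal polygons produce equal hash codes.
    Public Overrides Function GetHashCode() As Integer
        Return Tuple.Create(id, type, percent).GetHashCode()
    End Function
End Class
```

Because the hash code depends on the property values, a Polygon should not be modified while it is stored in a HashSet.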
