postgresql - Weighted random selection

Question

I'd appreciate some help. I have two tables holding the most common first names and last names. Each table basically has two fields.

Tables

CREATE TABLE "common_first_name" (
    "first_name" text PRIMARY KEY, --The text representing the name
    "ratio" numeric NOT NULL, -- the % of how many times it occurs compared to the other names.     
    "inserted_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL,
    "updated_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL
);

CREATE TABLE "common_last_name" (
    "last_name" text PRIMARY KEY, --The text representing the name
    "ratio" numeric NOT NULL, -- the % of how many times it occurs compared to the other names.     
    "inserted_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL,
    "updated_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL
);

PS: The top 1 name occurs only about 1.8% of the time. Each table has 1000 rows.

Function (pseudocode, NOT READY)

CREATE OR REPLACE FUNCTION create_sample_data(p_number_of_records INT)
    RETURNS VOID
    AS $$
DECLARE
    SUM_OF_WEIGHTS CONSTANT INT := 100;
BEGIN

    FOR i IN 1..coalesce(p_number_of_records, 0) LOOP
      -- Get a random first and last name, taking their probability (RATIO) into account:
      -- round(random() * SUM_OF_WEIGHTS);
      -- create_person(random_first_name || ' ' || random_last_name);
    END LOOP;
END
$$
LANGUAGE plpgsql VOLATILE;

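PS: Within each table, the ratios of all names add up to 100%.

I want to run the function N times, picking a first name and a last name each time, to create sample data. Both tables have 1000 rows each.

The sample size can range from 1,000 to 1,000,000 names, so a "fast" way to run this weighted random selection would be even better.

Any suggestions on how to do this in PL/pgSQL?

I'm using PG 13.3 on SUPABASE.IO.

Thanks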
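
For reference, one possible way to do the weighted pick is sketched below. This is only a sketch under assumptions, not a tested solution: it turns each table's `ratio` column into cumulative ranges between 0 and 1, draws one `random()` value per name, and keeps the row whose range contains the draw. The function name `sample_full_names` and the column name `full_name` are made up for illustration, and it returns the names instead of calling `create_person`.

```sql
-- Hypothetical helper (name and return shape are assumptions, not from the question).
-- Returns p_number_of_records full names, each part picked with probability
-- proportional to its "ratio".
CREATE OR REPLACE FUNCTION sample_full_names(p_number_of_records int)
    RETURNS TABLE (full_name text)
AS $$
BEGIN
    RETURN QUERY
    WITH first_ranges AS (
        -- Each name owns the half-open range [lo, hi) inside [0, 1).
        -- Dividing by the total keeps this correct even if the ratios
        -- do not add up to exactly 100.
        SELECT first_name,
               (sum(ratio) OVER w - ratio) / sum(ratio) OVER () AS lo,
               sum(ratio) OVER w / sum(ratio) OVER ()           AS hi
        FROM common_first_name
        WINDOW w AS (ORDER BY first_name)
    ),
    last_ranges AS (
        SELECT last_name,
               (sum(ratio) OVER w - ratio) / sum(ratio) OVER () AS lo,
               sum(ratio) OVER w / sum(ratio) OVER ()           AS hi
        FROM common_last_name
        WINDOW w AS (ORDER BY last_name)
    ),
    draws AS MATERIALIZED (
        -- One independent pair of random numbers per requested record.
        SELECT random() AS r_first, random() AS r_last
        FROM generate_series(1, coalesce(p_number_of_records, 0))
    )
    SELECT f.first_name || ' ' || l.last_name
    FROM draws d
    JOIN first_ranges f ON d.r_first >= f.lo AND d.r_first < f.hi
    JOIN last_ranges  l ON d.r_last  >= l.lo AND d.r_last  < l.hi;
END
$$
LANGUAGE plpgsql VOLATILE;
```

It could then be called as `SELECT full_name FROM sample_full_names(1000);`. Being set-based, it avoids the row-by-row loop, but each draw is still compared against the ranges of a 1000-row table, so for the largest sample sizes it may be worth checking the plan with `EXPLAIN ANALYZE` and, if needed, storing the ranges in an indexed table.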
