sql - n-grams from text in PostgreSQL

Question

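I am trying to create n-grams from a text column in PostgreSQL. Currently I split the data (sentences) in the text column into an array, splitting on whitespace.

select regexp_split_to_array(sentenceData,E'\\s+') from tableName

Once I have this array, how should I proceed? The options I can think of:

Create a loop that finds the n-grams and writes each one to a row in another table.

Use unnest, which returns every element of every array on a separate row. I could probably work out a way to get n-grams from that single column, but I would lose the sentence boundaries, which I need to preserve.

Sample SQL code for PostgreSQL that emulates the scenario above: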

create table tableName(sentenceData  text);

INSERT INTO tableName(sentenceData) VALUES('This is a long sentence');

INSERT INTO tableName(sentenceData) VALUES('I am currently doing grammar, hitting this monster book btw!');

INSERT INTO tableName(sentenceData) VALUES('Just tonnes of grammar, problem is I bought it in TAIWAN, and so there aint any englihs, just chinese and japanese');

select regexp_split_to_array(sentenceData,E'\\s+')   from tableName;

select unnest(regexp_split_to_array(sentenceData,E'\\s+')) from tableName;
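The boundary problem described above can be avoided by never flattening the words of different sentences into one column. What follows is a minimal sketch, not a definitive implementation: it slices each sentence's word array into windows of length n, so no n-gram can span two sentences. The function name word_ngrams and the id column are hypothetical (the sample table has no key column), and cardinality and LATERAL assume PostgreSQL 9.4 or later.

```sql
-- Hypothetical helper (not a built-in): word n-grams of a single sentence.
-- pos is the 1-based index of the first word of each n-gram.
CREATE OR REPLACE FUNCTION word_ngrams(sentence text, n integer)
RETURNS TABLE (pos integer, ngram text)
LANGUAGE sql AS $$
    SELECT g.i,
           -- Slice n consecutive words and join them back with spaces.
           array_to_string(t.words[g.i : g.i + n - 1], ' ')
    FROM (
        -- btrim avoids an empty first/last token caused by outer spaces.
        SELECT regexp_split_to_array(btrim(sentence), E'\\s+') AS words
    ) AS t
    -- One start index per window; yields no rows when the sentence has fewer than n words.
    CROSS JOIN LATERAL generate_series(1, cardinality(t.words) - n + 1) AS g(i);
$$;

-- Usage: bigrams per sentence; the id column (assumed) keeps the sentence boundary.
SELECT s.id, ng.pos, ng.ngram
FROM (VALUES (1, 'This is a long sentence'::text),
             (2, 'I am currently doing grammar'::text)) AS s(id, sentenceData)
CROSS JOIN LATERAL word_ngrams(s.sentenceData, 2) AS ng
ORDER BY s.id, ng.pos;
```

Because the windows are computed per row, the result can be inserted into another table with a plain INSERT ... SELECT, with no explicit loop needed.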

score 3 · Accepted Answer

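Check out pg_trgm: "The pg_trgm module provides functions and operators for determining the similarity of text based on trigram matching, as well as index operator classes that support fast searching for similar strings."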
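To make the suggestion concrete, here is a small sketch of the two pg_trgm functions most relevant here, show_trgm and similarity. It assumes the pg_trgm contrib module is installed on the server and that you are allowed to create extensions. Note that pg_trgm works on character trigrams inside each word, so it is suited to fuzzy string matching rather than to producing word-level n-grams; the inline sample rows are only for illustration.

```sql
-- Assumption: the pg_trgm contrib module is available on this server.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- List the character trigrams pg_trgm extracts from a string.
SELECT show_trgm('grammar');

-- Similarity is a number from 0 (no shared trigrams) to 1 (identical trigram sets).
SELECT similarity('grammar', 'grammer');

-- Rank sample sentences by trigram similarity to a search phrase.
SELECT s.sentenceData,
       similarity(s.sentenceData, 'grammar book') AS score
FROM (VALUES ('This is a long sentence'::text),
             ('I am currently doing grammar, hitting this monster book btw!'::text)
     ) AS s(sentenceData)
ORDER BY score DESC;
```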
