marklogic - MarkLogic: search:suggest for phrases

Question

I have the following XML structure:

<root>
<text>Hi i am a test user and doing testing here. Copied text Let’s suppose we have a text field where the user needs to enter the number of a person id. If the user types 1, all ids starting with 1 will show up. If the user types 12, all ids starting with 12 will show up.</text>
</root>

I have created a field on the "text" element and enabled the field word lexicon as well. I then ran the following query:

xquery version "1.0-ml"; 
import module namespace search ="http://marklogic.com/appservices/search" at "/MarkLogic/appservices/search/search.xqy"; 
let $options := 
<search:options xmlns="http://marklogic.com/appservices/search">
 <default-suggestion-source>
    <word collation="http://marklogic.com/collation//S2">
      <field name="text"/>
    </word>
 </default-suggestion-source>
</search:options>
return
search:suggest("tes", $options, 100)

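As a result I got "test" and "testing" as suggestions, which is fine, but I also need the text that follows each match. For the example above, I expect "test user and doing..." and "testing here...". Please help me with this.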

score 1 · Accepted Answer

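A word lexicon stores word tokens, so it returns individual words rather than phrases. To match within phrases, you can use a range index on <text> as the suggestion source and wrap each search term with concat('*',$term,'*'), so that the API call looks like search:suggest("*tes*", $options, 100).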
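A minimal sketch of what that call might look like. It assumes a string element range index already exists on <text> (no namespace) with the same collation as the one declared in the options; the collation shown is simply the one from the question, and the $term variable is only for illustration.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Assumption: a string element range index on <text> (no namespace)
   exists with exactly this collation. :)
let $options :=
  <search:options xmlns="http://marklogic.com/appservices/search">
    <default-suggestion-source>
      <range type="xs:string" collation="http://marklogic.com/collation//S2">
        <element ns="" name="text"/>
      </range>
    </default-suggestion-source>
  </search:options>
(: $term stands in for whatever the user has typed so far. :)
let $term := "tes"
(: Wrap the term in wildcards so it can match anywhere in the value. :)
return search:suggest(fn:concat("*", $term, "*"), $options, 100)
```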

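However, I expect the leading wildcard to slow the query down significantly, and the whole element value to be returned rather than the text starting at the position of the search term. In other words, you would get "Hi i am a test user and doing testing here. Copied text ..." and not "test user and doing ...". You can, of course, parse this programmatically.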
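One way to do that parsing, sketched with standard XQuery string functions only. The helper name local:from-match is hypothetical, and it returns only the text from the first match onward. I believe suggestions that contain spaces come back wrapped in double quotes, so those may need stripping before calling it.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: return the part of $value that starts at the first
   case-insensitive occurrence of $term, or the empty sequence if there is
   no match. Caveat: this assumes lower-casing does not change the string
   length, which holds for plain ASCII text like the example. :)
declare function local:from-match(
  $value as xs:string,
  $term as xs:string
) as xs:string?
{
  let $lower := fn:lower-case($value)
  let $needle := fn:lower-case($term)
  return
    if ($needle ne "" and fn:contains($lower, $needle))
    then
      (: Length of the text before the match gives the start offset. :)
      fn:substring($value,
        fn:string-length(fn:substring-before($lower, $needle)) + 1)
    else ()
};

(: returns "test user and doing testing here." :)
local:from-match("Hi i am a test user and doing testing here.", "tes")
```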

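For better performance, consider a chunked element range index strategy. It requires preprocessing and, depending on the size of the chunk source, potentially a lot of data, but it gives the desired results and is very fast and scalable. There is a good blog post by Avalon Consulting that describes in detail how to do this.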
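The blog post is not reproduced here, so the following is only one plausible reading of the chunking idea: at ingest time, emit one short chunk per word position and put the range index on the chunk element, so that a plain prefix pattern such as tes* can match without a leading wildcard. The element name chunk, the helper name local:chunks, and the chunk size of 4 words are all assumptions for illustration.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: for each word position in $text, emit a <chunk>
   holding that word plus up to $size - 1 of the words that follow it. :)
declare function local:chunks(
  $text as xs:string,
  $size as xs:integer
) as element(chunk)*
{
  (: Collapse runs of whitespace, then split on single spaces. :)
  let $words := fn:tokenize(fn:normalize-space($text), " ")
  for $i in 1 to fn:count($words)
  return
    <chunk>{ fn:string-join(fn:subsequence($words, $i, $size), " ") }</chunk>
};

(: The result includes <chunk>test user and doing</chunk>
   and <chunk>testing here.</chunk>, among others. :)
local:chunks("Hi i am a test user and doing testing here.", 4)
```

The chunks would be stored alongside (or inside) the source document. The trade-off is the one noted above: roughly one chunk per word, so the index grows with the text.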

score 1 · Accepted Answer

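To search for partial phrases, use an opening double quote (or the grammar's <search:quotation> value) without a closing quote. For example, search:suggest('"and th', $options) may return "and that" and "and this". A closing double quote tells the parser that the phrase is complete, so no expanded suggestions are generated. This also works with constraints:

search:suggest('constraint:"and th', $options)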

===== From http://docs.marklogic.com/search:suggest
