sqlite - Full-text search on a mobile device?

Question

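We'll soon be embarking on the development of a new mobile application. This particular app will be used for heavy searching of text-based fields. Any suggestions from the group at large for what sort of database engine is best suited to allowing these types of searches on a mobile platform?

Specifics include Windows Mobile 6, and we'll be using the .Net CF. Some of the text-based fields will be anywhere between 35 and 500 characters. The device will operate in two different modes, batch and WiFi. For WiFi, of course, we can just submit requests to a full-blown DB engine and fetch the results back. This question centers around the "batch" version, which will house a database loaded with information on the device's flash/removable storage card.

At any rate, I know SQLCE has some basic indexing, but you don't get real "full-text" style indexes until you move to the full-blown version.

An example of what the data would look like:

"apron carpenter adjustable leather container pocket waist hardware belt" etc.

I haven't gotten into evaluating any other specific options yet, as I'd like to leverage the experience of this group to first point me down some specific avenues.

Any suggestions/tips?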

score 5 · Accepted Answer

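Just recently I had the same issue. Here is what I did:

I created a class to hold just an ID and the text for each object (in my case I called them sku (item number) and description). This creates a smaller object that uses less memory, since it is only used for searching. I still grab the full-blown objects from the database after I find matches.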

public class SmallItem
{
    private int _sku;
    public int Sku
    {
        get { return _sku; }
        set { _sku = value; }
    }

    // Size of max description size + 1 for null terminator.
    private char[] _description = new char[36];
    public char[] Description
    {
        get { return _description; }
        set { _description = value; }
    }

    public SmallItem()
    {
    }
}

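Once you have this class, you can create an array of these objects (I actually used a List in my case) and use it for searching throughout your application. Initializing this list takes a bit of time, but you only need to worry about it at start-up. Basically, just run a query on your database and grab the data you need to create this list.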
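The start-up load described above could look roughly like the sketch below. It is not the code from this answer: the SmallItemLoader helper, the column order (0 = sku, 1 = description) and the trimmed-down SmallItem stand-in are all assumptions made for illustration, and it is written against the desktop System.Data interfaces rather than checked against .NET CF. Descriptions are stored upper-cased because the search function further down upper-cases the search terms.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Trimmed-down stand-in for the SmallItem class shown above.
public class SmallItem
{
    public int Sku { get; set; }
    public char[] Description { get; set; }
}

public static class SmallItemLoader
{
    // Hypothetical helper: reads (sku, description) rows from any IDataReader.
    // Assumes column 0 is an int SKU and column 1 is the description text.
    public static List<SmallItem> Load(IDataReader reader, int maxDescriptionLength)
    {
        List<SmallItem> items = new List<SmallItem>();
        while (reader.Read())
        {
            string text = reader.IsDBNull(1) ? "" : reader.GetString(1).ToUpperInvariant();
            int length = Math.Min(text.Length, maxDescriptionLength);

            SmallItem item = new SmallItem();
            item.Sku = reader.GetInt32(0);

            // One extra slot is left as '\0' and acts as the null terminator
            // that the pointer-based search relies on.
            item.Description = new char[maxDescriptionLength + 1];
            text.CopyTo(0, item.Description, 0, length);

            items.Add(item);
        }
        return items;
    }
}
```

Any provider that exposes an IDataReader could feed this helper; descriptions longer than maxDescriptionLength are simply truncated here, which may or may not be what you want.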

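After you have the list, you can quickly search through it for any words you want. Since it is a contains search, it must also find words within words (e.g. drill would return drill, drillbit, drills, etc.). To do this, we wrote a home-grown, unmanaged (unsafe) C# contains function. It takes a string array of words (so you can search for more than one word... we use it for "AND" searches... the description must contain all of the words passed in... "OR" is not currently supported in this example). As it searches through the list of words, it builds a list of IDs, which are then passed back to the calling function. Once you have the list of IDs, you can easily run a fast query against your database to return the full-blown objects based on a fast indexed ID number. I should mention that we also limit the maximum number of results returned. This could be taken out. It is just handy if someone types in something like "e" as their search term. That would return a lot of results.

Here is the example of the custom Contains function: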

public static int[] Contains(string[] descriptionTerms, int maxResults, List<SmallItem> itemList)
{
    // Don't allow more than the maximum allowable results constant.            
    int[] matchingSkus = new int[maxResults];

    // Indexes and counters.
    int matchNumber = 0;
    int currentWord = 0;
    int totalWords = descriptionTerms.Length - 1;  // - 1 because it will be used with 0 based array indexes

    bool matchedWord;

    try
    {   
        /* Character array of character arrays. Each array is a word we want to match.
         * We need the + 1 because totalWords had - 1 (We are setting a size/length here,
         * so it is not 0 based... we used - 1 on totalWords because it is used for 0
         * based index referencing.)
         * */
        char[][] allWordsToMatch = new char[totalWords + 1][];

        // Character array to hold the current word to match. 
        char[] wordToMatch = new char[36]; // Max allowable word size + null terminator... I just picked 36 to be consistent with max description size.

        // Loop through the original string array or words to match and create the character arrays. 
        for (currentWord = 0; currentWord <= totalWords; currentWord++)
        {
            char[] desc = new char[descriptionTerms[currentWord].Length + 1];
            Array.Copy(descriptionTerms[currentWord].ToUpper().ToCharArray(), desc, descriptionTerms[currentWord].Length);
            allWordsToMatch[currentWord] = desc;
        }

        // Offsets for description and filter(word to match) pointers.
        int descriptionOffset = 0, filterOffset = 0;

        // Loop through the list of items trying to find matching words.
        foreach (SmallItem i in itemList)
        {
            // If we have reached our maximum allowable matches, we should stop searching and just return the results.
            if (matchNumber == maxResults)
                break;

            // Loop through the "words to match" filter list.
            for (currentWord = 0; currentWord <= totalWords; currentWord++)
            {
                // Reset our match flag and current word to match.
                matchedWord = false;
                wordToMatch = allWordsToMatch[currentWord];

                // Delving into unmanaged code for SCREAMING performance ;)
                unsafe
                {
                    // Pointer to the description of the current item on the list (starting at first char).
                    fixed (char* pdesc = &i.Description[0])
                    {
                        // Pointer to the current word we are trying to match (starting at first char).
                        fixed (char* pfilter = &wordToMatch[0])
                        {
                            // Reset the description offset.
                            descriptionOffset = 0;

                            // Scan the description until we hit the null terminator of the char array.
                            while (*(pdesc + descriptionOffset) != '\0')
                            {
                                // We've matched the first character of the word we're trying to match.
                                if (*(pdesc + descriptionOffset) == *pfilter)
                                {
                                    // Reset the filter offset.
                                    filterOffset = 0;

                                    /* Compare forward from the current description position without moving
                                     * descriptionOffset itself, so a failed partial match (e.g. "DRILL" in
                                     * "DDRILL") does not skip characters. The description's null terminator
                                     * never equals a non-null filter character, so we never read past it.
                                     * */
                                    while (*(pfilter + filterOffset) != '\0' && *(pfilter + filterOffset) == *(pdesc + descriptionOffset + filterOffset))
                                    {
                                        // Move to the next character of the word.
                                        ++filterOffset;
                                    }

                                    // We hit matches all the way to the null terminator. The entire word was a match.
                                    if (*(pfilter + filterOffset) == '\0')
                                    {
                                        // Set our match flag so the outer loop moves on to the next word, if any.
                                        matchedWord = true;

                                        // If our current word matched is the last word on the match list, we have matched all words.
                                        if (currentWord == totalWords)
                                        {
                                            // Add the sku as a match.
                                            matchingSkus[matchNumber] = i.Sku;
                                            matchNumber++;
                                        }

                                        // This word is found. Stop scanning the description for it.
                                        break;
                                    }
                                }

                                // No match at the current position. Move to the next character.
                                descriptionOffset++;
                            }

                            /* The current word had no match, so no sense in looking for the rest of the words. Break to the
                             * next item description.
                             * */
                             if (!matchedWord)
                                break;
                        }
                    }
                }
            }
        };

        // We have our list of matching skus. We'll resize the array and pass it back.
        Array.Resize(ref matchingSkus, matchNumber);
        return matchingSkus;
    }
    catch (Exception)
    {
        // Handle or log the exception here, then rethrow so that every code path
        // either returns a value or throws.
        throw;
    }
}

Once you have the list of matching skus, you can iterate through the array and create a query command that returns only the matching skus.
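Building that query from the returned array might look like the sketch below. The SkuQueryBuilder helper and the table and column names (Items, Sku) are placeholders, not names from this answer. Because the values are ints, concatenating them into an IN list cannot inject SQL text; string values would need parameters instead.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class SkuQueryBuilder
{
    // Hypothetical helper. "Items" and "Sku" are placeholder names.
    public static string BuildSelect(int[] matchingSkus)
    {
        // Nothing matched, so there is nothing to query.
        if (matchingSkus == null || matchingSkus.Length == 0)
            return null;

        StringBuilder sql = new StringBuilder("SELECT * FROM Items WHERE Sku IN (");
        for (int n = 0; n < matchingSkus.Length; n++)
        {
            if (n > 0)
                sql.Append(", ");

            // Invariant formatting keeps the digits free of culture-specific symbols.
            sql.Append(matchingSkus[n].ToString(CultureInfo.InvariantCulture));
        }
        sql.Append(")");
        return sql.ToString();
    }
}
```

For example, BuildSelect(new int[] { 101, 205 }) returns "SELECT * FROM Items WHERE Sku IN (101, 205)". Since the result count is already capped by maxResults, the IN list stays short.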

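To give you an idea of performance, here is what we found (doing the following steps):

Search through ~171,000 items
Create a list of all matching items
Query the database, returning only the matching items
Build the full-blown items (similar to the SmallItem class, but with a lot more fields)
Populate a datagrid with the full-blown item objects

On our mobile units, the entire process takes 2-4 seconds (2 seconds if we hit our match limit before we have searched all items, 4 seconds if we have to scan every item).

I also tried doing this without unmanaged code, using String.IndexOf (and tried String.Contains... it had the same performance as IndexOf, as it should). That way was much slower... about 25 seconds.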
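For comparison, a managed "AND" search built on String.IndexOf might look like the sketch below. The code that was actually timed is not shown in this answer, so this is only an illustration of the idea: the ManagedSearch class and its parameter shapes are made up here, and nothing about it has been measured.

```csharp
using System;
using System.Collections.Generic;

public static class ManagedSearch
{
    // Hypothetical helper. Each entry pairs a SKU (Key) with its description (Value).
    public static List<int> ContainsAll(
        IEnumerable<KeyValuePair<int, string>> descriptions,
        string[] terms,
        int maxResults)
    {
        List<int> matches = new List<int>();

        // With no search terms there is nothing to match.
        if (terms == null || terms.Length == 0)
            return matches;

        foreach (KeyValuePair<int, string> entry in descriptions)
        {
            // Stop as soon as the result cap is reached.
            if (matches.Count >= maxResults)
                break;

            bool allFound = true;
            foreach (string term in terms)
            {
                // Ordinal comparison avoids culture-sensitive matching rules.
                if (entry.Value.IndexOf(term, StringComparison.OrdinalIgnoreCase) < 0)
                {
                    allFound = false;
                    break;
                }
            }

            if (allFound)
                matches.Add(entry.Key);
        }
        return matches;
    }
}
```

Passing an explicit StringComparison is a deliberate choice in this sketch: the single-argument String.IndexOf(string) overload performs a culture-sensitive search, which does more work per comparison than an ordinal one.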

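I also tried using a StreamReader and a file containing lines of [Sku Number]|[Description]. The code was similar to the unmanaged code example. This way took about 15 seconds for an entire scan. Not too bad for speed, but not great. The file and StreamReader method has one advantage over the way I showed you: the file can be created ahead of time. The way I showed you requires the memory and the initial time to load the List when the application starts up. For our 171,000 items, this takes about 2 minutes. If you can afford to wait for that initial load each time the app starts up (which can be done on a separate thread, of course), then searching this way is the fastest way (that I've found, at least).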
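A rough sketch of the file-based variant is below. It is not the code that was timed above: the FileSearch class is hypothetical, it uses managed string matching rather than pointers, and it is written against the desktop .NET API without being checked on .NET CF. It takes a TextReader so that a StreamReader over the pre-built file can be passed in.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileSearch
{
    // Hypothetical helper. Scans lines of the form "sku|description" and returns
    // the SKUs whose description contains every search term.
    public static List<int> Scan(TextReader reader, string[] terms, int maxResults)
    {
        List<int> matches = new List<int>();
        if (terms == null || terms.Length == 0)
            return matches;

        string line;
        while (matches.Count < maxResults && (line = reader.ReadLine()) != null)
        {
            // Split on the first '|' only, so descriptions may contain '|'.
            int separator = line.IndexOf('|');
            if (separator <= 0)
                continue; // Skip malformed lines.

            int sku;
            if (!int.TryParse(line.Substring(0, separator), out sku))
                continue; // Skip lines whose SKU is not a number.

            string description = line.Substring(separator + 1);
            bool allFound = true;
            foreach (string term in terms)
            {
                if (description.IndexOf(term, StringComparison.OrdinalIgnoreCase) < 0)
                {
                    allFound = false;
                    break;
                }
            }

            if (allFound)
                matches.Add(sku);
        }
        return matches;
    }
}
```

With a real file this would be called as something like `using (StreamReader r = new StreamReader(path)) { FileSearch.Scan(r, terms, 50); }`, where path and the limit of 50 are placeholders. Nothing is kept in memory between searches, which is the trade-off described above: no start-up load, but every search re-reads the file.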

Hope that helps.

PS - Thanks to Dolch for helping with some of the unmanaged code.

score 2 · Accepted Answer

You could try Lucene.Net. I'm not sure how well it is suited to mobile devices, but it is billed as a "high-performance, full-featured text search engine library".

http://incubator.apache.org/lucene.net/ http://lucene.apache.org/java/docs/
