full-text-search - XQuery full-text search with Word1 and NOT Word2

Question

Below is the XML structure:

<Docs>
  <Doc>
    <Name>Doc 1</Name>
    <Notes>
        <specialNote>
          This is a special note section. 
           <B>This B Tag is used for highlighting any text and is optional</B>        
           <U>This U Tag will underline any text and is optional</U>        
           <I>This I Tag is used for highlighting any text and is optional</I>        
        </specialNote>      
        <generalNote>
           <P>
            This will store the general notes and might have number of paragraphs. This is para no 1. NO Child Tags here         
           </P>
           <P>
            This is para no 2            
           </P>  
        </generalNote>      
    </Notes>  
    <Desc>
        <P>
          This is used for Description and might have number of paragraphs. Here too, there will be B, U and I Tags for highlighting the description text and are optional
          <B>Bold</B>
          <I>Italic</I>
          <U>Underline</U>
        </P>
        <P>
          This is description para no 2 with I and U Tags
          <I>Italic</I>
          <U>Underline</U>
        </P>      
    </Desc>
  </Doc>
</Docs>

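There are 1000 such Doc tags. I want to let the user specify two search criteria: a WORD1 that must be present and a WORD2 that must NOT be present. Below is the query: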

for $x in doc('Documents')/Docs/Doc[Notes/specialNote/text() contains text 'Tom' 
ftand  ftnot 'jerry' or 
Notes/specialNote/text() contains text 'Tom' ftand ftnot 'jerry' or 
Notes/specialNote/B/text() contains text 'Tom' ftand ftnot 'jerry' or 
Notes/specialNote/I/text() contains text 'Tom' ftand ftnot 'jerry' or 
Notes/specialNote/U/text() contains text 'Tom' ftand ftnot 'jerry' or
Notes/generalNote/P/text() contains text 'Tom' ftand ftnot 'jerry' or 
Desc/P/text() contains text 'Tom' ftand ftnot 'jerry' or 
Desc/P/B/text() contains text 'Tom' ftand ftnot 'jerry' or 
Desc/P/I/text() contains text 'Tom' ftand ftnot 'jerry' or 
Desc/P/U/text() contains text 'Tom' ftand ftnot 'jerry']
return $x/Name

The results of this query are wrong: the result contains some documents that have both Tom and jerry. So I changed the query to:

for $x in doc('Documents')/Docs/Doc[. contains text 'Tom' ftand ftnot 'jerry'] 
return $x/Name

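This query gives the exact result, i.e. only those documents that contain Tom and do NOT contain jerry, but it takes a huge amount of time: approx. 45 seconds, whereas the earlier one took 10 seconds!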

I am using the BaseX 7.5 XML database.

I need expert comments on this :)

score 4 · Accepted Answer

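The first query tests each text node in a document separately. A passage such as "Tom and Jerry" whose words are spread over several text nodes therefore matches, because the first text node contains Tom but not Jerry.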

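The second query performs the full-text search on the entire text content of each Doc element, as if it were concatenated into one string. This (currently) cannot be answered by BaseX's full-text index, which indexes each text node separately.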
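
The difference between the two evaluation modes can be reproduced on a tiny inline fragment. This is only a sketch: the sample element below is hypothetical (it is not taken from the Documents database), and it assumes a processor with XQuery Full Text support, such as BaseX.

```xquery
(: Hypothetical sample: 'Tom' and 'Jerry' sit in different text nodes. :)
let $doc :=
  <Doc>
    <Desc>
      <P>Tom is mentioned here. <B>Jerry is mentioned here.</B></P>
    </Desc>
  </Doc>
return (
  (: Per text node: the text node directly under P has 'Tom' but no 'jerry',
     so one item of the search context satisfies the selection. :)
  $doc//text() contains text 'Tom' ftand ftnot 'jerry',

  (: Whole element: the string value of Doc contains both words,
     so the ftnot part rejects it. :)
  $doc contains text 'Tom' ftand ftnot 'jerry'
)
```

The first expression is expected to yield true and the second false, which mirrors the false positives of the first query and the correct but slow behaviour of the second.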

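A solution is to perform the full-text search for each term separately and merge the results at the end. Each of these searches can be evaluated per text node, so the index can be used.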

for $x in (doc('Documents')/Docs/Doc[.//text() contains text 'Tom']
            except doc('Documents')/Docs/Doc[.//text() contains text 'Jerry'])
return $x/Name

The query optimizer rewrites the above query into the following equivalent query, which uses two index accesses:

for $x in (db:fulltext("Documents", "Tom")/ancestor::*:Doc
            except db:fulltext("Documents", "Jerry")/ancestor::*:Doc)
return $x/Name

If needed, you can also tweak the order in which the results are merged in order to keep intermediate results small.
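
One possible reading of that remark, as a sketch rather than a recommended form: select the Doc elements that contain Tom first, then test only those for Jerry, so that the Jerry check never touches documents without Tom. The database name 'Documents' and the search terms are the ones used above. Whether the second test is answered from the index or by scanning the selected elements is an assumption that depends on the BaseX version and its optimizer.

```xquery
(: Step 1: keep only Doc elements with a text node that contains 'Tom'. :)
for $x in doc('Documents')/Docs/Doc[.//text() contains text 'Tom']
(: Step 2: among those, drop every Doc with a text node that contains 'Jerry'. :)
where not($x//text() contains text 'Jerry')
return $x/Name
```

This returns the same Name elements as the except variant, since both keep exactly the Doc elements with a Tom text node and no Jerry text node. It is likely to pay off only when Tom is much rarer than Jerry.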
