hibernate - Adding a Tika bridge to a new field defined in a FieldBridge

Question

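I have an entity which points to binary data by a numeric identifier (binId). A utility class can provide the binary stream for a given ID. My goal is to index this binary stream (usually a file).

The concept is to create a bridge for the binary data identifier field. Inside the bridge I call the utility class, obtain the stream, and create a new field from that stream. I would then like this stream to be indexed/analyzed by the Tika bridge.

I'm using a FieldBridge, but I don't use LuceneOptions. Additionally, I cannot annotate the entity classes, so I use the programmatic API.

So far I have the following: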

public class SearchMappingFactory {
    @Factory
    public SearchMapping getSearchMapping(){
        SearchMapping mapping = new SearchMapping();
        mapping.entity(Attachment.class)
            .indexed()
            .property("id", ElementType.FIELD)
            .documentId()
            .property("name", ElementType.FIELD)
            .field()
            .property("description", ElementType.FIELD)
            .field()
            .property("binId", ElementType.FIELD)
            .field()
            .name("attachmentFile")
            .bridge(AttachmentContentSearchBridge.class)
            .property("content", ElementType.FIELD)  // my attempt to define an additional bridge
            .field()
            .bridge(TikaBridge.class)
        ;
        return mapping;
    };
}

And my bridge:

public class AttachmentContentSearchBridge implements FieldBridge {

    @Override
    public void set(String name, Object value, Document document, LuceneOptions luceneOptions) {
        Reader reader = new InputStreamReader(MyBinUtil.getStreamForId((Integer)value));
        Field field = new Field("content",reader);
        // I'd like to add the Tika bridge here, but I can't
        document.add(field);
    }
}

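Let's start with the bridge. It is very simple; the only problem is that I cannot define a bridge for the newly created field content. This is the main issue.

I tried to solve it by adding a content field to the mapping, where I can define a bridge. The definition is accepted and the application starts and runs, but the index for content contains no keywords :(

Please advise how to define a TikaBridge for a field created inside a FieldBridge.

Thanks for reading; any help is appreciated.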

score 0 · Accepted Answer

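If you are getting the stream data via an ID and a custom util class, you cannot use the @TikaBridge annotation. As the documentation of the annotation suggests, it only works on binary data fields or on String/URL fields. In the latter case, the string/URL is used to load the binary data.

In your case, you just re-implement what happens in org.hibernate.search.bridge.builtin.TikaBridge.

The interesting part is: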

public void set(String name, Object value, Document document, LuceneOptions luceneOptions) {
    if ( value == null ) {
        throw new IllegalArgumentException( "null cannot be passed to Tika bridge" );
    }
    InputStream in = null;
    try {
        in = getInputStreamForData( value );

        Metadata metadata = metadataProcessor.prepareMetadata();
        ParseContext parseContext = parseContextProvider.getParseContext( name, value );

        StringWriter writer = new StringWriter();
        WriteOutContentHandler contentHandler = new WriteOutContentHandler( writer );

        Parser parser = new AutoDetectParser();
        parser.parse( in, contentHandler, metadata, parseContext );
        luceneOptions.addFieldToDocument( name, writer.toString(), document );

        // allow for optional indexing of metadata by the user
        metadataProcessor.set( name, value, document, luceneOptions, metadata );
    }
    catch ( Exception e ) {
        throw propagate( e );
    }
    finally {
        closeQuietly( in );
    }
}

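You need an input stream for your data; then you create a Tika parser and pass it the stream together with an output StringWriter into which Tika can write the extracted text. Finally, you need to add the extracted data as a new field using LuceneOptions.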
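
The control flow of such a re-implemented bridge can be sketched roughly as follows. This is only an illustration that runs without Hibernate Search or Tika on the classpath: TextExtractor, FieldSink, indexBinary and readUtf8 are hypothetical stand-ins, not library API. In a real bridge, the stream source would be the question's MyBinUtil.getStreamForId, the extractor would wrap the AutoDetectParser call from the excerpt above, and the sink would be luceneOptions.addFieldToDocument.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class StreamIndexingSketch {

    /** Hypothetical stand-in for the extraction step (Tika's parser in a real bridge). */
    interface TextExtractor {
        String extract(InputStream in) throws IOException;
    }

    /** Hypothetical stand-in for luceneOptions.addFieldToDocument(name, text, document). */
    interface FieldSink {
        void addField(String name, String text);
    }

    /**
     * Mirrors the body of the bridge's set(...) method: resolve the stream from
     * the ID, extract its text, add the text under the bridge's own field name,
     * and always close the stream.
     */
    static void indexBinary(String fieldName, Integer binId,
                            Function<Integer, InputStream> streamSource,
                            TextExtractor extractor, FieldSink sink) {
        if (binId == null) {
            return; // nothing to index for this entity
        }
        try (InputStream in = streamSource.apply(binId)) {
            sink.addField(fieldName, extractor.extract(in));
        } catch (IOException e) {
            throw new IllegalStateException("Could not extract text for binId " + binId, e);
        }
    }

    /** Trivial extractor used for the demo: reads the whole stream as UTF-8 text. */
    static String readUtf8(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A map plays the role of the Lucene document in this sketch.
        Map<String, String> document = new LinkedHashMap<>();
        indexBinary("attachmentFile", 42,
                id -> new ByteArrayInputStream(("file " + id).getBytes(StandardCharsets.UTF_8)),
                StreamIndexingSketch::readUtf8,
                document::put);
        System.out.println(document);
    }
}
```

The point of this shape is that the bridge writes the extracted text itself under the field name it is given, so no second bridge has to be attached to a field created inside the first one.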
