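I need a little help deciphering what is going on with my ES setup. Basically, I have created multiple indices using custom analyzers (one for each language we support), with index-time mappings for each client. The problem arises at search time: when I search across all of a client's indices, one particular index (English) always ranks higher than the other languages.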
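Here is what my ES setup looks like. I have multiple clients, and each client can upload documents in multiple languages. To accommodate this requirement, I set up indices named after the clientId and the language, i.e. A-en, A-de, A-fr, B-en, B-it, etc. (A and B are client IDs, and -xx is the ISO language code). Each index is created with a custom analyzer for the language that client requires, and each field is mapped to use the custom analyzers defined in the settings section, like so. English index: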
{
"settings" : {
"index" : {
"number_of_shards" : 5,
"number_of_replicas" : 1
},
"analysis" : {
"filter" : {
"english_keywords" : {
"type" : "keyword_marker",
"keywords" : ["_none_"]
},
"english_stop" : {
"type" : "stop",
"stopwords" : ["_none_"]
},
"synonym_filter" : {
"type" : "synonym",
"expand" : 1,
"synonyms" : ["_none_"]
},
"english_stemmer" : {
"type" : "stemmer",
"language" : "english"
}
},
"analyzer" : {
"lens-english" : {
"type" : "custom",
"tokenizer" : "standard",
"filter" : ["english_keywords", "lowercase", "english_stop", "english_stemmer", "synonym_filter"]
}
}
}
},
"mappings" : {
"video" : {
"properties" : {
"Attributes" : {
"type" : "string",
"index" : "not_analyzed"
},
"ClientId" : {
"type" : "string",
"index" : "not_analyzed"
},
"Comments" : {
"type" : "string",
"analyzer" : "lens-english"
},
"Continent" : {
"type" : "string",
"index" : "not_analyzed"
},
"CountryOfOrigin" : {
"type" : "string",
"index" : "not_analyzed"
},
"CreatedDate" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"Description" : {
"type" : "string",
"analyzer" : "lens-english"
},
"DescriptionEnglish" : {
"type" : "string",
"analyzer" : "english"
},
"DislikesCount" : {
"type" : "double"
},
"EnglishTranscription" : {
"type" : "string",
"analyzer" : "english"
},
"Favourite" : {
"type" : "string",
"index" : "not_analyzed"
},
"FromProject" : {
"type" : "boolean"
},
"IsSearchable" : {
"type" : "boolean"
},
"LanguageISOCode" : {
"type" : "string",
"index" : "not_analyzed"
},
"LanguageOfOrigin" : {
"type" : "string",
"index" : "not_analyzed"
},
"LikesCount" : {
"type" : "double"
},
"NativeTranscription" : {
"type" : "string",
"analyzer" : "lens-english"
},
"ObjectId" : {
"type" : "string",
"index" : "not_analyzed"
},
"Published" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"Recommendations" : {
"type" : "string",
"index" : "not_analyzed"
},
"Status" : {
"type" : "long"
},
"Tags" : {
"type" : "string",
"analyzer" : "lens-english"
},
"Title" : {
"type" : "string",
"analyzer" : "lens-english"
},
"TitleEnglish" : {
"type" : "string",
"analyzer" : "english"
},
"TranscriptionStatus" : {
"type" : "double"
},
"UploadSource" : {
"type" : "double"
},
"VideoImage" : {
"type" : "string",
"index" : "no"
},
"ViewCount" : {
"type" : "double"
},
"WatchLater" : {
"type" : "string",
"index" : "not_analyzed"
},
"ExternalMetadata" : {
"type" : "nested",
"properties" : {
"Filters" : {
"type" : "string",
"index" : "not_analyzed"
},
"ProjectId" : {
"type" : "string",
"index" : "not_analyzed"
},
"Roles" : {
"type" : "string",
"index" : "not_analyzed"
}
}
}
}
}
}
}
And here is the Turkish index, for clients who have Turkish documents that need indexing...
{
"settings" : {
"index" : {
"number_of_shards" : 5,
"number_of_replicas" : 1
},
"analysis" : {
"filter" : {
"turkish_stop" : {
"type" : "stop",
"stopwords" : "_turkish_"
},
"synonym_filter" : {
"type" : "synonym",
"synonyms" : ["_none_"]
},
"turkish_lowercase" : {
"type" : "lowercase",
"language" : "turkish"
},
"turkish_keywords" : {
"type" : "keyword_marker",
"keywords" : ["_none_"]
},
"turkish_stemmer" : {
"type" : "stemmer",
"language" : "turkish"
}
},
"analyzer" : {
"lens-turkish" : {
"tokenizer" : "standard",
"filter" : ["apostrophe", "turkish_lowercase", "turkish_stop", "turkish_keywords", "turkish_stemmer", "synonym_filter"]
},
"folding" : {
"filter" : ["lowercase", "asciifolding"],
"tokenizer" : "standard"
}
}
}
},
"mappings" : {
"video" : {
"properties" : {
"Attributes" : {
"type" : "string",
"index" : "not_analyzed"
},
"ClientId" : {
"type" : "string",
"index" : "not_analyzed"
},
"Comments" : {
"type" : "string",
"analyzer" : "lens-turkish"
},
"Continent" : {
"type" : "string",
"index" : "not_analyzed"
},
"CountryOfOrigin" : {
"type" : "string",
"index" : "not_analyzed"
},
"CreatedDate" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"Description" : {
"type" : "string",
"analyzer" : "lens-turkish"
},
"DescriptionEnglish" : {
"type" : "string",
"analyzer" : "english"
},
"DislikesCount" : {
"type" : "double"
},
"EnglishTranscription" : {
"type" : "string",
"analyzer" : "english"
},
"Favourite" : {
"type" : "string",
"index" : "not_analyzed"
},
"FromProject" : {
"type" : "boolean"
},
"IsSearchable" : {
"type" : "boolean"
},
"LanguageISOCode" : {
"type" : "string",
"index" : "not_analyzed"
},
"LanguageOfOrigin" : {
"type" : "string",
"index" : "not_analyzed"
},
"LikesCount" : {
"type" : "double"
},
"NativeTranscription" : {
"type" : "string",
"analyzer" : "lens-turkish"
},
"ObjectId" : {
"type" : "string",
"index" : "not_analyzed"
},
"Published" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"Recommendations" : {
"type" : "string",
"index" : "not_analyzed"
},
"Status" : {
"type" : "long"
},
"Tags" : {
"type" : "string",
"analyzer" : "lens-turkish"
},
"Title" : {
"type" : "string",
"analyzer" : "lens-turkish"
},
"TitleEnglish" : {
"type" : "string",
"analyzer" : "english"
},
"TranscriptionStatus" : {
"type" : "double"
},
"UploadSource" : {
"type" : "double"
},
"VideoImage" : {
"type" : "string",
"index" : "no"
},
"ViewCount" : {
"type" : "double"
},
"WatchLater" : {
"type" : "string",
"index" : "not_analyzed"
},
"ExternalMetadata" : {
"type" : "nested",
"properties" : {
"Filters" : {
"type" : "string",
"index" : "not_analyzed"
},
"ProjectId" : {
"type" : "string",
"index" : "not_analyzed"
},
"Roles" : {
"type" : "string",
"index" : "not_analyzed"
}
}
}
}
}
}
}
All language indices follow this pattern (24 different languages are supported). Each client uses one of these settings when creating its indices and when indexing documents into them.
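As a rough illustration of the naming scheme described above, here is a minimal Python sketch that builds an index name from a client ID and an ISO language code and looks up the matching analyzer. The helper names (index_name, analyzer_name) and the ANALYZERS table are hypothetical; only the "<clientId>-<lang>" pattern and the lens-english / lens-turkish analyzer names come from the settings shown above.

```python
# Hypothetical helpers illustrating the "<clientId>-<lang>" naming scheme.
# Only the two analyzer names below appear in the settings shown above;
# the remaining languages would need their own entries.
ANALYZERS = {
    "en": "lens-english",
    "tr": "lens-turkish",
}


def index_name(client_id: str, lang: str) -> str:
    """Return the per-client, per-language index name, e.g. 'A-en'."""
    return f"{client_id}-{lang.lower()}"


def analyzer_name(lang: str) -> str:
    """Return the custom analyzer used for the native-language fields."""
    try:
        return ANALYZERS[lang.lower()]
    except KeyError:
        raise ValueError(f"no analyzer configured for language {lang!r}")


if __name__ == "__main__":
    for code in ("en", "tr"):
        print(index_name("A", code), analyzer_name(code))
```

Keeping the lookup in one table means an unsupported language fails loudly instead of silently falling back to a default analyzer.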
So this all seems fine, and ES is happy with it. Now on to the search query, which is where it gets tricky.
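My search query is based on the requirement that "phrases must take precedence over individual terms". Also, when a client performs a search, that search must run across all of the client's documents and languages (which is why the indices are named with the client ID). This is achieved by using a wildcard in the index name in the URL, i.e. /A-*/video/_search searches all of client A's documents regardless of language.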
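The wildcard addressing described above can be sketched as a small Python helper. The function name search_path and its doc_type parameter are assumptions made for illustration; the path layout itself is the /A-*/video/_search form quoted above.

```python
def search_path(client_id: str, doc_type: str = "video") -> str:
    """Build the search path covering every language index of one client.

    The "-*" suffix is the index-name wildcard: it matches "<client_id>-en",
    "<client_id>-tr", and so on.
    """
    if not client_id or "*" in client_id or "/" in client_id:
        # Guard against accidentally widening the search to other clients.
        raise ValueError(f"invalid client id: {client_id!r}")
    return f"/{client_id}-*/{doc_type}/_search"


if __name__ == "__main__":
    print(search_path("A"))
```

The guard on the client ID is a design choice for this sketch: because the client ID is the only thing scoping the wildcard, a stray "*" in it would search other clients' indices too.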
Here is the search query I post to the server...
POST /5617c3c867567a0b0c570a95-*/video/_search
{
"from": "0",
"size": "1000",
"query": {
"template": {
"query": {
"filtered": {
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "{{query_string}}",
"type": "most_fields",
"fields": [
"Title^3",
"Description^2",
"TitleEnglish",
"DescriptionEnglish",
"EnglishTranscription",
"NativeTranscription",
"Tags",
"Comments"
],
"tie_breaker": 0.1,
"minimum_should_match": "70%"
}
}
]
}
},
"filter": {
"bool": {
"must": [
{
"term": {
"IsSearchable": true
}
},
{
"term": {
"Private": false
}
}
]
}
}
}
},
"params": {
"query_string": "Turkish"
}
}
}
}
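Note that I am searching for the word "Turkish" and that I am searching across all languages. Looking at the results, notice that hits from the *-en index are ranked higher than hits from the *-tr (Turkish) index, even though the *-tr documents contain the word "Turkish" in more of their fields.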
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 15,
"successful": 15,
"failed": 0
},
"hits": {
"total": 7,
"max_score": 0.21282451,
"hits": [
{
"_index": "5617c3c867567a0b0c570a95-en",
"_type": "video",
"_id": "561bd2b274cbe0123c099ace",
"_score": 0.21282451,
"_source": {
"CountryOfOrigin": "United Kingdom",
"Continent": "Europe",
"LanguageOfOrigin": "English",
"LanguageIsoCode": "en",
"Title": "Nikes",
"TitleEnglish": "Eng video Eng lang",
"Description": "izlemek",
"DescriptionEnglish": "",
"VideoImage": "ff3a093a-700e-4c53-94df-cc5eb425c043_Image.jpg",
"ViewCount": 9,
"LikesCount": 0,
"DislikesCount": 0,
"CreatedDate": "2015-10-12T15:33:05.634Z",
"WatchLater": [],
"Favourite": [],
"Status": 2,
"TranscriptionStatus": 6,
"UploadSource": 3,
"IsSearchable": true,
"FromProject": false,
"NativeTranscription": "",
"Tags": [
"Turkish",
"Nike"
],
"Comments": [],
"Attributes": [],
"Recommendations": [],
"ClientId": "5617c3c867567a0b0c570a95",
"Private": false,
"ObjectId": "561bd2b274cbe0123c099ace"
}
},
{
"_index": "5617c3c867567a0b0c570a95-en",
"_type": "video",
"_id": "5617cb8b74cbe2110890820b",
"_score": 0.19917427,
"_source": {
"CountryOfOrigin": "Armenia",
"Continent": "Europe",
"LanguageOfOrigin": "English",
"LanguageIsoCode": "en",
"Title": "English Video",
"TitleEnglish": "English Video",
"DescriptionEnglish": "",
"VideoImage": "df80412b-d6b9-4104-932b-c8e44b005fb2_Image.jpg",
"ViewCount": 16,
"LikesCount": 1,
"DislikesCount": 0,
"CreatedDate": "2015-10-09T14:13:30.893Z",
"WatchLater": [],
"Favourite": [],
"Status": 2,
"TranscriptionStatus": 5,
"UploadSource": 3,
"IsSearchable": true,
"FromProject": false,
"NativeTranscription": "",
"Tags": [
"Turkish",
"Purple Aki"
],
"Comments": [],
"Attributes": [],
"Recommendations": [],
"ClientId": "5617c3c867567a0b0c570a95",
"Private": false,
"ObjectId": "5617cb8b74cbe2110890820b"
}
},
{
"_index": "5617c3c867567a0b0c570a95-en",
"_type": "video",
"_id": "561bb49e74cbe002f09301fa",
"_score": 0.17025961,
"_source": {
"CountryOfOrigin": "United Kingdom",
"Continent": "Europe",
"LanguageOfOrigin": "English",
"LanguageIsoCode": "en",
"Title": "Mark's Transcription Test",
"TitleEnglish": "Mark's Transcription Test",
"DescriptionEnglish": "",
"VideoImage": "09c6d366-6807-4d9d-9588-fd4730907b9b_Image.jpg",
"ViewCount": 6,
"LikesCount": 0,
"DislikesCount": 0,
"CreatedDate": "2015-10-12T13:24:45.833Z",
"WatchLater": [],
"Favourite": [],
"Status": 2,
"TranscriptionStatus": 6,
"UploadSource": 3,
"IsSearchable": true,
"FromProject": false,
"NativeTranscription": "",
"Tags": [
"turkish",
"mark",
"Watch"
],
"Comments": [],
"Attributes": [],
"Recommendations": [],
"ClientId": "5617c3c867567a0b0c570a95",
"Private": false,
"ObjectId": "561bb49e74cbe002f09301fa"
}
},
{
"_index": "5617c3c867567a0b0c570a95-tr",
"_type": "video",
"_id": "5617c97c74cbe21108908205",
"_score": 0.12725623,
"_source": {
"CountryOfOrigin": "Turkey",
"Continent": "Asia",
"LanguageOfOrigin": "Turkish",
"LanguageIsoCode": "tr",
"Title": "Turkish Video - Under 10mins - Request Trans",
"TitleEnglish": "Turkish Video - Under 10mins - Request Trans",
"Description": "Turkish - Request Trans",
"DescriptionEnglish": "Turkish - Request Trans",
"VideoImage": "ba4341e5-7af8-418e-91e3-818e290a0989_Image.jpg",
"ViewCount": 21,
"LikesCount": 0,
"DislikesCount": 0,
"CreatedDate": "2015-10-09T14:04:44.033Z",
"WatchLater": [],
"Favourite": [],
"Status": 2,
"TranscriptionStatus": 5,
"UploadSource": 3,
"IsSearchable": true,
"FromProject": false,
"NativeTranscription": "",
"Tags": [],
"Comments": [
"Turkish",
"Liverpool"
],
"Attributes": [
"5617c80974cbe211089081fd_3_2",
"5617c80974cbe211089081fe_4_1"
],
"Recommendations": [],
"ClientId": "5617c3c867567a0b0c570a95",
"Private": false,
"ObjectId": "5617c97c74cbe21108908205"
}
},
{
"_index": "5617c3c867567a0b0c570a95-tr",
"_type": "video",
"_id": "5617ca3574cbe21108908208",
"_score": 0.07719648,
"_source": {
"CountryOfOrigin": "Argentina",
"Continent": "South America",
"LanguageOfOrigin": "Turkish",
"LanguageIsoCode": "tr",
"Title": "Turkish Video - No Trans",
"TitleEnglish": "Turkish Video - No Trans",
"DescriptionEnglish": "",
"VideoImage": "735f0c09-3c1c-415e-870f-70f18be632ea_Image.jpg",
"ViewCount": 14,
"LikesCount": 0,
"DislikesCount": 0,
"CreatedDate": "2015-10-09T14:07:49.705Z",
"WatchLater": [],
"Favourite": [],
"Status": 2,
"TranscriptionStatus": 0,
"UploadSource": 3,
"IsSearchable": true,
"FromProject": false,
"NativeTranscription": "",
"Tags": [
"Turkish"
],
"Comments": [],
"Attributes": [],
"Recommendations": [],
"ClientId": "5617c3c867567a0b0c570a95",
"Private": false,
"ObjectId": "5617ca3574cbe21108908208"
}
},
{
"_index": "5617c3c867567a0b0c570a95-de",
"_type": "video",
"_id": "5617c8ca74cbe211089081ff",
"_score": 0.015614418,
"_source": {
"CountryOfOrigin": "Germany",
"Continent": "Europe",
"LanguageOfOrigin": "German",
"LanguageIsoCode": "de",
"Title": "German Video - Under 10mins - With SRT",
"TitleEnglish": "German Video - Under 10mins - With SRT",
"Description": "German Video\nTag: Oct 9",
"DescriptionEnglish": "German Video\nTag: Oct 9",
"VideoImage": "04bf4827-3459-41f6-9fc0-7003dfe7ea5d_Image.jpg",
"ViewCount": 5,
"LikesCount": 0,
"DislikesCount": 0,
"Published": "2015-10-09T14:03:01.066Z",
"CreatedDate": "2015-10-09T14:01:46.517Z",
"WatchLater": [],
"Favourite": [],
"Status": 2,
"TranscriptionStatus": 5,
"UploadSource": 3,
"IsSearchable": true,
"FromProject": false,
"NativeTranscription": "Ich denke, dass Nachhaltigkeit sich darum dreht,Verpackungen zu reduzieren oder Energie, die bei der Produktion entsteht,zu verringern oder auch lokal zu produzieren,um die CO2-Bilanz zu reduzieren.Ich glaube, dass sich viele Verbraucherbeim Einkaufen über Nachhaltigkeit Gedanken machen,was letztendlich auch beeinflusst was sie kaufen,vor allem aber würde ich von mir als Verbraucherin behaupten,dass ich mich an die Firmen halte, die die gleichen Wertebezüglich Nachhaltigkeit haben wie ich.Ich gehe gezielt in Geschäfte, die weniger Verpackung benutzenoder solche, die man einfacher recyclen kannund wenn wir können, gehen wir immer zu Fuß zu regionalenoder lokalen Geschäften, wenn sie in der Nähe sind.Und viele Unternehmen versuchen die gleichen Produktefür einen niedrigeren Preis zu verkaufen,aber wenn eine Firma mich überzeugen kann, dass ihre Produkte nachhaltiger sindoder sicherer für mich und meine Umwelt,wäre ich am Ende auch bereit, mehr zu bezahlen.Wenn ein Unternehmen behauptet, nachhaltig zu sein,will ich immer herausfinden auf welche Art und Weisesie sicherer sind.Es gibt so viele Öko-Zertifikateund ich weiß nicht was die bedeutenoder ob sie wirklich für Nachhaltigkeit stehen.Vielleicht könnte es einen Beschluss geben,der es den Verbrauchern einfacher macht,nachhaltige Produkte zu verstehen, das wäre für alle eine große Hilfe.",
"EnglishTranscription": "I think that sustainability turns about, Packaging to reduce or energy generated in the production, to reduce or even locally to produce, to reduce the CO2 footprint. I think that to many consumers worry buy about sustainability, What ultimately affects what you buy but above all, I would argue by me as a consumer, that I the companies consider myself, the same values as I have with regard to sustainability. I'm specifically going to shops that use less packaging or such which is easier to recycle can and if we can, we go to regional always walking or local shops if they are nearby. And many companies are trying the same products for sale, for a lower price But if a company can convince me that their products are more sustainable or safe for me and my environment. would I also be willing to pay more at the end. If a company claims to be sustainable. will I always find out in what way they are safer. There are so many eco-certificates and I don't know what you mean or whether they really are for sustainability. Perhaps there could be a decision, Consumers easier makes it,. understanding sustainable products that would be a great help for everyone.",
"Tags": [
"Oct 9",
"Turkish"
],
"Comments": [],
"Attributes": [
"5617c80974cbe211089081fd_3_2",
"5617c80974cbe211089081fe_4_4"
],
"Recommendations": [],
"ClientId": "5617c3c867567a0b0c570a95",
"Private": false,
"ObjectId": "5617c8ca74cbe211089081ff"
}
},
{
"_index": "5617c3c867567a0b0c570a95-tr",
"_type": "video",
"_id": "561b860d74cbe0103cf23369",
"_score": 0.011710813,
"_source": {
"CountryOfOrigin": "Turkey",
"Continent": "Asia",
"LanguageOfOrigin": "Turkish",
"LanguageIsoCode": "tr",
"Title": "izlemek Nike",
"TitleEnglish": "Demo 4",
"Description": "izlemek Nike",
"DescriptionEnglish": "Demo 4",
"VideoImage": "97e66fe2-6f62-4a43-b234-0abda414dedf_Image.jpg",
"ViewCount": 17,
"LikesCount": 0,
"DislikesCount": 0,
"Published": "2015-10-12T10:07:52.281Z",
"CreatedDate": "2015-10-12T10:06:05.015Z",
"WatchLater": [],
"Favourite": [],
"Status": 2,
"TranscriptionStatus": 5,
"UploadSource": 3,
"IsSearchable": true,
"FromProject": false,
"NativeTranscription": "Şimdi makyaj masamın başına geçtimVe makyajımı yapmaya başlayacağımÖncelikle güzel bir baz süreceğimSmashbox'ın Photo Finish bazını kullanacağımÖnce göz makyajımı yapacağımBugün böyle altın ve siyah tonlarındaya da altın kahve tonlarında bir makyaj yapmayı planlıyorumÇünkü, giyeceğim bir ceket varCeket de altın zincirler ve altın detaylar taşıyorEe tabii, söz konusu altın olduğu zamanAltın ve bronz ve doğal tonlar olduğu zamanNaked paletimden elimi çekemiyorumEe tabii far kullanacaksam, bir far bazı kullanmadan olmazUrban Decay far kullanacağım içintesadüfen Urban Decay'den primer potion göz bazını kullanacağımŞu kadar miktar benim için yeterliBeni biraz böyle nefes nefese vehani koşturur vaziyette görebilirsinizÇünkü birazcık acelem varVe hazır böyle güzel bir saç makyaj gibi bir şey planlıyorlenNeden videosunu çekmeyeyim, diye düşündüm",
"EnglishTranscription": "Now I take over my dressing table And I'm going to start doing my makeup First of all, I'm going to drive a beautiful base Smashbox's Photo Finish base to use First, I'm going to do my eye makeup Today in shades of gold and black or I'm planning to do a makeup in shades of gold and coffee I'm going to wear a coat, because there Jacket in gold chains and gold carries the details So of course, when it comes to gold When gold and bronze and natural hues I can't get my hand off my naked palette So of course I use a headlight headlights not without some Urban Decay eyeshadow I use for Incidentally, I'm going to use from the Urban Decay primer potion eye base This quantity is enough for me That's me a little breathless and you know, the one you can see running condition Because it's a little bit of a hurry And such a beautiful something like hair make-up ready planned yorlen Why is the video I thought, that I may not",
"Tags": [
"test tag",
"turkish",
"mark",
"izlemek",
"Purple Aki"
],
"Comments": [],
"Attributes": [
"5617c80974cbe211089081fd_3_1"
],
"Recommendations": [],
"ClientId": "5617c3c867567a0b0c570a95",
"Private": false,
"ObjectId": "561b860d74cbe0103cf23369"
}
}
]
}
}
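One way to dig into scores like these, offered as a sketch rather than a diagnosis: ES accepts an "explain" flag in the search body, which returns a per-hit breakdown of how each score was computed, and that breakdown may differ between indices. The helper names below (with_explain, scores_by_index) are hypothetical; the second helper simply groups the scores in a response like the one above by the index each hit came from.

```python
import copy
from collections import defaultdict


def with_explain(body: dict) -> dict:
    """Return a copy of a search body with per-hit score explanations enabled."""
    out = copy.deepcopy(body)
    out["explain"] = True
    return out


def scores_by_index(response: dict) -> dict:
    """Group hit scores by the index each hit came from, in response order."""
    grouped = defaultdict(list)
    for hit in response["hits"]["hits"]:
        grouped[hit["_index"]].append(hit["_score"])
    return dict(grouped)


if __name__ == "__main__":
    # A trimmed-down response using two of the hits shown above.
    sample = {
        "hits": {
            "hits": [
                {"_index": "5617c3c867567a0b0c570a95-en", "_score": 0.21282451},
                {"_index": "5617c3c867567a0b0c570a95-tr", "_score": 0.12725623},
            ]
        }
    }
    for index, scores in scores_by_index(sample).items():
        print(index, scores)
```

Copying the body before setting the flag keeps the original query reusable for the normal, non-explained search.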
Could someone who knows what to look for cast an eye over this and see whether there is anything I am missing here?