python - Django queryset: need help optimizing this queryset

Question

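I am trying to sift out some common tag combinations from a list of educational question records.

For this example, I am only looking at the 2-tag case (tag-tag), and I should get results such as "point" + "curve" (65 entries), "add" + "subtract" (40 entries), and so on.

This is the desired result, expressed as an SQL statement: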

SELECT a.tag, b.tag, count(*)
FROM examquestions.dbmanagement_tag as a
INNER JOIN examquestions.dbmanagement_tag as b on a.question_id_id = b.question_id_id
where a.tag != b.tag
group by a.tag, b.tag

Basically, we are identifying the different tags that share common questions, and grouping them by matching tag combination.

I have tried to do a similar query using a Django queryset:
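For readers who want to try the self-join on a small dataset, here is a minimal sketch using an in-memory SQLite database. The table layout and sample rows are hypothetical stand-ins for `dbmanagement_tag`. Note that with `a.tag != b.tag` every pair is returned twice (once in each order); the sketch uses `a.tag < b.tag` instead so that each unordered pair appears only once.

```python
import sqlite3

# Hypothetical sample data: (question_id, tag) rows standing in for dbmanagement_tag.
rows = [
    (1, "point"), (1, "curve"),
    (2, "point"), (2, "curve"), (2, "add"),
    (3, "add"), (3, "subtract"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tag (question_id INTEGER, tag TEXT)")
conn.executemany("INSERT INTO tag VALUES (?, ?)", rows)

# a.tag < b.tag keeps one row per unordered pair instead of two mirrored rows.
pair_counts = conn.execute(
    """
    SELECT a.tag, b.tag, COUNT(*)
    FROM tag AS a
    INNER JOIN tag AS b ON a.question_id = b.question_id
    WHERE a.tag < b.tag
    GROUP BY a.tag, b.tag
    ORDER BY COUNT(*) DESC, a.tag, b.tag
    """
).fetchall()

for first, second, count in pair_counts:
    print(first, second, count)
```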

    twotaglist = [] #final set of results

    alphatags = tag.objects.all().values('tag', 'type').annotate().order_by('tag')
    betatags = tag.objects.all().values('tag', 'type').annotate().order_by('tag')
    startindex = 0 #startindex reduced by 1 to shorten betatag range each time the atag changes. this is to reduce the double count of comparison of similar matches of tags
    for atag in alphatags:
        for btag in betatags[startindex:]:
            if (atag['tag'] != btag['tag']):
                commonQns = [] #to check how many common qns
                atagQns = tag.objects.filter(tag=atag['tag'], question_id__in=qnlist).values('question_id').annotate()
                btagQns = tag.objects.filter(tag=btag['tag'], question_id__in=qnlist).values('question_id').annotate()
                for atagQ in atagQns:
                    for btagQ in btagQns:
                        if (atagQ['question_id'] == btagQ['question_id']):
                            commonQns.append(atagQ['question_id'])
                if (len(commonQns) > 0):
                    twotaglist.append({'atag': atag['tag'],
                                        'btag': btag['tag'],
                                        'count': len(commonQns)})
        startindex=startindex+1

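The logic works fine, but as I am pretty new to this platform, I am not sure whether there is a shorter, more efficient way to do this.

Currently, the query needs about 45 seconds for a comparison of roughly 5K x 5K tags :(

Add-on: the tag class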

class tag(models.Model):
    id = models.IntegerField('id',primary_key=True,null=False)
    question_id = models.ForeignKey(question,null=False)
    tag = models.TextField('tag',null=True)
    type = models.CharField('type',max_length=1)

    def __str__(self):
        return str(self.tag)

score 2 · Accepted Answer

If I understood your question correctly, I would keep things simpler and do something like this:

relevant_tags = Tag.objects.filter(question_id__in=qnlist)
#Here relevant_tags has both a and b tags

unique_tags = set()
for tag_item in relevant_tags:
    unique_tags.add(tag_item.tag)

#unique_tags should have your A and B tags

a_tag = unique_tags.pop()
b_tag = unique_tags.pop() 

#Some logic to make sure what is A and what is B

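# Build real lists: Python 3's filter() returns a one-shot iterator that
# cannot be rescanned for every question or concatenated with +
a_tags = [t for t in relevant_tags if t.tag == a_tag]
b_tags = [t for t in relevant_tags if t.tag == b_tag]

#a_tags and b_tags contain A and B tags filtered from relevant_tags

same_question_tags = dict()

for q in qnlist:
  # On the asker's model the ForeignKey field is named question_id,
  # so the raw key value lives in question_id_id
  a_list = [t for t in a_tags if t.question_id_id == q.id]
  b_list = [t for t in b_tags if t.question_id_id == q.id]
  same_question_tags[q] = a_list + b_list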

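The good thing about this is that you can extend it to N tags by iterating over the returned tags in a loop to collect all the unique tags, and then iterating further to filter them out tag by tag.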
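One way the N-tag extension could look is sketched below in plain Python, independent of Django. It is only an illustration under stated assumptions: `rows` is hypothetical sample data in the form of `(question_id, tag)` pairs, and the helper names `group_tags_by_question` and `questions_with_all` are invented for this example.

```python
from collections import defaultdict

def group_tags_by_question(rows):
    """Map each question id to the set of tag names attached to it."""
    tags_by_question = defaultdict(set)
    for question_id, tag_name in rows:
        tags_by_question[question_id].add(tag_name)
    return tags_by_question

def questions_with_all(tags_by_question, wanted):
    """Return the sorted ids of questions that carry every tag in `wanted`."""
    wanted = set(wanted)
    return sorted(q for q, tags in tags_by_question.items() if wanted <= tags)

# Hypothetical sample data: (question_id, tag) pairs.
rows = [
    (1, "point"), (1, "curve"),
    (2, "point"), (2, "curve"), (2, "add"),
    (3, "add"), (3, "subtract"),
]

tags_by_question = group_tags_by_question(rows)
two_tag_matches = questions_with_all(tags_by_question, ["point", "curve"])
three_tag_matches = questions_with_all(tags_by_question, ["point", "curve", "add"])
```

Because the tags are grouped once up front, checking another combination costs one pass over the questions instead of another round of nested filtering.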

There are definitely other ways to do this.

score 2 · Accepted Answer

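Unfortunately, Django doesn't allow joins unless a foreign key (or one-to-one) relation is involved, so you will have to do it in code. I've found a way (totally untested) to do it with a single query, which should improve execution time significantly.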

from collections import Counter
from itertools import combinations

# Assuming Models
class Question(models.Model):
    ...

class Tag(models.Model):
    tag = models.CharField(..)
    question = models.ForeignKey(Question, related_name='tags')

c = Counter()
questions = Question.objects.all().prefetch_related('tags') # prefetch the reverse ForeignKey relation
for q in questions:
    # sort them so 'point' + 'curve' == 'curve' + 'point'
    tags = sorted(t.tag for t in q.tags.all()) # the field is called tag, not name
    c.update(combinations(tags,2)) # get all 2-pair combinations and update counter
c.most_common(5) # show the top 5

The above code uses Counter, itertools.combinations, and Django's prefetch_related, which should cover most of the bits above that might be unfamiliar. If the code does not work exactly as written, look at those resources and modify it accordingly.
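Since the code above is untested, here is a minimal, Django-free sketch of just the counting step, which can be run on its own. It assumes the data is available as `(question_id, tag)` pairs; in a Django project such pairs might come from something like `values_list('question_id', 'tag')` on the tag model, but that is an assumption and is not exercised here. The function name `count_tag_combinations` and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def count_tag_combinations(rows, size=2):
    """Count how many questions contain each combination of `size` tags."""
    tags_by_question = defaultdict(set)
    for question_id, tag_name in rows:
        tags_by_question[question_id].add(tag_name)

    counts = Counter()
    for tags in tags_by_question.values():
        # Sorting makes ('curve', 'point') and ('point', 'curve') the same key.
        counts.update(combinations(sorted(tags), size))
    return counts

# Hypothetical sample data: (question_id, tag) pairs.
rows = [
    (1, "point"), (1, "curve"),
    (2, "point"), (2, "curve"), (2, "add"),
    (3, "add"), (3, "subtract"),
]

pair_counts = count_tag_combinations(rows)
top_pair = pair_counts.most_common(1)
```

Each question contributes one entry per tag combination it contains, so the work grows with the number of tags per question rather than with the square of the total number of tags.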

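If you're not using an M2M field on your Question model, you can still access the tags as if it were an M2M field by using reverse relations. See my edit that changes the reverse relation from tag_set to tags. I've also made a couple of other edits that should work with the way you've defined your models.

If you don't specify related_name='tags', just change tags to tag_set in the filters and in prefetch_related, and you're good to go.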
