python - Django QuerySet: need help optimizing this set of queries

2019-10-31 10:58:14 Author: Internet

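I am trying to extract the most common tag combinations from a list of educational question records.

For this example I am only looking at 2-tag combinations (tag-tag), and I should get results like:
"point" "curve" (65 entries)
"add" "subtract" (40 entries)
…

This is the desired result expressed in SQL: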

SELECT a.tag, b.tag, count(*)
FROM examquestions.dbmanagement_tag as a
INNER JOIN examquestions.dbmanagement_tag as b on a.question_id_id = b.question_id_id
where a.tag != b.tag
group by a.tag, b.tag

Basically, we take the different tags that share a question, list them, and group them by matching tag combination.
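To make the self-join concrete, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory table, its column names, and the sample rows are assumptions for illustration only, not the real examquestions schema. It also shows that comparing with != returns every pair in both orders, while comparing with < returns each unordered pair once.

```python
import sqlite3

# Hypothetical in-memory stand-in for the tag table (not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tag (question_id INTEGER, tag TEXT)")
conn.executemany(
    "INSERT INTO tag (question_id, tag) VALUES (?, ?)",
    [
        (1, "point"), (1, "curve"),
        (2, "point"), (2, "curve"),
        (3, "add"), (3, "subtract"),
    ],
)


def pair_counts(comparison):
    """Run the self-join using the given tag comparison operator."""
    # Only two fixed operators are accepted, so formatting them into the SQL is safe.
    assert comparison in ("!=", "<")
    sql = f"""
        SELECT a.tag, b.tag, COUNT(*)
        FROM tag AS a
        INNER JOIN tag AS b ON a.question_id = b.question_id
        WHERE a.tag {comparison} b.tag
        GROUP BY a.tag, b.tag
        ORDER BY a.tag, b.tag
    """
    return conn.execute(sql).fetchall()


both_orders = pair_counts("!=")  # every pair shows up as (a, b) and as (b, a)
one_order = pair_counts("<")     # every unordered pair shows up once
```

If the mirrored rows are not wanted, the < variant halves the result set without changing the counts.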

I tried to write a similar query with a Django queryset:

    twotaglist = [] #final set of results

    alphatags = tag.objects.all().values('tag', 'type').annotate().order_by('tag')
    betatags = tag.objects.all().values('tag', 'type').annotate().order_by('tag')
    startindex = 0 # startindex is increased by 1 each time atag changes, shortening the betatags range so that each pair of tags is compared only once
    for atag in alphatags:
        for btag in betatags[startindex:]:
            if (atag['tag'] != btag['tag']):
                commonQns = [] #to check how many common qns
                atagQns = tag.objects.filter(tag=atag['tag'], question_id__in=qnlist).values('question_id').annotate()
                btagQns = tag.objects.filter(tag=btag['tag'], question_id__in=qnlist).values('question_id').annotate()
                for atagQ in atagQns:
                    for btagQ in btagQns:
                        if (atagQ['question_id'] == btagQ['question_id']):
                            commonQns.append(atagQ['question_id'])
                if (len(commonQns) > 0):
                    twotaglist.append({'atag': atag['tag'],
                                        'btag': btag['tag'],
                                        'count': len(commonQns)})
        startindex=startindex+1

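The logic works fine, but since I am quite new to this platform, I am not sure whether there is a shorter approach that would be more efficient.

Currently, with roughly 5K x 5K tag comparisons, the query takes about 45 seconds :(

Addendum: the tag class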

class tag(models.Model):
    id = models.IntegerField('id',primary_key=True,null=False)
    question_id = models.ForeignKey(question,null=False)
    tag = models.TextField('tag',null=True)
    type = models.CharField('type',max_length=1)

    def __str__(self):
        return str(self.tag)

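Solution:

Unfortunately, Django does not allow joins unless a foreign key (or one-to-one) relation is involved, so you will have to do this in code. I found a way (completely untested) to do this with a fixed number of queries instead of one per tag pair (prefetch_related issues one extra query that loads all the tags), which should significantly shorten the execution time.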

from collections import Counter
from itertools import combinations

from django.db import models

# Assuming Models
class Question(models.Model):
    ...

class Tag(models.Model):
    tag = models.CharField(...)  # field options elided in the original
    # on_delete is a required argument since Django 2.0
    question = models.ForeignKey(Question, related_name='tags', on_delete=models.CASCADE)

c = Counter()
# prefetch the reverse foreign-key relation (one extra query for all tags)
questions = Question.objects.all().prefetch_related('tags')
for q in questions:
    # sort them so 'point' + 'curve' == 'curve' + 'point'
    tags = sorted([t.tag for t in q.tags.all()])  # the field is `tag`, not `name`
    c.update(combinations(tags, 2)) # get all 2-pair combinations and update counter
c.most_common(5) # show the top 5

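The code above uses Counter, itertools.combinations, and Django's prefetch_related, which should cover most of the unfamiliar bits. If the code does not work as written, look at the documentation for those and adapt it accordingly.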
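The counting step can be tried without Django at all. The following is a small sketch on hand-written tag lists (the data is made up for illustration); it shows why the tags are sorted before being paired, and additionally deduplicates tags per question with set(), which is an assumption about the desired behavior rather than part of the answer above.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tag lists, one list per question.
tags_per_question = [
    ["point", "curve"],
    ["curve", "point", "add"],
    ["add", "subtract"],
]

pair_counter = Counter()
for tags in tags_per_question:
    # set() drops a tag repeated on the same question;
    # sorted() gives every pair one canonical order.
    unique_sorted = sorted(set(tags))
    # combinations() yields each 2-element pair exactly once per question.
    pair_counter.update(combinations(unique_sorted, 2))

top_pairs = pair_counter.most_common(1)
```

Because every pair is stored in sorted order, ("curve", "point") accumulates the counts from both of the first two questions, and ("point", "curve") never appears as a separate key.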

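If you are not using an M2M field on the Question model, you can still use reverse relations to access the tags as if it were an M2M field. See my edit, which changes the reverse relation from tag_set to tags. I made a few other edits that should work with the way you have defined your models.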

If you did not specify related_name='tags', just change tags to tag_set in the filter and in prefetch_related, and you are good to go.
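One way to avoid hard-coding the accessor name is to pass it in. The helper below is a hypothetical sketch, not part of Django or of the original answer: count_tag_pairs and its parameters are made-up names, and it assumes each question object exposes a related manager with an all() method. It also skips tags whose value is None, since the tag field in the question's model is declared with null=True.

```python
from collections import Counter
from itertools import combinations


def count_tag_pairs(questions, related_name="tag_set", field="tag"):
    """Count unordered tag pairs over an iterable of question objects.

    Hypothetical helper: `questions` is expected to be something like a
    queryset built with prefetch_related(related_name), so that
    getattr(q, related_name).all() is served from the prefetch cache.
    """
    counter = Counter()
    for q in questions:
        related = getattr(q, related_name).all()
        # Collect distinct, non-null tag values in a canonical (sorted) order.
        names = sorted(
            {getattr(t, field) for t in related if getattr(t, field) is not None}
        )
        counter.update(combinations(names, 2))
    return counter
```

With the model from the question this would presumably be called as count_tag_pairs(question.objects.prefetch_related('tag_set')), and with related_name='tags' as count_tag_pairs(Question.objects.prefetch_related('tags'), related_name='tags').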

Tags: django-queryset, django-views, python, django, sql
Source: https://codeday.me/bug/20191031/1975232.html