python - GAE text search with secondary sort

Author: Internet

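In lieu of full-text search on GAE, I'm using the solution below to return a sorted result set, ranked first by keyword relevance and then by date (though the second sort could really be anything). It feels a bit clunky and I'm concerned about performance at scale, so I'm looking for optimization suggestions or a different approach entirely.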

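The secondary sort matters for my use case, since a given search can return several results with the same relevance (measured by the number of keyword matches), but preserving the original query order currently adds a lot of complexity. Any ideas?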

Step 1: Get a list of keys matching each search term

results_key_list = []
search_terms = ['a','b','c'] #User's search query, split into a list of strings

#query each search term and add the results to a list
#yields a list of keys with frequency of occurrence indicating relevance
for item in search_terms:
    subquery = SomeEntity.all(keys_only=True)                   
    subquery.filter('SearchIndex = ', item) #SearchIndex is a StringListProperty
    #more filters...            
    subquery.order('-DateCreated')                  
    for returned_item in subquery:
        results_key_list.append(str(returned_item))     

Step 2: Group the list by frequency while preserving the original order

from collections import defaultdict

#return a dictionary of keys, with their frequency of occurrence
grouped_results = defaultdict(int)
for key in results_key_list:
    grouped_results[key] += 1               

sorted_results = []
known = set()

#creates an empty list for each match frequency 
for i in range(len(search_terms)):
    sorted_results.append([])

#using the original results ordering, 
#construct an array of results grouped and ordered by descending frequency  
for key in results_key_list:
    if key in known: continue
    frequency = grouped_results[key]
    sorted_results[len(search_terms) - frequency].append(key)
    known.add(key)          

#combine into a single list
ordered_key_list = []   
for l in sorted_results:
    ordered_key_list.extend(l)  

#apply paging; offset and limit are assumed to be supplied by the caller
del ordered_key_list[:offset]
del ordered_key_list[limit:]
result = SomeEntity.get(ordered_key_list)
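
The grouping in Step 2 can be tried without the datastore. What follows is only a minimal sketch: plain strings stand in for datastore keys, the helper name `rank_by_frequency` and the sample data are hypothetical, and it assumes the search terms are distinct so that no key is counted more often than there are terms.

```python
from collections import Counter

def rank_by_frequency(results_key_list, num_terms, offset=0, limit=None):
    """Order keys by descending match count; ties keep first-seen order."""
    frequency = Counter(results_key_list)
    # One bucket per possible match count; bucket 0 holds keys matching every term
    buckets = [[] for _ in range(num_terms)]
    seen = set()
    for key in results_key_list:
        if key in seen:
            continue
        seen.add(key)
        buckets[num_terms - frequency[key]].append(key)
    ordered = [key for bucket in buckets for key in bucket]
    # Apply paging with a slice rather than two del statements
    end = None if limit is None else offset + limit
    return ordered[offset:end]

# Hypothetical per-term results, concatenated in query order
sample = ['k3', 'k1', 'k2'] + ['k1', 'k4'] + ['k1', 'k2']
page = rank_by_frequency(sample, num_terms=3, offset=0, limit=3)
print(page)
```

Here `k1` matches all three terms, `k2` matches two, and `k3` and `k4` match one each, so the first page holds `k1`, `k2` and `k3`.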

Solution:

search_terms = ['a','b','c'] #User's search query, split into a list of strings

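You can accumulate the keys in order of appearance and build all the key frequencies in a single pass. Then take advantage of sort stability to sort by descending frequency, with order of appearance as the tiebreaker: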

from collections import defaultdict

keys_in_order_of_appearance = []
key_frequency = defaultdict(int)

for item in search_terms:
    subquery = SomeEntity.all(keys_only=True)
    subquery.filter('SearchIndex = ', item) #SearchIndex is a StringListProperty
    #more filters...
    subquery.order('-DateCreated')
    for returned_item in subquery:
        key = str(returned_item)
        if key not in key_frequency:
            #first time this key is seen: record its position
            keys_in_order_of_appearance.append(key)
        key_frequency[key] += 1

keys = keys_in_order_of_appearance[:]   # order of appearance kept as secondary sort
keys.sort(key=key_frequency.__getitem__, reverse=True) # descending freq as primary sort
result = SomeEntity.get(keys)
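
To see the stable-sort idea in isolation, here is a small sketch that runs without App Engine. The function name `rank_keys` and the sample lists are hypothetical stand-ins for the per-term query results; it relies only on Python's sort being stable, which still holds when `reverse=True` is passed.

```python
from collections import defaultdict

def rank_keys(per_term_results):
    """Sort keys by descending frequency; equal frequencies keep first-seen order."""
    keys_in_order_of_appearance = []
    key_frequency = defaultdict(int)
    for term_keys in per_term_results:
        for key in term_keys:
            # A membership test on a defaultdict does not create an entry
            if key not in key_frequency:
                keys_in_order_of_appearance.append(key)
            key_frequency[key] += 1
    # Stable sort: ties retain their position from the input list
    return sorted(keys_in_order_of_appearance,
                  key=key_frequency.__getitem__, reverse=True)

# Hypothetical results for three search terms, each list newest first
per_term = [['k3', 'k1', 'k2'], ['k1', 'k4'], ['k1', 'k2']]
ranked = rank_keys(per_term)
print(ranked)
```

Note that first-seen order matches date order only within a single term's result list. Keys with equal frequency that first appear under different terms are ordered by term, not by date, so a strict date tiebreak would need the date itself in the sort key.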

Tags: google-app-engine, full-text-search, python
Source: https://codeday.me/bug/20191102/1988712.html