python - GAE text search with a secondary sort
Author: Internet
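In lieu of full-text search on GAE, I'm using the solution below to return a sorted result set, ordered first by keyword relevance and then by date (although the second sort could really be anything). It feels a bit clunky and I'm concerned about performance at scale, so I'm looking for optimization suggestions or a completely different approach.
The secondary sort is important to my use case, since a given search can have multiple results with the same relevance (measured by the number of keyword matches), but preserving the original query ordering currently adds a lot of complexity. Any ideas?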
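To make the target ordering concrete, here is a minimal sketch on in-memory data rather than on the datastore. The `docs` records and their fields (`id`, `terms`, `created`) are made up for illustration and are not part of the question's model; the only point is that the desired order is match count descending, then date descending.

```python
from datetime import date

# Hypothetical records standing in for SomeEntity rows (assumed fields).
docs = [
    {"id": "k1", "terms": {"a"}, "created": date(2019, 1, 3)},
    {"id": "k2", "terms": {"a", "b"}, "created": date(2019, 1, 1)},
    {"id": "k3", "terms": {"b", "c"}, "created": date(2019, 1, 2)},
    {"id": "k4", "terms": {"x"}, "created": date(2019, 1, 4)},
]
search_terms = ["a", "b", "c"]


def match_count(doc):
    """Number of search terms found in the record's index."""
    return len(doc["terms"].intersection(search_terms))


# Keep only records that match at least one term.
matches = [d for d in docs if match_count(d) > 0]

# Primary: more matching terms first. Secondary: newer records first.
matches.sort(key=lambda d: (match_count(d), d["created"]), reverse=True)
ordered_ids = [d["id"] for d in matches]
```

This sorts everything in memory, so it only illustrates the ordering; it says nothing about how to get there efficiently with datastore queries.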
Step 1: Get a list of keys matching each search term
results_key_list = []
search_terms = ['a', 'b', 'c']  # user's search query, split into a list of strings

# Query each search term and add the results to one list.
# This yields a list of keys whose frequency of occurrence indicates relevance.
for item in search_terms:
    subquery = SomeEntity.all(keys_only=True)
    subquery.filter('SearchIndex = ', item)  # SearchIndex is a StringListProperty
    # more filters...
    subquery.order('-DateCreated')
    for returned_item in subquery:
        results_key_list.append(str(returned_item))
Step 2: Group the list by frequency while preserving the original order
from collections import defaultdict

# Build a dictionary of keys with their frequency of occurrence.
grouped_results = defaultdict(int)
for key in results_key_list:
    grouped_results[key] += 1

sorted_results = []
known = set()

# Create an empty list for each possible match frequency.
for i in range(len(search_terms)):
    sorted_results.append([])

# Using the original results ordering, build a list of buckets
# grouped and ordered by descending frequency.
for key in results_key_list:
    if key in known:
        continue
    frequency = grouped_results[key]
    sorted_results[len(search_terms) - frequency].append(key)
    known.add(key)

# Combine into a single list.
ordered_key_list = []
for l in sorted_results:
    ordered_key_list.extend(l)

# Apply paging (offset and limit are defined elsewhere).
del ordered_key_list[:offset]
del ordered_key_list[limit:]

result = SomeEntity.get(ordered_key_list)
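The grouping in Step 2 can be tried without the datastore. The following is a rough sketch that wraps the same bucket logic in a function and feeds it a hand-written list; the function name `group_by_frequency` and the keys `k1`..`k3` are assumptions made for this example only.

```python
from collections import defaultdict


def group_by_frequency(results_key_list, num_terms):
    """Order keys by descending match count; ties keep first-seen order."""
    counts = defaultdict(int)
    for key in results_key_list:
        counts[key] += 1

    # buckets[0] holds keys that matched every term,
    # buckets[-1] holds keys that matched exactly one term.
    buckets = [[] for _ in range(num_terms)]
    seen = set()
    for key in results_key_list:
        if key in seen:
            continue
        seen.add(key)
        buckets[num_terms - counts[key]].append(key)

    # Flatten the buckets into one ordered list.
    return [key for bucket in buckets for key in bucket]


# Hypothetical concatenated results of three per-term queries.
flat = ["k1", "k2", "k3", "k2", "k3"]
ordered = group_by_frequency(flat, num_terms=3)
page = ordered[0:2]  # equivalent to offset=0, limit=2
```

The bucket index assumes no key appears more often than there are search terms, which holds as long as the search terms are distinct.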
Solution:
You can accumulate the keys in order of appearance
and build up all of the key frequencies in a single pass.
Then take advantage of sort stability: sort by descending frequency, and keys with equal frequency stay in order of appearance:

from collections import defaultdict

search_terms = ['a', 'b', 'c']  # user's search query, split into a list of strings

keys_in_order_of_appearance = []
key_frequency = defaultdict(int)

for item in search_terms:
    subquery = SomeEntity.all(keys_only=True)
    subquery.filter('SearchIndex = ', item)  # SearchIndex is a StringListProperty
    # more filters...
    subquery.order('-DateCreated')
    for returned_item in subquery:
        key = str(returned_item)
        if key not in key_frequency:
            keys_in_order_of_appearance.append(key)
        key_frequency[key] += 1

keys = keys_in_order_of_appearance[:]  # order of appearance kept as secondary sort
keys.sort(key=key_frequency.__getitem__, reverse=True)  # descending frequency as primary sort

result = SomeEntity.get(keys)
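To check the stable-sort idea in isolation, here is a small sketch that swaps the datastore queries for a plain dictionary. `FAKE_INDEX`, `rank_keys`, `paginate`, and the `fetch_keys` callback are hypothetical names introduced for this example; in the real code the callback would run the keys-only query for one term.

```python
from collections import defaultdict

# Hypothetical stand-in for the datastore: term -> keys, newest first.
FAKE_INDEX = {
    "a": ["k1", "k2"],
    "b": ["k3", "k2"],
    "c": ["k3"],
}


def rank_keys(search_terms, fetch_keys):
    """Return keys by descending match count; ties keep first-seen order."""
    order_of_appearance = []
    frequency = defaultdict(int)
    for term in search_terms:
        for key in fetch_keys(term):
            # A membership test does not create a defaultdict entry.
            if key not in frequency:
                order_of_appearance.append(key)
            frequency[key] += 1
    # sorted() is stable, so equal-frequency keys stay in first-seen order,
    # including when reverse=True is used.
    return sorted(order_of_appearance, key=frequency.__getitem__, reverse=True)


def paginate(keys, offset, limit):
    """Return one page of keys using a single slice."""
    return keys[offset:offset + limit]


ranked = rank_keys(["a", "b", "c"], lambda term: FAKE_INDEX.get(term, []))
first_page = paginate(ranked, offset=0, limit=2)
```

One thing worth checking against real data: first-seen order follows the order in which the per-term queries run, so it matches `-DateCreated` within a single term's results but not necessarily across terms.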
Tags: google-app-engine, full-text-search, python Source: https://codeday.me/bug/20191102/1988712.html