How to Combine Tree-Search Methods in Reinforcement Learning
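Author: Internet
Disclaimer: the original paper is the one named in the title. If this post infringes any rights, please contact the author and it will be taken down.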
AAAI 2019 Best Paper
Abstract
1 Introduction
2 Preliminaries
3 The h-Greedy Policy and h-PI
4 h-Greedy Consistency
5 The h-Greedy Policy Alone is Not Sufficient For Partial Evaluation
6 Backup the Tree-Search Byproducts
7 Relation to Existing Work
8 Experiments
9 Summary and Future Work
A Proof of Lemma 1
B Affinity of Tπ and Consequences
C Proof of Proposition 2
D Proof of Theorem 3
E h-Greedy Consistency in Each Iteration
F A Note on the Alternative λ-Return Operator
G More Experimental Results
Tags: Search, Methods, Tree, Greedy, Consistency, Policy, Proof. Source: https://www.cnblogs.com/lucifer1997/p/14017015.html