How to Combine Tree-Search Methods in Reinforcement Learning




AAAI 2019 Best Paper




1 Introduction


2 Preliminaries


3 The h-Greedy Policy and h-PI


4 h-Greedy Consistency


5 The h-Greedy Policy Alone is Not Sufficient For Partial Evaluation


6 Backup the Tree-Search Byproducts


7 Relation to Existing Work


8 Experiments


9 Summary and Future Work


A Proof of Lemma 1


B Affinity of T^π and Consequences


C Proof of Proposition 2


D Proof of Theorem 3


E h-Greedy Consistency in Each Iteration


F A Note on the Alternative λ-Return Operator


G More Experimental Results


Source: https://www.cnblogs.com/lucifer1997/p/14017015.html