
Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning

2022-03-10 12:32:12 Author: Internet

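Published: 2018 (ICRA 2018)
Key points: The paper proposes an algorithm called model-based and model-free (Mb-Mf): first train a policy with a model-based method, then fine-tune it with a model-free method. Concretely, it first learns a model and then selects actions by planning with that model (a simple random-sampling shooting method).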
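The planning step can be illustrated with a minimal random-shooting sketch. This is not the authors' code: `toy_model` and `toy_reward` are hypothetical stand-ins for a learned dynamics model and a task reward, and the horizon and candidate count are arbitrary choices made for illustration.

```python
import numpy as np

def random_shooting_action(model, reward_fn, state, action_low, action_high,
                           horizon=10, n_candidates=1000, rng=None):
    """Return the first action of the best of n_candidates random action sequences."""
    rng = np.random.default_rng() if rng is None else rng
    action_low = np.asarray(action_low, dtype=float)
    action_high = np.asarray(action_high, dtype=float)
    # Candidate action sequences, shape (n_candidates, horizon, action_dim).
    actions = rng.uniform(action_low, action_high,
                          size=(n_candidates, horizon, action_low.shape[0]))
    # Every candidate starts from the same current state.
    states = np.repeat(np.asarray(state, dtype=float)[None, :], n_candidates, axis=0)
    returns = np.zeros(n_candidates)
    for t in range(horizon):
        returns += reward_fn(states, actions[:, t])
        states = model(states, actions[:, t])  # batched one-step prediction
    best = np.argmax(returns)
    return actions[best, 0]

# Hypothetical stand-in for a learned model: a 1-D point with s' = s + a.
def toy_model(states, actions):
    return states + actions

# Hypothetical reward: the closer the next state is to the origin, the better.
def toy_reward(states, actions):
    return -np.abs(states + actions).sum(axis=1)

action = random_shooting_action(toy_model, toy_reward, state=[5.0],
                                action_low=[-1.0], action_high=[1.0],
                                horizon=5, n_candidates=2000,
                                rng=np.random.default_rng(0))
```

Only the first action of the best sequence is executed; calling the function again at every step with the new state gives a receding-horizon controller.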

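This amounts to having a model-based controller. Data collected with this controller is then fitted with a policy network, which serves as the initial policy for the model-free stage ("using the model-based learner to initialize a model-free learner").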
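Fitting a policy to the controller's data is a supervised regression problem. The sketch below makes a loud simplification: it fits a linear policy by least squares rather than a policy network, and `expert_controller` and `toy_step` are hypothetical stand-ins for the model-based controller and the environment.

```python
import numpy as np

def collect_expert_data(controller, step_fn, init_states, steps_per_rollout=20):
    """Roll out the controller and record the (state, action) pairs it produces."""
    states, actions = [], []
    for s in init_states:
        s = np.asarray(s, dtype=float)
        for _ in range(steps_per_rollout):
            a = controller(s)
            states.append(s)
            actions.append(a)
            s = step_fn(s, a)
    return np.array(states), np.array(actions)

def fit_linear_policy(states, actions):
    """Least-squares fit of actions = [state, 1] @ W (assumption: linear policy)."""
    X = np.hstack([states, np.ones((len(states), 1))])
    W, *_ = np.linalg.lstsq(X, actions, rcond=None)
    return lambda s: np.append(np.asarray(s, dtype=float), 1.0) @ W

# Hypothetical stand-in for the model-based controller: push toward the origin.
def expert_controller(s):
    return np.clip(-s, -1.0, 1.0)

# Hypothetical environment step.
def toy_step(s, a):
    return s + 0.5 * a

init_states = [[-1.0], [-0.5], [0.3], [0.8]]
states, actions = collect_expert_data(expert_controller, toy_step, init_states)
policy = fit_linear_policy(states, actions)
```

The fitted `policy` is what would be handed to the model-free learner as its starting point; with a network policy, the least-squares fit would be replaced by gradient-based regression on the same data.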

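This policy is then trained further with a model-free method (TRPO).
Summary: The method makes sense, but it has quite a few stages: first collect samples (random trajectories) to learn the model, then plan with the model and collect data under the planning policy, then fit a policy network, and finally continue training with a model-free method. It feels rather cumbersome.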
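The first stage, learning a model from random trajectories, can be sketched as follows. The assumptions are mine: the model here is linear and fitted by least squares instead of being a neural network, it predicts the change in state rather than the next state directly, and `true_step` is a hypothetical system used only to generate data.

```python
import numpy as np

def collect_random_transitions(step_fn, init_state, n_steps,
                               action_low, action_high, rng):
    """Drive the system with uniformly random actions and record transitions."""
    s = np.asarray(init_state, dtype=float)
    S, A, S_next = [], [], []
    for _ in range(n_steps):
        a = rng.uniform(action_low, action_high)
        s_next = step_fn(s, a)
        S.append(s)
        A.append(a)
        S_next.append(s_next)
        s = s_next
    return np.array(S), np.array(A), np.array(S_next)

def fit_delta_model(S, A, S_next):
    """Fit (s' - s) = [s, a, 1] @ W by least squares; return a batched predictor."""
    X = np.hstack([S, A, np.ones((len(S), 1))])
    W, *_ = np.linalg.lstsq(X, S_next - S, rcond=None)

    def model(states, actions):
        states = np.atleast_2d(states)
        actions = np.atleast_2d(actions)
        X = np.hstack([states, actions, np.ones((len(states), 1))])
        return states + X @ W  # next state = current state + predicted change

    return model

# Hypothetical true system, unknown to the learner.
def true_step(s, a):
    return 0.9 * s + 0.5 * a

rng = np.random.default_rng(0)
S, A, S_next = collect_random_transitions(true_step, [1.0], 200,
                                          np.array([-1.0]), np.array([1.0]), rng)
model = fit_delta_model(S, A, S_next)
```

The returned `model` takes batches of states and actions, so it can be used directly by a sampling-based planner that rolls out many candidate action sequences at once.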
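Question: The paper says TRPO was chosen because it does not need an initialized value function. But isn't a value network normally fitted when doing continuous control? Having one might work better. ("such policy gradient algorithms are a good choice for model-free fine-tuning since they do not require any critic or value function for initialization")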

Source: https://www.cnblogs.com/initial-h/p/15988970.html