Mastering Diverse Domains through World Models
Last updated: April 7, 2025 (afternoon)
Google, Nature 2024, arXiv 2023
Authors: Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap
Paper link: http://arxiv.org/abs/2301.04104
Code link: https://github.com/danijar/dreamerv3
Goal: Developing a general algorithm that learns to solve tasks across a wide range of applications
Method: Dreamer learns a model of the environment and improves its behavior by imagining future scenarios.
Pure reinforcement learning; no human data is required.

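A canonical example of combining a world model with reinforcement learning. In the first half (world-model learning), the inputs are the action a, the latent h, and the feature-space representation z; training yields the action latent h and the encoder. In effect, the model learns to predict the future, and that future is represented by h. In the reinforcement-learning stage, h is then used to predict the value v, the reward r, and the action a. Here h is the action latent obtained by abstracting both the current information and the information predicted by the world model, and the actor-critic model maps it to a, r, and v.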
The world model predicts the outcomes of potential actions, the critic judges the value of each outcome, and the actor chooses actions to reach the most valuable outcomes.
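The division of labor among the three components can be sketched with a toy example. Everything here is an assumption made for illustration: the 1-D line environment and the hand-written `world_model`, `critic`, and `actor` functions are hypothetical and are not taken from the DreamerV3 code, where all three are learned neural networks and the actor is trained rather than computed by search.

```python
# Toy illustration only: in Dreamer all three components are learned networks.
GOAL = 5              # hypothetical target position on a 1-D line
ACTIONS = (-1, 0, 1)  # move left, stay, move right
GAMMA = 0.9           # discount factor


def world_model(state, action):
    """Predict the outcome (next state, reward) of a potential action."""
    next_state = state + action
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def critic(state):
    """Judge how valuable a state is (here: closer to the goal is better)."""
    return GAMMA ** abs(GOAL - state)


def actor(state):
    """Choose the action whose imagined outcome is most valuable."""
    def score(action):
        next_state, reward = world_model(state, action)
        return reward + GAMMA * critic(next_state)
    return max(ACTIONS, key=score)


def imagine(state, horizon):
    """Roll out a trajectory purely inside the world model."""
    trajectory = [state]
    for _ in range(horizon):
        state, _ = world_model(state, actor(state))
        trajectory.append(state)
    return trajectory
```

Calling `imagine(0, 7)` rolls the actor forward inside the model without touching a real environment, which is the sense in which the agent improves its behavior "by imagining future scenarios".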
World model: Recurrent State-Space Model (RSSM)
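A minimal sketch of one RSSM step may help make the roles of h and z concrete. It assumes scalar states and Gaussian sampling purely for brevity; the function bodies and coefficients are hypothetical stand-ins, not the paper's networks (DreamerV3 uses a recurrent network for the sequence model and categorical latents for z).

```python
import math
import random


def sequence_model(h, z, a):
    """Deterministic recurrent update: h_t = f(h_{t-1}, z_{t-1}, a_{t-1})."""
    return math.tanh(0.5 * h + 0.3 * z + 0.2 * a)  # stand-in for a recurrent cell


def dynamics_predictor(h, rng):
    """Prior: sample z_t from h_t alone (used when imagining the future)."""
    return rng.gauss(h, 0.1)


def encoder(h, x, rng):
    """Posterior: sample z_t from h_t and the current observation x_t."""
    return rng.gauss(0.5 * (h + x), 0.1)


def rssm_step(h, z, a, rng, x=None):
    """Advance the latent state by one step.

    With an observation x the posterior (encoder) is used; without one,
    the model falls back on its own prior prediction.
    """
    h = sequence_model(h, z, a)
    z = dynamics_predictor(h, rng) if x is None else encoder(h, x, rng)
    return h, z
```

Passing `x=None` corresponds to imagination, where no observation is available and the model must rely on its own predicted z.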

The further ahead the model predicts, the less accurate its predictions become.
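One common explanation for this effect is compounding error: in an open-loop rollout the model feeds on its own predictions, so a small per-step error accumulates. The sketch below illustrates that idea only; the dynamics and the 1.10 and 1.12 coefficients are hypothetical and are not taken from the paper or its results.

```python
def true_step(s):
    """Hypothetical ground-truth dynamics."""
    return 1.10 * s


def model_step(s):
    """Hypothetical learned model with a small per-step error."""
    return 1.12 * s


def open_loop_errors(s0, horizon):
    """Absolute gap between the model rollout and the true rollout per step."""
    s_true, s_model, errors = s0, s0, []
    for _ in range(horizon):
        # The model never sees the true state again after s0.
        s_true, s_model = true_step(s_true), model_step(s_model)
        errors.append(abs(s_model - s_true))
    return errors
```

In this toy setting the gap returned by `open_loop_errors(1.0, 10)` grows at every step, even though the model is only slightly wrong on any single step.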
Results:
Reward smoothing:
