Mastering Diverse Domains through World Models

Last updated on the afternoon of April 7, 2025

Google, Nature 2024, arXiv 2023

Authors: Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

Paper link: http://arxiv.org/abs/2301.04104

Code link: https://github.com/danijar/dreamerv3

Goal: Developing a general algorithm that learns to solve tasks across a wide range of applications

Method: Dreamer learns a model of the environment and improves its behavior by imagining future scenarios.

Pure reinforcement learning; no human data required.

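This is a textbook example of combining a world model with reinforcement learning. In the first stage (world-model learning), the inputs are the action a, the recurrent latent state h, and the feature-space representation z; training produces the latent state h together with the encoder. In effect, the model learns to predict the future, and that future is represented by h. In the reinforcement-learning stage, h is then used to predict the value v, the reward r, and the action a. Here h is the latent state obtained by abstracting the current information together with the world model's prediction. The reward r comes from the world model's reward predictor, while the actor-critic outputs the action a and the value v.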

The world model predicts the outcomes of potential actions, the critic judges the value of each outcome, and the actor chooses actions to reach the most valuable outcomes.
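To make the division of labor concrete, here is a minimal toy sketch of "imagining" a trajectory. It is not the dreamerv3 code or API: the 1-D line environment, `GOAL`, `GAMMA`, and all function names are assumptions made up for illustration, the critic is a hand-written heuristic rather than a learned network, and the actor is a greedy one-step lookahead standing in for Dreamer's learned policy network.

```python
# Toy sketch only: every name here is hypothetical, not part of dreamerv3.
GOAL = 5              # assumed target position on a 1-D line
GAMMA = 0.9           # assumed discount factor
ACTIONS = (-1, 0, 1)  # step left, stay, step right


def world_model(state, action):
    """Predict the outcome (next state, reward) of taking `action` in `state`."""
    next_state = state + action
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def critic(state):
    """Judge how valuable a state is (hand-written heuristic, not learned)."""
    return -abs(GOAL - state)


def actor(state):
    """Pick the action whose imagined outcome scores highest.

    Greedy one-step lookahead, used here in place of a learned policy.
    """
    def score(action):
        next_state, reward = world_model(state, action)
        return reward + GAMMA * critic(next_state)

    return max(ACTIONS, key=score)


def imagine(state, horizon):
    """Roll out a trajectory purely inside the world model, with no real environment."""
    trajectory = [state]
    for _ in range(horizon):
        state, _ = world_model(state, actor(state))
        trajectory.append(state)
    return trajectory
```

Calling `imagine(0, 7)` walks the imagined state from 0 up to the goal and then stays there, giving `[0, 1, 2, 3, 4, 5, 5, 5]`. The point is only the data flow: the actor never touches a real environment, it queries the world model for outcomes and the critic for their value.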

World model: Recurrent State-Space Model (RSSM)
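For reference, the RSSM components can be written out as below. This is my summary of the formulation in the DreamerV3 paper, using the a, h, z notation from these notes; the symbols x_t (observation), c_t (episode-continue flag), and the parameter subscript φ do not appear in the notes above and are introduced here following the paper's notation as I understand it.

```latex
\begin{align*}
\text{Sequence model:}     \quad & h_t = f_\phi(h_{t-1}, z_{t-1}, a_{t-1}) \\
\text{Encoder:}            \quad & z_t \sim q_\phi(z_t \mid h_t, x_t) \\
\text{Dynamics predictor:} \quad & \hat{z}_t \sim p_\phi(\hat{z}_t \mid h_t) \\
\text{Reward predictor:}   \quad & \hat{r}_t \sim p_\phi(\hat{r}_t \mid h_t, z_t) \\
\text{Continue predictor:} \quad & \hat{c}_t \sim p_\phi(\hat{c}_t \mid h_t, z_t) \\
\text{Decoder:}            \quad & \hat{x}_t \sim p_\phi(\hat{x}_t \mid h_t, z_t)
\end{align*}
```

The dynamics predictor is what allows imagination: it guesses the next z from h alone, without seeing an observation, so rollouts can continue entirely in latent space.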