Mastering Diverse Domains through World Models

This post was last updated on April 7, 2025 (PM).


Google, Nature 2024, arXiv 2023

Authors: Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

Paper link: http://arxiv.org/abs/2301.04104

Code link: https://github.com/danijar/dreamerv3

Goal: Develop a general algorithm that learns to solve tasks across a wide range of applications.

Method: Dreamer learns a model of the environment and improves its behavior by imagining future scenarios.

Pure reinforcement learning; no human data is required.

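A textbook example of combining a world model with reinforcement learning. In the world-model stage, the inputs are the previous action a, the recurrent state h, and the stochastic latent representation z; from them the model learns how to update h (the sequence model), together with an encoder that maps observations into z. In effect it learns to predict the future, and that future is represented through h. In the reinforcement-learning stage, the model state formed by h and z is used to obtain v, r and a: h summarizes both the current observation and what the world model has predicted, the world model's reward head predicts r, and the actor-critic produces the action a (actor) and the value v (critic).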
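To make the data flow concrete, here is a minimal sketch of an RSSM filtering pass over real observations followed by an imagined rollout. The scalar functions `sequence_model`, `encoder`, `dynamics_predictor`, `reward_head` and `toy_policy`, and all their coefficients, are hypothetical placeholders for learned networks; in DreamerV3 the states are vectors, z is a set of categorical latents, and the decoder and continue predictor are omitted here.

```python
import math

# Hypothetical toy stand-ins for the learned RSSM networks (scalars only).


def sequence_model(h_prev, z_prev, a_prev):
    # h_t = f(h_{t-1}, z_{t-1}, a_{t-1}): deterministic recurrent update
    return math.tanh(0.8 * h_prev + 0.5 * z_prev + 0.3 * a_prev)


def encoder(h, x):
    # posterior z_t ~ q(z_t | h_t, x_t): sees the real observation x_t
    return math.tanh(0.5 * h + x)


def dynamics_predictor(h):
    # prior z_hat_t ~ p(z_hat_t | h_t): predicts z_t without seeing x_t
    return math.tanh(0.9 * h)


def reward_head(h, z):
    # r_hat_t ~ p(r_hat_t | h_t, z_t), here a fixed linear readout
    return 0.5 * h + 0.5 * z


def observe(observations, prev_actions, h0=0.0, z0=0.0):
    """Filter a real trajectory; prev_actions[t] is a_{t-1}, taken before x_t."""
    h, z = h0, z0
    states = []
    for x, a_prev in zip(observations, prev_actions):
        h = sequence_model(h, z, a_prev)
        z = encoder(h, x)
        states.append((h, z))
    return states


def imagine(h, z, policy, horizon):
    """Roll forward in latent space only: z comes from the prior, not x."""
    trajectory = []
    for _ in range(horizon):
        a = policy(h, z)  # the actor acts on the model state s_t = (h_t, z_t)
        h = sequence_model(h, z, a)
        z = dynamics_predictor(h)
        trajectory.append((h, z, reward_head(h, z)))
    return trajectory


def toy_policy(h, z):
    # Hypothetical actor: push the state back toward zero.
    return 1.0 if h + z < 0 else -1.0


states = observe([0.1, 0.4, -0.2], [0.0, 1.0, -1.0])
h_last, z_last = states[-1]
for h_t, z_t, r_t in imagine(h_last, z_last, toy_policy, horizon=5):
    print(f"h={h_t:+.3f} z={z_t:+.3f} r_hat={r_t:+.3f}")
```

The split between `observe` and `imagine` mirrors the two stages described above: during filtering the encoder corrects z with each real observation, while during imagination only the dynamics predictor is available.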

The world model predicts the outcomes of potential actions, the critic judges the value of each outcome, and the actor chooses actions to reach the most valuable outcomes.
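In Dreamer this division of labor is realized by training on imagined trajectories: the world model supplies predicted rewards r and continue flags c, the critic supplies values v, and the critic is regressed toward, and the actor trained to increase, λ-returns computed from them. A minimal sketch of the standard recursive λ-return, assuming `values` has one more entry than `rewards` (the bootstrap value at the end); the defaults `gamma=0.997` and `lam=0.95` are illustrative assumptions, and the paper's return normalization and other stabilization tricks are omitted.

```python
def lambda_returns(rewards, continues, values, gamma=0.997, lam=0.95):
    """Recursive lambda-returns over an imagined trajectory.

    rewards[t], continues[t]: world-model predictions for t = 0..T-1
    values[t]: critic estimates v(s_t) for t = 0..T (bootstrap at the end)
    R_t = r_t + gamma * c_t * ((1 - lam) * v(s_{t+1}) + lam * R_{t+1}),
    with R_T = v(s_T).
    """
    T = len(rewards)
    if len(continues) != T or len(values) != T + 1:
        raise ValueError("need len(continues) == T and len(values) == T + 1")
    returns = [0.0] * T
    next_return = values[T]  # R_T = v(s_T)
    for t in reversed(range(T)):
        # Blend the one-step bootstrap with the longer-horizon return.
        blended = (1 - lam) * values[t + 1] + lam * next_return
        next_return = rewards[t] + gamma * continues[t] * blended
        returns[t] = next_return
    return returns


print(lambda_returns([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 10.0]))
```

With `lam=0` every target is a one-step bootstrap from the critic; with `lam=1` it is the full predicted reward sum plus the final value, so λ trades the critic's bias against the variance of long imagined rollouts. A continue flag of 0 cuts off everything after an episode end.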

World model: Recurrent State-Space Model (RSSM)
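The RSSM components, together with the actor and critic, can be summarized as below, with x_t the observation, a_t the action, h_t the recurrent state, z_t the stochastic representation and φ, θ, ψ the parameters of the world model, actor and critic. This is a summary of the DreamerV3 formulation that is worth checking against the paper before relying on the details.

```latex
\begin{aligned}
&\text{Sequence model:}     && h_t = f_\phi(h_{t-1}, z_{t-1}, a_{t-1}) \\
&\text{Encoder:}            && z_t \sim q_\phi(z_t \mid h_t, x_t) \\
&\text{Dynamics predictor:} && \hat{z}_t \sim p_\phi(\hat{z}_t \mid h_t) \\
&\text{Reward predictor:}   && \hat{r}_t \sim p_\phi(\hat{r}_t \mid h_t, z_t) \\
&\text{Continue predictor:} && \hat{c}_t \sim p_\phi(\hat{c}_t \mid h_t, z_t) \\
&\text{Decoder:}            && \hat{x}_t \sim p_\phi(\hat{x}_t \mid h_t, z_t) \\
&\text{Actor, critic:}      && a_t \sim \pi_\theta(a_t \mid s_t), \quad v_\psi(s_t) \approx \mathbb{E}[R_t], \quad s_t = \{h_t, z_t\}
\end{aligned}
```

The encoder and the dynamics predictor both produce z; the former uses the real observation, the latter only h, which is what allows the model to roll forward without observations.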

As can be seen, the further into the future the model predicts, the less accurate its predictions become.
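A toy illustration of this effect (not from the paper): when a slightly wrong dynamics model is rolled out open-loop, with no new observations to correct it, per-step errors compound with the horizon. The linear dynamics `true_step` and `model_step` and the 2% coefficient mismatch are arbitrary assumptions; Dreamer's model is nonlinear and stochastic, so this only shows the compounding mechanism.

```python
def rollout(step, x0, horizon):
    # Apply a one-step dynamics function repeatedly (open-loop prediction).
    xs = [x0]
    for _ in range(horizon):
        xs.append(step(xs[-1]))
    return xs


def true_step(x):
    # Hypothetical true environment dynamics.
    return 1.0 * x + 0.1


def model_step(x):
    # Hypothetical learned model whose coefficient is 2% too large.
    return 1.02 * x + 0.1


horizon = 20
true_traj = rollout(true_step, 1.0, horizon)
pred_traj = rollout(model_step, 1.0, horizon)
errors = [abs(p - t) for p, t in zip(pred_traj, true_traj)]

for k in (1, 5, 10, 20):
    print(f"step {k:2d}: abs error = {errors[k]:.4f}")
```

Each step adds 0.02 times the current predicted state to the gap, so the error grows with every step of the rollout rather than staying at the one-step error.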

Results:

Reward smoothing:


http://example.com/2025/04/06/2025-4/paper1/
Author: Artimis
Published: April 6, 2025
License