TL;DR: Fine-tuning image and video generation models to 'draw actions' as visual patterns could lead us to powerful agents for robotics and GUI apps.
Diffusion Agents