Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments Paper • 2605.30280 • Published 20 days ago • 140
Meta-CoT: Enhancing Granularity and Generalization in Image Editing Paper • 2604.24625 • Published Apr 27 • 26
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents Paper • 2604.07429 • Published Apr 8 • 122
CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video Paper • 2603.04291 • Published Mar 4 • 15
CubeComposer Collection Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video • 2 items • Updated Mar 5 • 2