
NVIDIA: Cosmos 3: Omnimodal World Models for Physical AI, Technical Report (English edition, 139 pages).pdf

Uploaded by: 小*** | ID: 1271260 | 2026-06-25 | Formats: PDF, DOCX | 139 pages | 27.97 MB | 28 figures and tables



Cosmos 3: Omnimodal World Models for Physical AI

NVIDIA

2026-6-22

Abstract

We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture. By supporting highly flexible input-output configurations, Cosmos 3 seamlessly unifies critical modalities for Physical AI, effectively subsuming vision-language models, video generators, world simulators, and world-action models into a single framework. Our evaluation demonstrates that Cosmos 3 establishes a new state-of-the-art across a diverse suite of understanding and generation tasks, demonstrating omnimodal world models as scalable, general-purpose backbones for embodied agents. Our post-trained Cosmos 3 models were ranked as the best open-source Text-to-Image and Image-to-Video models by Artificial Analysis, and the best policy model by RoboArena at the time the technical report was written. To accelerate open research and deployment in Physical AI, we make our code, model checkpoints, curated synthetic datasets, and evaluation benchmark available under the Linux Foundation's OpenMDW-1.1 License at [...] and huggingface.co/collections/nvidia/cosmos3. The project website is available at [...]

Code

[...]

Model Checkpoint

Cosmos3-Super: huggingface.co/nvidia/Cosmos3-Super
Cosmos3-Nano: huggingface.co/nvidia/Cosmos3-Nano
Cosmos3-Super-Text2Image: huggingface.co/nvidia/Cosmos3-Super-Text2Image
Cosmos3-Super-Image2Video: huggingface.co/nvidia/Cosmos3-Super-Image2Video
Cosmos3-Nano-Policy-DROID: huggingface.co/nvidia/Cosmos3-Nano-Policy-DROID

Open Synthetic Dataset

SDG-PhyxSim: huggingface.co/datasets/nvidia/PhysicalAI-WorldModel-Synthetic-Physical-Interaction-Scenes
SDG-RobotSim: huggingface.co/datasets/nvidia/PhysicalAI-WorldModel-Synthetic-Embodied-Robot-Scenes
SD...


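1. **Model overview**: NVIDIA introduces Cosmos 3, a family of omnimodal world models that jointly process language, image, video, audio, and action sequences. It uses a mixture-of-transformers (MoT) architecture, supports flexible input-output configurations, and brings vision-language models, video generators, world simulators, and world-action models into a single framework.
2. **Performance**: Cosmos 3 reaches state-of-the-art results on multiple tasks. Its Text-to-Image and Image-to-Video models were ranked first among open-source models by Artificial Analysis, and its robot policy model led RoboArena, at the time the report was written.
3. **Open-source resources**: The code, models (such as Cosmos3-Super and Cosmos3-Nano), synthetic datasets (the SDG series), and evaluation benchmark (Cosmos-HUE) are openly released at github.com/nvidia/cosmos and huggingface.co/collections/nvidia/cosmos3.
4. **Architecture**: The design comprises multimodal encoders, a dual-tower layer structure (reasoner and generator), and 3D multimodal positional embeddings. It supports multiple generation modes, such as text-to-video generation and action prediction.
5. **Training data**: The reasoner is trained on 24.2M samples (22.0M for pre-training plus 2.2M for fine-tuning). The generator relies on large-scale multimodal data covering Physical AI tasks such as robotics and autonomous driving.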

