
何天宇.pdf

Uploaded by: 拾亿 · ID: 1171142 · 2026-03-21 · 17 pages · 2.25 MB

Towards Interactive World Simulator
Tianyu He
Microsoft Research Asia
4/9/2025

Policy Model: π(a_t | s_t)
World Model: p(s_{t+1} | s_t, a_t)
David Ha, Jürgen Schmidhuber. World models. arXiv:1803.10122.
interactive!

Outline
1. How to represent the visual world?
   We introduce VidTok, a cutting-edge family of video tokenizers.
2. How to enable interactive visual world modeling?
   Interact with action: autoregressive world model on Minecraft.
   Interact with latent action: human-to-robot cross-embodiment generalization.
   Interact with video demonstration: zero-shot video imitation in the real world.
   Interact with camera viewpoint: explicit world model with underlying 3D structure.

VidTok
Efficient Architecture: Separate spatial and temporal sampling reduces computational complexity without sacrificing quality.
Advanced Quantization: Finite Scalar Quantization (FSQ) addresses training instability and codebook collapse in discrete tokenization.
Enhanced Training: A two-stage strategy, pre-training on low-resolution videos and then fine-tuning on high-resolution videos, boosts efficiency. Reduced frame rates improve the representation of motion dynamics.
A cutting-edge family of video tokenizers that
excels at both continuous and discrete tokenization.
Tang et al. VidTok: A Versatile and Open-Source Video Tokenizer. arXiv:2412.13061.

VidTok: Leading Reconstruction Performance
VidTok, trained on a large-scale video dataset, outperforms previous models across all metrics, including PSNR, SSIM, LPIPS, and FVD.
https:/
VidTok shows a distinct advantage in the fidelity of reconstructed detail and in subjective viewing experience.
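PSNR, the first reconstruction metric listed for VidTok, scores pixel-level error on a logarithmic scale: PSNR = 10 · log10(MAX² / MSE), in decibels, where higher means a closer reconstruction. The following is a minimal sketch of that standard definition, assuming 8-bit frames (max_value = 255) flattened into plain Python lists; the function name `psnr` is illustrative and is not VidTok's evaluation code. SSIM, LPIPS, and FVD compare structure, learned features, and video distributions respectively, and are not covered here.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], reconstruction: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(reconstruction) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: no error, so PSNR is unbounded
    return 10 * math.log10(max_value ** 2 / mse)


frame = [0.0, 64.0, 128.0, 255.0]
noisy = [1.0, 63.0, 129.0, 254.0]  # every pixel off by 1, so MSE = 1
print(round(psnr(frame, noisy), 2))  # 48.13
```

Because the score depends only on mean squared error, PSNR rewards pixel-exact reconstructions, which is one reason perceptual metrics such as LPIPS and FVD are reported alongside it.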
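Finite Scalar Quantization (FSQ), listed under VidTok's Advanced Quantization, replaces a learned codebook with a fixed grid: each latent dimension is squashed into a bounded range and rounded to one of L integer levels, so the implicit codebook is the product of the per-dimension level counts and there is no learned codebook that can collapse. The sketch below follows the FSQ formulation of Mentzer et al. under simplifying assumptions: odd level counts only, plain Python lists instead of tensors, and no straight-through gradient (in training, gradients would be passed through the rounding step unchanged). The level choice [5, 5, 5] and the function names `fsq_quantize` and `fsq_index` are hypothetical, not VidTok's configuration.

```python
import math
from typing import List, Sequence


def fsq_quantize(z: Sequence[float], levels: Sequence[int]) -> List[int]:
    """Bound each latent dimension with tanh, then round it to one of L levels."""
    if len(z) != len(levels):
        raise ValueError("need one level count per latent dimension")
    codes = []
    for value, num_levels in zip(z, levels):
        if num_levels < 3 or num_levels % 2 == 0:
            raise ValueError("this sketch only handles odd level counts >= 3")
        half = (num_levels - 1) // 2  # e.g. 5 levels -> integers -2..2
        codes.append(round(math.tanh(value) * half))
    return codes


def fsq_index(codes: Sequence[int], levels: Sequence[int]) -> int:
    """Read the per-dimension codes as one mixed-radix number (the token id)."""
    index, base = 0, 1
    for code, num_levels in zip(codes, levels):
        index += (code + (num_levels - 1) // 2) * base  # shift code to 0..L-1
        base *= num_levels
    return index


levels = [5, 5, 5]  # hypothetical: implicit codebook of 5**3 = 125 codes
codes = fsq_quantize([2.0, 0.0, -2.0], levels)
print(codes, fsq_index(codes, levels), math.prod(levels))  # [2, 0, -2] 14 125
```

Since every token id is just a point on the fixed grid, codebook size is set by the level counts alone, with no auxiliary codebook losses to tune.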
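The opening slide splits the agent into a policy model, which chooses an action from the current state, and a world model, which predicts the next state from the current state and action. A minimal sketch of how the two combine: the policy picks an action, the world model predicts the next state, and each prediction is fed back as the next input, which is what makes the rollout autoregressive. Both models here are hypothetical deterministic toy functions on a single number (`policy`, `world_model`, and `rollout` are illustrative names); in practice both would be learned networks, for example over video tokens, and a stochastic model would sample from each distribution instead of returning one value.

```python
from typing import Callable, List

State = float
Action = float


def policy(state: State) -> Action:
    """Hypothetical toy policy: choose an action that halves the distance to 0."""
    return -0.5 * state


def world_model(state: State, action: Action) -> State:
    """Hypothetical toy world model: the next state is state plus action."""
    return state + action


def rollout(s0: State, steps: int,
            pi: Callable[[State], Action] = policy,
            model: Callable[[State, Action], State] = world_model) -> List[State]:
    """Autoregressive rollout: each predicted state becomes the next input."""
    states = [s0]
    for _ in range(steps):
        action = pi(states[-1])
        states.append(model(states[-1], action))
    return states


print(rollout(8.0, 3))  # [8.0, 4.0, 2.0, 1.0]
```

Swapping in a different `pi` while keeping `model` fixed is the sense in which a world model is interactive: the same simulator responds to whatever actions it is given.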
