Google: Gemini 1.5 Technical Report (English version) (154 pages).pdf

Uploaded by: 淘*** | ID: 650876 | 2025-04-07 | PDF | 154 pages | 6.85 MB | 58 figures and tables

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Gemini Team, Google

In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

1. Introduction

We present our latest multimodal models from the Gemini line: Gemini 1.5 Pro and Gemini 1.5 Flash. They are members of Gemini 1.5, a ne

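This report introduces the Gemini 1.5 family of models, which includes Gemini 1.5 Pro and Gemini 1.5 Flash. These models represent the next generation of highly efficient multimodal models, able to recall and reason over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. Gemini 1.5 Pro exceeds the previous version on most capabilities and benchmarks, while Gemini 1.5 Flash is a more lightweight variant designed for efficiency with minimal impact on quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks, improve the state of the art in long-document QA, long-video QA, and long-context ASR, and match or surpass the state-of-the-art performance of Gemini 1.0 Ultra across a range of benchmarks. Gemini 1.5 Pro and Gemini 1.5 Flash make significant progress in long-context ability: for example, they reach nearly 100% recall at 1 million tokens of context and maintain 99.2% recall at 10 million tokens. The models also show new capabilities on long documents, long video, and long audio, such as learning to translate English into Kalamang from only a reference grammar book and a bilingual wordlist, and extracting information from a single video frame. Overall, the Gemini 1.5 family marks a major advance in multimodal understanding and long-context processing, opening new possibilities for handling longer and more complex multimodal inputs.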
