2023-12-06

Gemini: A Family of Highly Capable Multimodal Models

Gemini Team, Google¹

This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks, notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of Gemini models in cross-modal reasoning and language understanding will enable a wide variety of use cases, and we discuss our approach toward deploying them responsibly to users.

1. Introduction

We present Gemini, a family of highly capable multimodal models developed at Google. We trained Gemini jointly across image, audio, video, and text data for the purpose of building a model with both strong generalist capabilities across modalities
alongside cutting-edge understanding and reasoning performance in each respective domain.

Gemini 1.0, our first version, comes in three sizes: Ultra for highly-complex tasks, Pro for enhanced performance and deployability at scale, and Nano for on-device applications. Each size is specifically tailored to address different computational limitations and application requirements. We evaluate the performance of Gemini models on a comprehensive suite of internal and external benchmarks covering a wide range of language, coding, reasoning, and multimodal tasks. Gemini advances state-of-the-art in large-scale langua