Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model

Seed Vision Team, ByteDance

Abstract

Rapid advancement of diffusion models has catalyzed remarkable progress in the field of image generation. However, prevalent models such as Flux, SD3.5 and Midjourney still grapple with issues like model bias, limited text rendering capabilities, and insufficient understanding of Chinese cultural nuances. To address these limitations, we present Seedream 2.0, a native Chinese-English bilingual image generation foundation model that excels across diverse dimensions. It adeptly manages text prompts in both Chinese and English, supporting bilingual image generation and text rendering. We develop a powerful data system that facilitates knowledge integration, and a caption system that balances accuracy and richness in image description. In particular, Seedream is integrated with a self-developed bilingual large language model (LLM) as a text encoder, allowing it to learn native knowledge directly from massive data. This enables it to generate high-fidelity images with accurate cultural nuances and aesthetic expressions described in either Chinese or English. In addition, Glyph-Aligned ByT5 is applied for flexible character-level text rendering, while a Scaled RoPE generalizes well to untrained resolutions. Multi-phase post-training optimizations, including SFT and RLHF iterations, further improve the overall capability. Through extensive experimentation, we demonstrate that Seedream 2.0 achieves state-of-the-art performance across multiple aspects, including prompt-following, aesthetics, text rendering, and structural correctness. Furthermore, Seedream 2.0 has been optimized through multiple RLHF iterations to closely align its output with human preferences, as revealed by its outstanding ELO score. In addition,