From Silicon to AI Serving: Optimizing Inference and Engineering What's Next
Donggun Kim, Head of Product
FUTURE TECHNOLOGIES SYMPOSIUM

Designing Today's Chips for Tomorrow's AI
The AI market moves fast, so hardware must anticipate the future and be ready for optimizations yet to come.
[Figure: Timeline from Feb 2023 to Dec 2024. Panels: Macroscopic Trends in AI Evolution; IC Development Process; Product Enablement vs. AI Model Volatility. Model releases shown: Llama (7B, 13B, 33B, 65B), Llama 2 (7B, 13B, 70B), Code Llama (7B, 13B, 34B), Code Llama 70B, Llama 3 (8B, 70B), Llama 3.1 (405B), Llama 3.2 (1B, 3B, 11B, 90B), Llama 3.3 (70B), Llama 4. RNGD milestones shown: development, tape-out, unveil, customer sampling, mass production (MP). Source: Trends - Artificial Intelligence (5/25).]

Furiosa RNGD
Powerfully efficient and programmable data center AI accelerator (RNGD is pronounced "Renegade")
512 TFLOPS (FP8): 64 TFLOPS x 8 PEs
48 GB memory capacity
256 MB SRAM
384 TB/s on-chip bandwidth
1.5 TB/s memory bandwidth
180 W TDP
2 x HBM3, CoWoS-S packaging

Tensor Contraction as a Primitive
A well-designed architecture should reduce usage complexity within its target domain.
[Figure: FLOP analysis for BERT. Source: "Data Movement Is All You Need: A Case Study on Optimizing Transformers," MLSys '21.]

Silicon to Serving: The Ongoing Optimization
Flexibility to Support AI Models (Source: Trends - Artificial Intelligence (5/25))

Optimizing System
Enabling a chip for AI serving requires optimization across multiple dimensions.

Serving Efficiency Matters

Furiosa LLM: Flexibility to Support AI Models
Layered optimization tools designed to streamline the use of diverse AI models.

Challenges in Serving: Serving Efficiency
AI serving optimization means managing various hazards, and this challenge becomes especially inte
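The RNGD figures listed above can be turned into a few back-of-the-envelope numbers. The sketch below is an illustration, not a Furiosa tool: it applies the standard roofline model (attainable throughput = min(peak compute, arithmetic intensity x bandwidth)), which the slides do not mention, to the quoted nominal figures only. The function names are hypothetical, and the 8B-parameter FP8 model in the last line is an assumed example.

```python
"""Back-of-the-envelope roofline numbers for the RNGD spec figures.

Assumption: the standard roofline model is used as an illustration; it is not
taken from the slides, and all function names here are hypothetical.
"""

# Nominal figures as listed on the RNGD spec slide.
PE_COUNT = 8
TFLOPS_PER_PE_FP8 = 64.0
HBM_BANDWIDTH_TBPS = 1.5      # memory bandwidth (2 x HBM3), TB/s
ONCHIP_BANDWIDTH_TBPS = 384.0  # on-chip bandwidth, TB/s
TDP_WATTS = 180.0


def peak_tflops() -> float:
    """Aggregate FP8 peak: 8 PEs x 64 TFLOPS each."""
    return PE_COUNT * TFLOPS_PER_PE_FP8


def ridge_point(bandwidth_tbps: float) -> float:
    """Arithmetic intensity (FLOP/byte) above which a kernel is compute-bound.

    TFLOPS / (TB/s) = FLOP/byte, because the 1e12 factors cancel.
    """
    return peak_tflops() / bandwidth_tbps


def attainable_tflops(intensity_flop_per_byte: float, bandwidth_tbps: float) -> float:
    """Roofline: throughput is capped by compute or by data movement."""
    return min(peak_tflops(), intensity_flop_per_byte * bandwidth_tbps)


def decode_time_lower_bound_ms(weight_bytes: float) -> float:
    """Hypothetical batch-1 decode step that streams every weight from HBM once.

    Ignores KV-cache traffic, on-chip reuse and overheads, so it is only a floor.
    """
    return weight_bytes / (HBM_BANDWIDTH_TBPS * 1e12) * 1e3


if __name__ == "__main__":
    print(f"peak FP8: {peak_tflops():.0f} TFLOPS")
    print(f"HBM ridge point: {ridge_point(HBM_BANDWIDTH_TBPS):.1f} FLOP/byte")
    print(f"on-chip ridge point: {ridge_point(ONCHIP_BANDWIDTH_TBPS):.2f} FLOP/byte")
    print(f"peak efficiency: {peak_tflops() / TDP_WATTS:.2f} TFLOPS/W")
    # Hypothetical 8B-parameter model stored in FP8 (1 byte per weight).
    print(f"8B FP8 decode floor: {decode_time_lower_bound_ms(8e9):.2f} ms/token")
```

On these nominal figures, a kernel needs roughly 341 FLOPs per byte fetched from HBM before compute rather than memory bandwidth becomes the limit, whereas data already held in on-chip SRAM reaches that point at about 1.3 FLOP/byte. This is one way to read why the SRAM capacity and on-chip bandwidth are listed next to the peak TFLOPS; real kernels will land below these ceilings.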
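The "Tensor Contraction as a Primitive" slide pairs that design choice with a FLOP analysis of BERT. As a minimal sketch, assuming NumPy's `np.einsum` as a stand-in for a hardware contraction primitive, illustrative BERT-base-like shapes, and a rough count of 5 operations per score element for softmax (none of which comes from the slides), the block below writes multi-head self-attention so that every matrix product is a single contraction, then tallies the share of FLOPs those contractions account for. The helper names are hypothetical.

```python
"""Self-attention expressed with one primitive: tensor contraction (np.einsum).

Assumptions: np.einsum stands in for a hardware contraction primitive, the
shapes are illustrative (BERT-base-like), and the softmax cost is a rough estimate.
"""
import numpy as np


def contraction_flops(spec: str, *shapes: tuple) -> int:
    """FLOPs of a pairwise einsum: 2 (multiply + add) x product of all index sizes."""
    sizes = {}
    for labels, shape in zip(spec.split("->")[0].split(","), shapes):
        for label, dim in zip(labels, shape):
            sizes[label] = dim
    flops = 2
    for dim in sizes.values():
        flops *= dim
    return flops


def attention(x, wq, wk, wv, wo, heads):
    """Multi-head self-attention in which every matrix product is an einsum."""
    seq, hidden = x.shape
    dk = hidden // heads
    # Projections contract the hidden index 'd'.
    q = np.einsum("ld,de->le", x, wq).reshape(seq, heads, dk)
    k = np.einsum("ld,de->le", x, wk).reshape(seq, heads, dk)
    v = np.einsum("ld,de->le", x, wv).reshape(seq, heads, dk)
    # Scores contract the per-head feature index 'k'.
    scores = np.einsum("qhk,shk->hqs", q, k) / np.sqrt(dk)
    # Softmax over key positions: the only non-contraction step here.
    scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs = scores / scores.sum(axis=-1, keepdims=True)
    # Weighted sum of values contracts the key position 's'.
    ctx = np.einsum("hqs,shk->qhk", probs, v).reshape(seq, hidden)
    return np.einsum("ld,de->le", ctx, wo)


def contraction_share(seq, hidden, heads, softmax_ops_per_elem=5):
    """Fraction of attention FLOPs spent in contractions (rough estimate)."""
    dk = hidden // heads
    proj = 4 * contraction_flops("ld,de->le", (seq, hidden), (hidden, hidden))
    qk = contraction_flops("qhk,shk->hqs", (seq, heads, dk), (seq, heads, dk))
    av = contraction_flops("hqs,shk->qhk", (heads, seq, seq), (seq, heads, dk))
    softmax = softmax_ops_per_elem * heads * seq * seq
    contractions = proj + qk + av
    return contractions / (contractions + softmax)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq, hidden, heads = 128, 768, 12
    x = rng.standard_normal((seq, hidden))
    ws = [rng.standard_normal((hidden, hidden)) / np.sqrt(hidden) for _ in range(4)]
    out = attention(x, *ws, heads=heads)
    # The einsum contraction and a plain matmul agree.
    assert np.allclose(np.einsum("ld,de->le", x, ws[0]), x @ ws[0])
    print(out.shape, f"contraction share of FLOPs: {contraction_share(seq, hidden, heads):.2%}")
```

With these shapes the contractions account for well over 99% of the counted FLOPs, which is the sense in which one contraction primitive covers most of the arithmetic in an attention layer. A FLOP share says nothing about runtime, though: the small softmax step can still be limited by data movement.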