"Balancing the cost, performance, and reliability of AI at enterprise scale.pdf" (37 pages), shared by a member on 三个皮匠报告.
© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.

AIM3304
Balance cost, performance & reliability for AI at enterprise scale

Ankur Desai
Principal Product Manager, Amazon Bedrock
He/Him

Jared Dean
WW Tech Lead, Amazon Bedrock
He/Him

Deepen Mehta
Engineering Manager - AI Foundations

Agenda
Welcome and Introductions
Overview of Bedrock Inference Options
Intuit experience
Technical Details
Q&A

Introductions

Inference Tiers
Priority
Flex
Standard
Reserved

The Challenge: Optimize for cost, latency, and accuracy
Accuracy
Speed/Latency
Cost
Important for critical applications
Driven by business value from application
Use case dependent
Inference driven

Optimize for cost, latency, and accuracy
MODELS: Model type and size
INFERENCE: Flexible and reduced-price consumption options
FEATURES: Model Distillation, Intelligent Prompt Routing and Prompt Caching

© 2025, Amazon Web Services, Inc. or