
Session31_AI Accelerators.pdf

Uploaded by: bu****ng | ID: 1188931 | 2026-03-31 | 420 pages | 85.53 MB

ISSCC 2026, Session 31: AI Accelerators

31.1: A 14.08-to-135.69Token/s ReRAM-on-Logic Stacked Outlier-Free Large-Language-Model Accelerator with Block-Clustered Weight-Compression and Adaptive Parallel-Speculative-Decoding

2026 IEEE International Solid-State Circuits Conference

Authors: Pingcheng Dong (1,2), Yonghao Tan (1,2), Xuejiao Liu (2), Peng Luo (2), Yu Liu (2), Di Pang (2), Songchen Ma (1,2), Xijie Huang (1), Shih-Yang Liu (1), Dong Zhang (1,2), Zhichao Lu (3), Luhong Liang (2), Chi-Ying Tsui (1,2), Fengbin Tu (1,2), Liang Zhao (4), Kwang-Ting Cheng (1,2)

Presenter: Fengshi Tian (1,2)

Affiliations:
(1) The Hong Kong University of Science and Technology, Hong Kong, China
(2) AI Chip Center for Emerging Smart System (ACCESS), Hong Kong, China
(3) Hefei Reliance Memory, Hefei, China
(4) Zhejiang University, Hangzhou, China

Outline
Introduction
Overall Architecture
Key Features
    Local Rotation Unit (LRU) with Decomposed FWHT
    ReRAM-Stacked PNM (RS-PNM) with Blockwise VQ
    Adaptive Parallel Speculative Decoding (APSD)
    Workload-Decoupled Out-of-Order Scheduler (WDOS)
Experiment Results
Summary
