Session31_AI Accelerators.pdf

ISSCC 2026 (2026 IEEE International Solid-State Circuits Conference)
Session 31: AI Accelerators

31.1: A 14.08-to-135.69 Token/s ReRAM-on-Logic Stacked Outlier-Free Large-Language-Model Accelerator with Block-Clustered Weight-Compression and Adaptive Parallel-Speculative-Decoding

Authors: Pingcheng Dong (1,2), Yonghao Tan (1,2), Xuejiao Liu (2), Peng Luo (2), Yu Liu (2), Di Pang (2), Songchen Ma (1,2), Xijie Huang (1), Shih-Yang Liu (1), Dong Zhang (1,2), Zhichao Lu (3), Luhong Liang (2), Chi-Ying Tsui (1,2), Fengbin Tu (1,2), Liang Zhao (4), Kwang-Ting Cheng (1,2)

Presenter: Fengshi Tian (1,2)

Affiliations:
(1) The Hong Kong University of Science and Technology, Hong Kong, China
(2) AI Chip Center for Emerging Smart System (ACCESS), Hong Kong, China
(3) Hefei Reliance Memory, Hefei, China
(4) Zhejiang University, Hangzhou, China

Outline
    Introduction
    Overall Architecture
    Key Features
        Local Rotation Unit (LRU) with Decomposed FWHT
        ReRAM-Stacked PNM (RS-PNM) with Blockwise VQ
        Adaptive Parallel Speculative Decoding (APSD)
        Workload-Decoupled Out-of-Order Scheduler (WDOS)
    Experiment Results
    Summary
